wordpiece
R Implementation of Wordpiece Tokenization
Description
Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
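As a rough illustration of what wordpiece tokenization does, the sketch below implements greedy longest-match-first splitting in Python. It is not the package's R API: the function names, the toy vocabulary, and the `max_chars` cutoff are hypothetical. Lowercasing and whitespace splitting stand in for fuller BERT-style preprocessing, which also separates punctuation. The `##` continuation prefix and the `[UNK]` token follow the BERT conventions.

```python
def wordpiece_tokenize_word(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one word into wordpieces by greedy longest-match-first lookup."""
    # Assumption: overly long words are mapped straight to the unknown token.
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def wordpiece_tokenize(text, vocab, unk_token="[UNK]"):
    """Lowercase, split on whitespace, then split each word into wordpieces."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize_word(word, vocab, unk_token))
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"i", "am", "un", "##aff", "##able"}
example_tokens = wordpiece_tokenize("I am unaffable", toy_vocab)
print(example_tokens)
```

With this toy vocabulary, a word that cannot be fully covered by vocabulary pieces collapses to a single `[UNK]` token rather than a partial split.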
Downloads
Last 30 days: 335
Rank: 13322nd
Last 90 days: 844
Last year: 14.8K
Trend: +42% (30d vs prior 30d)

Last 30 days: 1
Last 90 days: 24
Last year: 108
Trend: -88.9% (30d vs prior 30d)

Last 7 days: 1
Last 30 days: 8
All-time: 0
autoCRAN-only: this name is served only by autoCRAN, so the count is exact.
CRAN Check Status
| Flavor | Status |
|---|---|
| r-devel-linux-x86_64-debian-clang | NOTE |
| r-devel-linux-x86_64-debian-gcc | NOTE |
| r-devel-linux-x86_64-fedora-clang | NOTE |
| r-devel-linux-x86_64-fedora-gcc | NOTE |
| r-devel-windows-x86_64 | NOTE |
| r-oldrel-macos-arm64 | NOTE |
| r-oldrel-macos-x86_64 | NOTE |
| r-oldrel-windows-x86_64 | NOTE |
| r-patched-linux-x86_64 | NOTE |
| r-release-linux-x86_64 | NOTE |
| r-release-macos-arm64 | NOTE |
| r-release-macos-x86_64 | NOTE |
| r-release-windows-x86_64 | NOTE |
Check details (15 non-OK)
CRAN incoming feasibility
Maintainer: ‘Jonathan Bratt <jonathan.bratt@macmillan.com>’
The Description field contains
Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>)
Please refer to arXiv e-prints via their arXiv DOI <doi:10.48550/arXiv.YYMM.NNNNN>.
Rd files
checkRd: (-1) wordpiece_cache_dir.Rd:16-17: Lost braces in \itemize; meant \describe ?
checkRd: (-1) wordpiece_cache_dir.Rd:18-19: Lost braces in \itemize; meant \describe ?
checkRd: (-1) wordpiece_cache_dir.Rd:20-21: Lost braces in \itemize; meant \describe ?
Check History
NOTE 0 OK · 14 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 10, 2026
Code & tests
Code intelligence has not been computed for this package yet.
Code
Structure
Lines of code
1,215
Files
40
Compiled share
0%
Has compiled src
No
API
Exported functions
7
Internal functions
19
Testing & CI
Has tests
Yes
Test-to-code ratio
0.33
testthat edition
3
CI present
No
CI type
–
PR gated
No
Docs
Return-value doc rate
85.7%
\dontrun example ratio
0%
Roxygen coverage
100%
Has pkgdown
No
NEWS present
Yes
Health & Security signals
Informational signals; not verdicts.
on.exit coverage
0%
Unsafe pattern score
0
Dep constraint coverage
85.7%
Secret pattern count
0
Bundled 3rd-party code
2 items
Portability & License
Min R version
3.3.0
System requirements
–
C++ standard
–
License
Apache License (>= 2)
License flags
SPDX valid, OSI approved
History
Versions
3
First release
2021-02-11
Latest release
2022-03-03
Avg cadence
193 days
Cold removal rate
100%
Dep drift
9
Per-file churn detail lives in the source pipeline: https://github.com/r-observatory/cran-code-metrics.