tokenizers.bpe
Byte Pair Encoding Text Tokenization
v0.1.4 · Sep 5, 2025 · MPL-2.0
Description
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, a fast implementation of Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
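For readers unfamiliar with BPE, the idea can be sketched in a few lines of Python. This is a toy illustration only, not the YouTokenToMe implementation and not this package's API: the function names `learn_bpe`, `merge_pair`, and `encode` are hypothetical, words are split into single characters, and ties between equally frequent pairs are broken lexicographically, which is an assumption made here for determinism.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs over the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically smallest pair (assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["abab", "abab", "abc"], num_merges=2)
print(merges)                  # [('a', 'b'), ('ab', 'ab')]
print(encode("abc", merges))   # ['ab', 'c']
```

Frequent character sequences end up as single tokens while rare ones stay split into smaller pieces, which is what lets a fixed-size vocabulary cover open-ended text.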
Downloads
Last 30 days: 651
Rank: 5825th
Last 90 days: 2.5K
Last year: 18.2K
Trend: -16.5% (30d vs prior 30d)
CRAN Check Status
1 WARNING · 2 NOTE · 11 OK
| Flavor | Status |
|---|---|
| r-devel-linux-x86_64-debian-clang | WARNING |
| r-devel-linux-x86_64-debian-gcc | OK |
| r-devel-linux-x86_64-fedora-clang | OK |
| r-devel-linux-x86_64-fedora-gcc | OK |
| r-devel-macos-arm64 | OK |
| r-devel-windows-x86_64 | OK |
| r-oldrel-macos-arm64 | NOTE |
| r-oldrel-macos-x86_64 | NOTE |
| r-oldrel-windows-x86_64 | OK |
| r-patched-linux-x86_64 | OK |
| r-release-linux-x86_64 | OK |
| r-release-macos-arm64 | OK |
| r-release-macos-x86_64 | OK |
| r-release-windows-x86_64 | OK |
Check details (3 non-OK)
WARNING
r-devel-linux-x86_64-debian-clang
whether package can be installed
Found the following significant warnings:
./parallel_hashmap/phmap_base.h:1266:1: warning: 'is_always_equal' is deprecated: use 'std::allocator_traits::is_always_equal' instead [-Wdeprecated-declarations]
See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/tokenizers.bpe.Rcheck/00install.out’ for details.
* used C++ compiler: ‘Debian clang version 21.1.8 (3+b1)’
NOTE
r-oldrel-macos-arm64
installed package size
installed size is 6.1Mb
sub-directories of 1Mb or more:
libs 5.2Mb
NOTE
r-oldrel-macos-x86_64
installed package size
installed size is 6.2Mb
sub-directories of 1Mb or more:
libs 5.3Mb
Check History
Mar 10, 2026: WARNING (11 OK · 2 NOTE · 1 WARNING · 0 ERROR · 0 FAILURE)