
tok

Fast Text Tokenization

v0.2.1 · Sep 30, 2025 · MIT + file LICENSE

Description

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It's extremely fast for both training new vocabularies and tokenizing texts.
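
As a rough illustration of what training a 'Byte-Pair Encoding' vocabulary involves, the toy sketch below learns merge rules from a handful of words and then applies them to new text. It is not the tok API and not how the 'Hugging Face' library implements the algorithm. The function names (`train_bpe`, `merge_pair`, `encode`), the character-level starting symbols, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Start from single characters; identical words are counted together.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties go to the lexicographically larger pair
        # (an arbitrary choice made here only to keep the result deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[merge_pair(symbols, best)] += freq
        vocab = merged_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(encode("lowest", merges))
```

A production tokenizer also handles pre-tokenization, special tokens, and byte-level fallbacks, all of which this sketch leaves out.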

Downloads

20.2K

Last 30 days

865th

43.7K

Last 90 days

128.4K

Last year

Trend: +54% (30d vs prior 30d)

CRAN Check Status

8 NOTE
6 OK
Flavor Status
r-devel-linux-x86_64-debian-clang NOTE
r-devel-linux-x86_64-debian-gcc NOTE
r-devel-linux-x86_64-fedora-clang NOTE
r-devel-linux-x86_64-fedora-gcc NOTE
r-devel-macos-arm64 OK
r-devel-windows-x86_64 NOTE
r-oldrel-macos-arm64 NOTE
r-oldrel-macos-x86_64 NOTE
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 NOTE
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (8 non-OK)
NOTE r-devel-linux-x86_64-debian-clang

compiled code

File ‘tok/libs/tok.so’:
  Found non-API call to R: ‘R_UnboundValue’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.
NOTE r-devel-linux-x86_64-debian-gcc

compiled code

File ‘tok/libs/tok.so’:
  Found non-API call to R: ‘R_UnboundValue’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.
NOTE r-devel-linux-x86_64-fedora-clang

compiled code

File ‘tok/libs/tok.so’:
  Found non-API call to R: ‘R_UnboundValue’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.
NOTE r-devel-linux-x86_64-fedora-gcc

compiled code

File ‘tok/libs/tok.so’:
  Found non-API call to R: ‘R_UnboundValue’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.
NOTE r-devel-windows-x86_64

compiled code

File 'tok/libs/x64/tok.dll':
  Found non-API call to R: 'R_UnboundValue'

Compiled code should not call non-API entry points in R.

See 'Writing portable packages' in the 'Writing R Extensions' manual,
and section 'Moving into C API compliance' for issues with the use of
non-API entry points.
NOTE r-oldrel-macos-arm64

installed package size

installed size is  6.5Mb
  sub-directories of 1Mb or more:
    libs   5.7Mb
NOTE r-oldrel-macos-x86_64

installed package size

installed size is  6.6Mb
  sub-directories of 1Mb or more:
    libs   5.9Mb
NOTE r-patched-linux-x86_64

compiled code

File ‘tok/libs/tok.so’:
  Found non-API calls to R: ‘R_MissingArg’, ‘R_UnboundValue’

Compiled code should not call non-API entry points in R.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual,
and section ‘Moving into C API compliance’ for issues with the use of
non-API entry points.

Additional Issues

M1mac Details →

Check History

NOTE 12 OK · 2 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 10, 2026
NOTE r-oldrel-macos-arm64

installed package size

installed size is  6.5Mb
  sub-directories of 1Mb or more:
    libs   5.7Mb
NOTE r-oldrel-macos-x86_64

installed package size

installed size is  6.6Mb
  sub-directories of 1Mb or more:
    libs   5.9Mb

Dependency Network

Graph nodes: R6, cli, tok (legend: dependencies, reverse dependencies)

Version History

new 0.2.1 Mar 10, 2026
updated 0.2.1 ← 0.2.0 Sep 29, 2025
updated 0.2.0 ← 0.1.4 Aug 26, 2025
updated 0.1.4 ← 0.1.3 Sep 3, 2024
updated 0.1.3 ← 0.1.2 Jul 5, 2024
updated 0.1.2 ← 0.1.1 Jun 26, 2024
updated 0.1.1 ← 0.1.0 Aug 17, 2023
new 0.1.0 Jul 5, 2023