piecemaker

Tools for Preparing Text for Tokenizers

v1.0.2 · Jun 2, 2023 · Apache License (>= 2)

Description

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
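
To make the idea of shared preparation steps concrete, here is a minimal sketch in Python. It is not the package's R API: the function names `prepare_text` and `tokenize_on_space` and the specific steps chosen (dropping control characters, padding punctuation with spaces, collapsing whitespace) are assumptions made for illustration only. The package's own steps may differ.

```python
import re
import unicodedata


def prepare_text(text: str) -> str:
    """Hypothetical preparation step; not the piecemaker API."""
    # Drop control characters (Unicode category "Cc") but keep whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Pad each punctuation character with spaces so it becomes its own piece.
    text = re.sub(r"([^\w\s])", r" \1 ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def tokenize_on_space(text: str) -> list[str]:
    """Hypothetical simple tokenizer: prepare the text, then split on spaces."""
    prepared = prepare_text(text)
    return prepared.split(" ") if prepared else []


tokens = tokenize_on_space("Hello,  world!\x00")
```

The point of the split is that a more elaborate tokenizer could reuse `prepare_text` unchanged and replace only the final splitting step.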

Downloads

Last 30 days: 303
Rank: 13285th
Last 90 days: 1.2K
Last year: 14.8K

Trend: +4.1% (30d vs prior 30d)

CRAN Check Status

14 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK

Check History

Mar 10, 2026: OK (14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE)

Reverse Dependencies (2)

Dependency Network

Dependencies: cli, glue, rlang, stringi, stringr
Reverse dependencies: morphemepiece, wordpiece

Version History

new 1.0.2 Mar 10, 2026
updated 1.0.2 ← 1.0.1 Jun 1, 2023
updated 1.0.1 ← 1.0.0 Mar 2, 2022
new 1.0.0 Aug 5, 2021