NUSS

Mixed N-Grams and Unigram Sequence Segmentation

v0.1.0 · Aug 19, 2024 · GPL (>= 3)

Description

Segments short text sequences, such as hashtags, into their constituent words using a dictionary, which may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
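
To make the unigram idea concrete, here is a minimal Python sketch of dictionary-based segmentation by dynamic programming. It is not NUSS's API or its actual implementation: the function name `segment`, the toy `counts` dictionary, and the scoring by summed log relative frequencies are all assumptions made for illustration, and the n-gram part of the package is not covered.

```python
import math

def segment(text, unigram_counts):
    """Return the most probable split of `text` into dictionary words.

    Hypothetical helper: each word is scored by its log relative frequency,
    and a segmentation is scored by the sum of its word scores.
    """
    total = sum(unigram_counts.values())
    max_len = max(len(word) for word in unigram_counts)
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Skip unknown words and prefixes that cannot be segmented
            if word not in unigram_counts or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(unigram_counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]  # empty list when no full segmentation exists

# Toy unigram counts (made up for this example)
counts = {"this": 50, "is": 80, "a": 100, "test": 20,
          "thisis": 1, "at": 30, "est": 1}
print(segment("thisisatest", counts))  # ['this', 'is', 'a', 'test']
```

With these counts, "this is a test" beats alternatives such as "thisis a test" or "this is at est" because its words are more frequent in the dictionary. A real dictionary would be built from a corpus, as the description notes.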

Downloads

Last 30 days: 185 (ranked 23086th)
Last 90 days: 492
Last year: 1.9K
Trend: +15.6% (30d vs prior 30d)

CRAN Check Status

2 NOTE
12 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 NOTE
r-oldrel-macos-x86_64 NOTE
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (2 non-OK)
NOTE r-oldrel-macos-arm64

installed package size

installed size is  5.5Mb
  sub-directories of 1Mb or more:
    libs   4.8Mb
NOTE r-oldrel-macos-x86_64

installed package size

installed size is  5.6Mb
  sub-directories of 1Mb or more:
    libs   4.9Mb

Check History

Mar 10, 2026 · NOTE · 12 OK · 2 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE

Dependency Network

Dependencies and reverse dependencies of NUSS: dplyr, magrittr, Rcpp, stringr, text2vec, textclean

Version History

new 0.1.0 Mar 10, 2026