corpustools

Managing, Querying and Analyzing Tokenized Text

v0.5.2 · Jul 7, 2025 · GPL-3

Description

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., with part-of-speech tags or dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, export to a document-term matrix (DTM) for compatibility with many text analysis packages, and the ability to reconstruct the original text from tokens to facilitate interpretation.
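
The idea of a position-preserving tokenized format can be illustrated with a minimal, language-agnostic sketch. The Python below is not the corpustools API; the function names (`tokenize`, `reconstruct`, `document_term_matrix`) and the row layout `(doc_id, token_id, token)` are assumptions made purely for illustration. It shows how keeping token positions makes it possible both to rebuild the text and to derive a DTM from the same rows.

```python
from collections import Counter, defaultdict

def tokenize(doc_id, text):
    """Return one (doc_id, token_id, token) row per whitespace-separated token."""
    return [(doc_id, i, tok) for i, tok in enumerate(text.split())]

def reconstruct(rows, doc_id):
    """Rebuild a document's text by joining its tokens in position order."""
    doc_rows = sorted((r for r in rows if r[0] == doc_id), key=lambda r: r[1])
    return " ".join(tok for _, _, tok in doc_rows)

def document_term_matrix(rows):
    """Count lowercased token occurrences per document (a sparse DTM)."""
    dtm = defaultdict(Counter)
    for doc_id, _, tok in rows:
        dtm[doc_id][tok.lower()] += 1
    return dtm

# Hypothetical two-document corpus, flattened into one table of token rows.
docs = {"d1": "The cat sat", "d2": "The dog sat down"}
rows = [row for doc_id, text in docs.items() for row in tokenize(doc_id, text)]

print(reconstruct(rows, "d1"))
print(dict(document_term_matrix(rows)["d2"]))
```

Because every row carries its document and position, annotations such as part-of-speech tags could be added as further fields per row without losing the ability to reconstruct the text. This sketch only restores single-space separation; it does not preserve the original whitespace.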

Downloads

Last 30 days: 464
Rank: 8962nd
Last 90 days: 1.5K
Last year: 13.1K

Trend: -6.6% (30d vs prior 30d)

CRAN Check Status

13 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK

Check History

Apr 22, 2026: OK (14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE)

Apr 18, 2026: ERROR (13 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE)
ERROR on r-devel-windows-x86_64, check "whether package can be installed":
Installation failed.
See 'd:/Rcompile/CRANpkg/local/4.6/corpustools.Rcheck/00install.out' for details.

Apr 10, 2026: OK (14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE)

Apr 9, 2026: ERROR (13 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE)
ERROR on r-devel-linux-x86_64-debian-gcc, check "package dependencies":
Package required but not available: ‘RNewsflow’
See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’ manual.

Mar 10, 2026: OK (14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE)

Reverse Dependencies (2)

imports

suggests

Dependency Network

Network graph (legend: dependencies, reverse dependencies). Nodes: wordcloud, stringi, Rcpp, R6, udpipe, digest, data.table, quanteda (>= 1.5.1), igraph, tokenbrowser, RNewsflow, Matrix, pbapply, rsyntax, text2sdg, LexisNexisTools, corpustools.

Version History

15 tracked
new 0.5.2 Mar 10, 2026
updated 0.5.2 ← 0.5.1 diff Jul 6, 2025
updated 0.5.1 ← 0.4.10 diff May 7, 2023
updated 0.4.10 ← 0.4.9 diff May 10, 2022
updated 0.4.9 ← 0.4.8 diff Jan 22, 2022
updated 0.4.8 ← 0.4.7 diff Jun 24, 2021
updated 0.4.7 ← 0.4.6 diff Feb 27, 2021
updated 0.4.6 ← 0.4.5 diff Feb 2, 2021
updated 0.4.5 ← 0.4.4 diff Jan 12, 2021
updated 0.4.4 ← 0.4.2 diff Jan 6, 2021
updated 0.4.2 ← 0.4.1 diff Jan 22, 2020
updated 0.4.1 ← 0.3.3 diff Nov 19, 2019
updated 0.3.3 ← 0.3.1 diff Apr 19, 2018
updated 0.3.1 ← 0.3 diff Dec 12, 2017
new 0.3 Oct 2, 2017