udpipe
Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
Description
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides text parsing, the package also allows you to train annotation models on 'treebank' data in 'CoNLL-U' format, as described at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functions for commonly used data manipulations on texts that have been enriched with the output of the parser: collocations, token co-occurrence, document-term matrix handling, term frequency-inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
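The 'CoNLL-U' format mentioned above stores one token per line in ten tab-separated columns, with `#` comment lines and a blank line between sentences. The following is a minimal sketch of reading that layout in plain Python; it does not use or reproduce the udpipe package, the function name `parse_conllu` is hypothetical, and the sample sentence is hand-written for illustration rather than actual parser output.

```python
# Column names as defined by the Universal Dependencies CoNLL-U format.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts.

    Hypothetical helper for illustration. Comment lines are skipped;
    multiword-token and empty-node lines are kept as ordinary rows.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample annotation (an assumption, not udpipe output).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["id"], token["form"], token["lemma"], token["upos"], token["deprel"])
```

Each token keeps all values as strings, so `head` would need converting with `int()` before it is used to walk the dependency tree.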
Downloads
| Period | Count |
|---|---|
| Last 30 days | 5.4K (1615th) |
| Last 90 days | 19K |
| Last year | 105.5K |

Trend: -24.4% (30d vs prior 30d)

| Period | Count |
|---|---|
| Last 30 days | 63 |
| Last 90 days | 344 |
| Last year | 1.3K |

Trend: -57.1% (30d vs prior 30d)
CRAN Check Status
| Flavor | Status |
|---|---|
| r-devel-linux-x86_64-debian-clang | OK |
| r-devel-linux-x86_64-debian-gcc | OK |
| r-devel-linux-x86_64-fedora-clang | OK |
| r-devel-linux-x86_64-fedora-gcc | OK |
| r-devel-windows-x86_64 | OK |
| r-oldrel-macos-arm64 | OK |
| r-oldrel-macos-x86_64 | OK |
| r-oldrel-windows-x86_64 | OK |
| r-patched-linux-x86_64 | OK |
| r-release-linux-x86_64 | OK |
| r-release-macos-arm64 | OK |
| r-release-macos-x86_64 | OK |
| r-release-windows-x86_64 | OK |
Check History
OK 13 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Jun 8, 2026
ERROR 12 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE Jun 7, 2026
PDF version of manual
Rd conversion errors:
Converting parsed Rd's to LaTeX ......
Warning in file(out, "wt") : cannot open file '/tmp/RtmpAvN8LI/ltx24990631382743/udpipe_accuracy.tex': No space left on device
Error in file(out, "wt") : cannot open the connection
OK 12 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Apr 25, 2026
NOTE 11 OK · 3 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Apr 22, 2026
installed package size, sub-directories of 1Mb or more:
| installed size | dummydata | libs |
|---|---|---|
| 25.5Mb | 1.4Mb | 21.5Mb |
| 26.8Mb | 1.4Mb | 22.9Mb |
| 6.5Mb | 1.4Mb | 2.5Mb |
ERROR 10 OK · 3 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE Apr 18, 2026
whether package can be installed
Installation failed. See 'd:/Rcompile/CRANpkg/local/4.6/udpipe.Rcheck/00install.out' for details.
installed package size, sub-directories of 1Mb or more:
| installed size | dummydata | libs |
|---|---|---|
| 25.5Mb | 1.4Mb | 21.5Mb |
| 26.8Mb | 1.4Mb | 22.9Mb |
| 6.5Mb | 1.4Mb | 2.5Mb |
NOTE 11 OK · 3 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 10, 2026
installed package size, sub-directories of 1Mb or more:
| installed size | dummydata | libs |
|---|---|---|
| 25.5Mb | 1.4Mb | 21.5Mb |
| 26.8Mb | 1.4Mb | 22.9Mb |
| 6.5Mb | 1.4Mb | 2.5Mb |