udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.16 · Jan 30, 2026 · MPL-2.0

Description

This natural language processing toolkit provides language-agnostic 'tokenization', 'part-of-speech tagging', 'lemmatization' and 'dependency parsing' of raw text. In addition to text parsing, the package lets you train annotation models on 'treebank' data in 'CoNLL-U' format, as described at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functions for commonly used data manipulations on texts enriched with the output of the parser: collocations, token co-occurrence, document-term matrix handling, term frequency-inverse document frequency (TF-IDF) calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
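
To make the 'CoNLL-U' format mentioned above more concrete, here is a minimal sketch in Python of reading it. This is not the udpipe API and not how the package parses files; it only illustrates the ten tab-separated columns defined by the Universal Dependencies format. The function name `parse_conllu` and the hand-written sample sentence are assumptions made for illustration.

```python
# Illustrative sketch only: a tiny CoNLL-U reader, unrelated to udpipe's own code.

# The ten tab-separated columns of a CoNLL-U token line, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry sentence metadata.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical, hand-annotated sample (not output of any model).
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["form"], token["lemma"], token["upos"], token["deprel"])
```

Each token row carries its lemma, part-of-speech tag and the index of its syntactic head, which is the kind of annotation the downstream functions (co-occurrence, keyword detection and so on) operate on.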

Downloads

Last 30 days: 5.9K (1615th)
Last 90 days: 20K
Last year: 103.4K

Trend: -12.9% (30d vs prior 30d)

CRAN Check Status

3 NOTE
11 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 NOTE
r-oldrel-macos-x86_64 NOTE
r-oldrel-windows-x86_64 NOTE
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (3 non-OK)
NOTE r-oldrel-macos-arm64

installed package size

installed size is 25.5Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs       21.5Mb
NOTE r-oldrel-macos-x86_64

installed package size

installed size is 26.8Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs       22.9Mb
NOTE r-oldrel-windows-x86_64

installed package size

installed size is  6.5Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs        2.5Mb

Check History

NOTE 11 OK · 3 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 10, 2026

Reverse Dependencies (18)

Dependency Network

Dependencies: Rcpp, data.table, Matrix

Reverse dependencies: MadanText, MadanTextNetwork, cleanNLP, corpustools, finnsurveytext, sumup, tall, BTM, birddog, doc2vec, nametagger, pseudobibeR, text2vec, textplot, textrank (+3 more)

Version History

new 0.8.16 Mar 10, 2026
updated 0.8.16 ← 0.8.15 diff Jan 29, 2026
updated 0.8.15 ← 0.8.14 diff Nov 27, 2025
updated 0.8.14 ← 0.8.13 diff Nov 25, 2025
updated 0.8.13 ← 0.8.12 diff Nov 25, 2025
updated 0.8.12 ← 0.8.11 diff Sep 3, 2025
updated 0.8.11 ← 0.8.10 diff Jan 5, 2023
updated 0.8.10 ← 0.8.9 diff Nov 9, 2022
updated 0.8.9 ← 0.8.8 diff Mar 23, 2022
updated 0.8.8 ← 0.8.6 diff Dec 1, 2021
updated 0.8.6 ← 0.8.5 diff May 31, 2021
updated 0.8.5 ← 0.8.4-1 diff Dec 9, 2020
updated 0.8.4-1 ← 0.8.4 diff Oct 11, 2020
updated 0.8.4 ← 0.8.3 diff Oct 9, 2020
updated 0.8.3 ← 0.8.2 diff Jul 5, 2019
updated 0.8.2 ← 0.8.1 diff May 28, 2019
updated 0.8.1 ← 0.8 diff Feb 14, 2019
updated 0.8 ← 0.7 diff Dec 8, 2018
updated 0.7 ← 0.6.1 diff Sep 9, 2018
updated 0.6.1 ← 0.6 diff Jul 29, 2018