
orderanalyzer

Extracting Order Position Tables from PDF-Based Order Documents

v1.0.1 · Jan 15, 2026 · GPL-3

Description

Functions for extracting text and tables from PDF-based order documents. The package provides an n-gram-based approach for identifying the language of an order document. It uses the R package 'pdftools' to extract the text from an order document; if the PDF document contains only an image (for example, because it is a scanned document), the R package 'tesseract' is used for OCR. Furthermore, the package provides functionality for identifying and extracting order position tables in order documents based on a clustering approach.
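The description does not say how the n-gram language model is built or scored. Purely as an illustration of the general technique, the following is a minimal Python sketch of character-trigram language identification. The package itself is written in R, and this is not its API. Several details are assumptions: the tiny reference samples in `SAMPLES`, the trigram length, and cosine similarity as the scoring function. `char_ngrams`, `cosine`, and `detect_language` are hypothetical helper names.

```python
from collections import Counter
import math

# Assumption: tiny hypothetical reference samples; a real model would be
# trained on much larger corpora of order documents per language.
SAMPLES = {
    "de": "Bestellung Menge Artikelnummer Bezeichnung Einzelpreis "
          "Gesamtpreis Lieferdatum die der und",
    "en": "Purchase order quantity item number description unit price "
          "total price delivery date the and of",
}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_language(text: str, samples: dict = SAMPLES, n: int = 3) -> str:
    """Return the language whose reference profile is most similar to the text."""
    profiles = {lang: char_ngrams(sample, n) for lang, sample in samples.items()}
    doc = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(doc, profiles[lang]))


if __name__ == "__main__":
    print(detect_language("Bestellung: Menge und Einzelpreis der Artikel"))
    print(detect_language("Order: quantity and unit price of the items"))
```

Character n-grams are a common choice for short, noisy text such as OCR output, because they tolerate misspellings and unknown words better than whole-word matching.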

Downloads

Last 30 days: 530 (rank: 7565th)

Last 90 days: 1.4K

Last year: 2.8K

Trend: -2.9% (30d vs prior 30d)

CRAN Check Status

14 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK

Check History

Mar 10, 2026: 14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE

Dependency Network

Dependency graph for orderanalyzer (legend: dependencies, reverse dependencies). Packages shown: tidyselect, data.table, dplyr, matrixcalc, quanteda, rlist, stringr, tibble, tidyr, purrr, digest, lubridate.

Version History

new 1.0.1 Mar 10, 2026
updated 1.0.1 ← 1.0.0 Jan 14, 2026
new 1.0.0 Dec 11, 2024