orderanalyzer

Extracting Order Position Tables from PDF-Based Order Documents

v1.0.1 · Jan 15, 2026 · GPL-3

Description

Functions for extracting text and tables from PDF-based order documents. The package provides an n-gram-based approach for identifying the language of an order document and uses the R package 'pdftools' to extract the text. If a PDF document contains only an image (for example, because it is a scanned document), the R package 'tesseract' is used for OCR. The package also provides functionality for identifying and extracting order position tables from order documents using a clustering approach.
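The description does not say which n-gram method the package uses. As a rough illustration only, the following is a minimal Python sketch of one common variant: character-trigram frequency profiles compared by cosine similarity. The function names (`char_ngrams`, `cosine`, `detect_language`) and the two reference snippets are hypothetical and are not part of the orderanalyzer API. The package itself is written in R and may use a different profile construction or scoring scheme.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_language(text: str, profiles: dict) -> str:
    """Return the language whose reference profile is most similar to the text."""
    target = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


# Hypothetical reference snippets; a real system would build each profile
# from a much larger corpus of order documents in that language.
SAMPLES = {
    "de": ("Bestellung Position Menge Artikelnummer Bezeichnung Einzelpreis "
           "Gesamtpreis Lieferdatum bitte liefern Sie die folgenden Artikel"),
    "en": ("Purchase order position quantity article number description unit "
           "price total price delivery date please deliver the following items"),
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(detect_language("Bitte liefern Sie die Artikel bis zum Lieferdatum", PROFILES))
```

Cosine similarity normalizes for text length, so a short order line can be compared against longer reference profiles. Real language identifiers usually also weight or rank the most frequent n-grams instead of using raw counts.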

Downloads

CRAN

Last 30 days: 471
Rank: 8212th
Last 90 days: 1.5K
Last year: 3.8K

Trend: -2.1% (30d vs prior 30d)

r2u CRAN

Last 30 days

Last 90 days

118

Last year

Trend: -47.1% (30d vs prior 30d)

autoCRAN


autoCRAN-only: this name is served only by autoCRAN, so the count is exact.

CRAN Check Status

13 OK


Flavor	Status	Time
r-devel-linux-x86_64-debian-clang	OK	116.5s
r-devel-linux-x86_64-debian-gcc	OK	84.9s
r-devel-linux-x86_64-fedora-clang	OK	183s
r-devel-linux-x86_64-fedora-gcc	OK	173.7s
r-devel-windows-x86_64	OK	108s
r-oldrel-macos-arm64	OK	27s
r-oldrel-macos-x86_64	OK	107s
r-oldrel-windows-x86_64	OK	147s
r-patched-linux-x86_64	OK	107s
r-release-linux-x86_64	OK
r-release-macos-arm64	OK	28s
r-release-macos-x86_64	OK	117s
r-release-windows-x86_64	OK	114s