archiveRetriever

Retrieve Archived Web Pages from the 'Internet Archive'

v0.4.1 · Oct 16, 2025 · Apache License (>= 2.0)

Description

Scrape content from archived web pages stored in the 'Internet Archive' (<https://archive.org>) using a systematic workflow. Get an overview of the mementos available for a given homepage, retrieve the URLs and links of the page, and finally scrape the content. The final output is stored in tibbles, which can then easily be used for further analysis.
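
The workflow described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' recommended usage: the homepage, the date range, and the XPath in `Paths` are assumptions chosen for the example, and the wrapper function `run_archive_workflow()` is hypothetical. It assumes the package is installed and the Internet Archive is reachable.

```r
# Hypothetical wrapper around the overview -> URLs -> links -> scrape steps.
# Example homepage, dates, and XPath are illustrative assumptions.
run_archive_workflow <- function(homepage = "www.nytimes.com",
                                 start = "20051101",
                                 end = "20051130") {
  # 1. Overview of the mementos available for the homepage in the date range.
  overview <- archiveRetriever::archive_overview(
    homepage = homepage, startDate = start, endDate = end
  )

  # 2. Archive URLs of the mementos stored for that homepage.
  memento_urls <- archiveRetriever::retrieve_urls(
    homepage = homepage, startDate = start, endDate = end
  )
  stopifnot(length(memento_urls) > 0)

  # 3. Links found on the first memento.
  links <- archiveRetriever::retrieve_links(ArchiveUrls = memento_urls[1])

  # 4. Scrape content from the first memento; Paths is a named vector of
  #    XPath expressions, one per output column (here only the page title).
  content <- archiveRetriever::scrape_urls(
    Urls = memento_urls[1],
    Paths = c(title = "//title")
  )

  list(overview = overview, urls = memento_urls,
       links = links, content = content)
}
```

Calling `result <- run_archive_workflow()` performs live requests against the Internet Archive, so run time and results depend on the service and on which mementos exist for the chosen dates.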

Downloads

CRAN

Period	Downloads
Last 30 days	358
Last 90 days	900
Last year	5.3K

Rank: 11095th

Trend: +43.8% (30d vs prior 30d)

r2u CRAN

Period	Downloads
Last year	137

Trend: -28.6% (30d vs prior 30d)

CRAN Check Status

13 OK

Flavor	Status	Time
r-devel-linux-x86_64-debian-clang	OK	85.3s
r-devel-linux-x86_64-debian-gcc	OK	64.8s
r-devel-linux-x86_64-fedora-clang	OK	137.4s
r-devel-linux-x86_64-fedora-gcc	OK	127.7s
r-devel-windows-x86_64	OK	118s
r-oldrel-macos-arm64	OK
r-oldrel-macos-x86_64	OK	154s
r-oldrel-windows-x86_64	OK	149s
r-patched-linux-x86_64	OK	82.6s
r-release-linux-x86_64	OK
r-release-macos-arm64	OK	42s
r-release-macos-x86_64	OK	101s
r-release-windows-x86_64	OK	125s

Check History

14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE · Mar 10, 2026

Version History

11 tracked

Event	Version	Date
new	0.4.1	Mar 10, 2026
updated	0.4.1 ← 0.4.0	Oct 15, 2025
updated	0.4.0 ← 0.3.1	Jun 10, 2024
updated	0.3.1 ← 0.3.0	Dec 22, 2022
updated	0.3.0 ← 0.2.0	Dec 19, 2022
updated	0.2.0 ← 0.1.2	Jun 20, 2022
updated	0.1.2 ← 0.1.1	Jun 6, 2022
updated	0.1.1 ← 0.1.0	Mar 2, 2022
updated	0.1.0 ← 0.0.2	May 26, 2021
updated	0.0.2 ← 0.0.1	Mar 18, 2021
new	0.0.1	Mar 9, 2021

Authors

Lukas Isermann (author, maintainer)
Konstantin Gavras (author)

Available on

CRAN r2u Bioconductor autoCRAN COPR

Dependencies

Imports

anytime dplyr ggplot2 gridExtra httr jsonlite lubridate rvest stringr tibble tidyr utils xml2

Suggests

vcr (>= 2.0.0) testthat webmockr

Compilation

No compilation needed

First Published

Mar 9, 2021

First appeared in R 4.0.4 · current R 4.6.1
