archiveRetriever

Retrieve Archived Web Pages from the 'Internet Archive'

v0.4.1 · Oct 16, 2025 · Apache License (>= 2.0)

Description

Scrape content from archived web pages stored in the 'Internet Archive' (<https://archive.org>) using a systematic workflow. Get an overview of the mementos available for a given homepage, retrieve the URLs and links of the page, and finally scrape the content. The final output is stored in tibbles, which can then easily be used for further analysis.
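
For readers unfamiliar with the term, a memento is a single archived capture of a page, addressed in the Wayback Machine by a 14-digit capture timestamp followed by the original URL. The sketch below illustrates only that URL layout. It is written in Python for illustration and is not part of archiveRetriever; the names `Memento`, `build_memento_url`, and `parse_memento_url` are hypothetical helpers, and the example address is a placeholder.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Public Wayback Machine prefix; a capture URL has the form
# https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web/"

# Optional two-letter modifier (e.g. "id_") may follow the timestamp.
_MEMENTO_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


class Memento(NamedTuple):
    """Hypothetical container for one archived capture."""
    captured_at: datetime
    original_url: str


def build_memento_url(original_url: str, captured_at: datetime) -> str:
    """Compose a capture URL from an original URL and a capture time."""
    return f"{WAYBACK_PREFIX}{captured_at:%Y%m%d%H%M%S}/{original_url}"


def parse_memento_url(memento_url: str) -> Memento:
    """Split a capture URL into its capture time and original URL."""
    match = _MEMENTO_RE.match(memento_url)
    if match is None:
        raise ValueError(f"not a Wayback Machine capture URL: {memento_url!r}")
    timestamp, original_url = match.groups()
    return Memento(datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url)


if __name__ == "__main__":
    url = build_memento_url("https://example.com/", datetime(2020, 10, 1, 12, 0, 0))
    print(url)
    print(parse_memento_url(url))
```

Both helpers work offline: they only manipulate strings, so they say nothing about whether a capture actually exists for a given timestamp.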

Downloads

Last 30 days: 360 (10624th)
Last 90 days: 1K
Last year: 6.3K

Trend: +1.7% (30d vs prior 30d)

CRAN Check Status

14 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK

Check History

Mar 10, 2026: 14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE

Dependency Network

Dependencies of archiveRetriever: anytime, dplyr, ggplot2, gridExtra, httr, jsonlite, lubridate, rvest, stringr, tibble, tidyr, xml2

Version History

new 0.4.1 Mar 10, 2026
updated 0.4.1 ← 0.4.0 Oct 15, 2025
updated 0.4.0 ← 0.3.1 Jun 10, 2024
updated 0.3.1 ← 0.3.0 Dec 22, 2022
updated 0.3.0 ← 0.2.0 Dec 19, 2022
updated 0.2.0 ← 0.1.2 Jun 20, 2022
updated 0.1.2 ← 0.1.1 Jun 6, 2022
updated 0.1.1 ← 0.1.0 Mar 2, 2022
updated 0.1.0 ← 0.0.2 May 26, 2021
updated 0.0.2 ← 0.0.1 Mar 18, 2021
new 0.0.1 Mar 9, 2021