R Observatory

A CRAN-focused observatory that tracks the incoming submission queue, package changes, download trends, and package metadata.

About the Project

R Observatory provides a searchable, browsable interface to CRAN's ecosystem. It monitors the package submission queue, tracks new releases and updates, aggregates download statistics, and surfaces package metadata, giving the R community a window into the daily rhythm of CRAN.

Data Streams

Four independent pipelines collect data from different parts of the CRAN ecosystem. Each runs on its own schedule and publishes to a dedicated GitHub repository.

Incoming Queue

Hourly

Hourly snapshots of CRAN's FTP incoming directory, with historical data from the cransays project.

r-observatory/cran-queue

Package Feed

Every 6 hours

New, updated, and archived packages, detected by comparing the current output of R's available.packages() function against stored snapshots.

r-observatory/cran-feed
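
The snapshot comparison can be pictured with a minimal Python sketch. It is not the project's actual code: the function name diff_snapshots, the {package: version} dictionary shape, and the sample package names are all assumptions made for illustration. It also assumes that a package missing from the current listing counts as archived.

```python
def diff_snapshots(previous, current):
    """Classify changes between two {package: version} snapshots.

    Hypothetical helper: both arguments map package names to version
    strings, as one might extract from a package listing.
    """
    prev_names = set(previous)
    curr_names = set(current)
    return {
        # Present now, absent before.
        "new": sorted(curr_names - prev_names),
        # Present in both snapshots, but the version string changed.
        "updated": sorted(
            name for name in prev_names & curr_names
            if previous[name] != current[name]
        ),
        # Present before, absent now (assumed to mean archived).
        "archived": sorted(prev_names - curr_names),
    }


# Illustrative data only; these are not real snapshot contents.
previous = {"dplyr": "1.1.3", "oldpkg": "0.1.0", "zoo": "1.8-12"}
current = {"dplyr": "1.1.4", "zoo": "1.8-12", "newpkg": "0.0.1"}
changes = diff_snapshots(previous, current)
print(changes)
```

Comparing version strings for inequality, rather than ordering them, sidesteps version parsing entirely; any change to the string is treated as an update.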

Downloads

Daily

Daily download counts from the RStudio CRAN mirror, aggregated into trends and rankings.

r-observatory/cran-downloads
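
As a rough illustration of how daily counts could be rolled up into a ranking, here is a small Python sketch. The row layout (date, package, count), the function name rank_downloads, and the numbers are assumptions for the example, not the pipeline's real schema or data.

```python
from collections import Counter


def rank_downloads(rows, top_n=3):
    """Sum daily counts per package and return the top_n as (package, total).

    rows: iterable of (date, package, count) tuples (hypothetical layout).
    """
    totals = Counter()
    for _date, package, count in rows:
        totals[package] += count
    return totals.most_common(top_n)


# Made-up counts for illustration.
rows = [
    ("2024-01-01", "ggplot2", 120),
    ("2024-01-01", "dplyr", 100),
    ("2024-01-02", "ggplot2", 130),
    ("2024-01-02", "dplyr", 90),
    ("2024-01-02", "zoo", 15),
]
ranking = rank_downloads(rows, top_n=2)
print(ranking)
```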

Metadata

Daily

Package descriptions, authors, licenses, and reverse dependency counts from CRAN's bulk packages.rds file.

r-observatory/cran-metadata

Accessing the Data

The outputs of the four pipelines are merged daily into a single SQLite database (observatory.db) and published as a GitHub release.

Combined Database

Daily

All four data streams merged into a single SQLite file. Download the latest release to query queue history, package changes, download trends, and metadata offline.

r-observatory/data releases
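
Once the file is downloaded, a reasonable first step is to see which tables it contains. The Python sketch below makes no assumption about the schema: it only reads SQLite's built-in sqlite_master catalog. The helper name list_tables is hypothetical, and the local file path is whatever you saved the release asset as.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # closing() ensures the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("observatory.db") returns the table names to explore next, assuming the downloaded file sits in the working directory.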

How It Works

1. Collect

Automated pipelines scrape CRAN's FTP server, query package listings, and pull download logs on a scheduled basis.

2. Process

Raw data is normalized, deduplicated, and enriched with computed fields like submission status changes and version diffs.

3. Publish

Processed data is stored in SQLite databases and served through this site with search, filtering, and visualizations.
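
To make the Process step more concrete, the following Python sketch deduplicates queue snapshot rows and derives status changes from them. It is an illustration under stated assumptions, not the site's implementation: the (timestamp, package, version, status) row layout, the function name status_changes, and the sample status labels are all hypothetical.

```python
def status_changes(snapshots):
    """Return status transitions found in queue snapshot rows.

    snapshots: iterable of (timestamp, package, version, status) tuples,
    with ISO 8601 timestamps so that string order is chronological.
    Returns a list of (package, version, old_status, new_status, timestamp);
    old_status is None the first time a submission is seen.
    """
    seen = set()        # rows already processed (deduplication)
    last_status = {}    # (package, version) -> most recent status
    changes = []
    for row in sorted(snapshots):
        if row in seen:
            continue    # skip exact duplicate rows
        seen.add(row)
        timestamp, package, version, status = row
        key = (package, version)
        previous = last_status.get(key)
        if previous != status:
            changes.append((package, version, previous, status, timestamp))
        last_status[key] = status
    return changes


# Illustrative rows; the status labels are examples only.
snapshots = [
    ("2024-01-01T10:00", "pkgA", "1.0.0", "pretest"),
    ("2024-01-01T10:00", "pkgA", "1.0.0", "pretest"),  # exact duplicate
    ("2024-01-01T11:00", "pkgA", "1.0.0", "pretest"),  # unchanged status
    ("2024-01-01T12:00", "pkgA", "1.0.0", "inspect"),
]
changes = status_changes(snapshots)
print(changes)
```

Keying on (package, version) keeps a resubmission under a new version number separate from the earlier submission's history.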

Hosting and Advertising

This site is hosted on personal infrastructure. Because the database is updated throughout the day and serves dynamic queries, a static site is not feasible. Ads from Google AdSense help offset the hosting costs. Google Analytics is used to understand how people use the site. All underlying data is freely available through the GitHub repositories listed above.

Open Source

The data pipelines and this site are open source, available under the r-observatory GitHub organization. Contributions, bug reports, and feature requests are welcome.

Related Projects

R Mailing List Archives

A searchable, browsable interface to the R Project mailing lists, preserving decades of discussions from language design to statistical methodology.

R-Universe

A platform by rOpenSci for browsing and searching R packages across CRAN, Bioconductor, and GitHub, with automated builds, documentation, and API access.

CRANberries

Dirk Eddelbuettel's long-running service that tracks new, updated, and removed CRAN packages, providing RSS feeds and summary pages for each change.

cransays Dashboard

An R-hub project that monitors CRAN's incoming directory and displays the review status of submitted packages, updated hourly. R Observatory's queue data builds on cransays' historical snapshots.
