R Observatory

A CRAN-focused observatory that tracks the incoming submission queue, package changes, download trends, and package metadata.

About the Project

R Observatory provides a searchable, browsable interface to CRAN's ecosystem. It monitors the package submission queue, tracks new releases and updates, aggregates download statistics, and surfaces package metadata, giving the R community a window into the daily rhythm of CRAN.

Data Streams

Four independent pipelines collect data from different parts of the CRAN ecosystem. Each runs on its own schedule and publishes to a dedicated GitHub repository.

Incoming Queue

Hourly

Hourly snapshots of CRAN's FTP incoming directory, with historical data from the cransays project.

r-observatory/cran-queue

Package Feed

Every 6 hours

New, updated, and archived packages, detected by comparing the current output of R's available.packages() function against stored snapshots.

r-observatory/cran-feed
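
The snapshot comparison described above can be sketched as a set difference over package names plus a version check. This is a minimal illustration, not the pipeline's actual code: it assumes each snapshot has been reduced to a mapping of package name to version, it treats any package missing from the newer snapshot as archived, and the package names and versions are made up.

```python
def diff_snapshots(previous, current):
    """Compare two {package: version} snapshots and classify the changes."""
    previous_names = set(previous)
    current_names = set(current)
    # Present now but not before: newly published packages.
    new = sorted(current_names - previous_names)
    # Present before but gone now: assumed here to mean archived.
    archived = sorted(previous_names - current_names)
    # Present in both snapshots with a different version string.
    updated = sorted(
        name for name in previous_names & current_names
        if previous[name] != current[name]
    )
    return {"new": new, "updated": updated, "archived": archived}


# Hypothetical snapshots for illustration only.
previous = {"pkgA": "1.1.3", "oldpkg": "0.9.0", "pkgB": "1.8-12"}
current = {"pkgA": "1.1.4", "pkgB": "1.8-12", "freshpkg": "0.1.0"}

changes = diff_snapshots(previous, current)
print(changes)
```

Comparing version strings for inequality rather than ordering them keeps the sketch simple; it flags any change, including the rare case of a version going backwards.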

Downloads

Daily

Daily download counts from the RStudio CRAN mirror, aggregated into trends and rankings.

r-observatory/cran-downloads

Metadata

Daily

Package descriptions, authors, licenses, and reverse dependency counts from CRAN's bulk packages.rds file.

r-observatory/cran-metadata
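
As a rough illustration of what a reverse dependency count is, the sketch below inverts a mapping of each package to the packages it depends on. It is written under assumptions: the source does not say which dependency fields (Depends, Imports, LinkingTo, Suggests) are counted, so the input here is simply a hypothetical dictionary of direct dependencies, and the package names are invented.

```python
from collections import Counter


def reverse_dependency_counts(depends_on):
    """Count, for each package, how many other packages list it directly."""
    # Start every known package at zero so leaf packages still appear.
    counts = Counter({name: 0 for name in depends_on})
    for package, dependencies in depends_on.items():
        # set() guards against a dependency being listed twice.
        for dependency in set(dependencies):
            counts[dependency] += 1
    return dict(counts)


# Hypothetical dependency data for illustration only.
depends_on = {
    "pkgA": ["pkgC"],
    "pkgB": ["pkgA", "pkgC"],
    "pkgC": [],
}

rev_counts = reverse_dependency_counts(depends_on)
print(rev_counts)
```

Only direct dependencies are counted; a recursive count would need a graph traversal on top of this.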

Accessing the Data

The outputs of all four pipelines are merged daily into a single SQLite database (observatory.db), which is published as a GitHub release.

Combined Database

Daily

All four data streams merged into a single SQLite file. Download the latest release to query queue history, package changes, download trends, and metadata offline.

r-observatory/data releases
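
Because the table layout of observatory.db is not documented here, a reasonable first step after downloading it is to list its tables and row counts rather than guess at names. The sketch below does that with Python's standard sqlite3 module. To stay runnable without the real file, it builds a small in-memory stand-in whose table name (demo_packages) is purely hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_counts(conn):
    """Return {table_name: number_of_rows} for every table."""
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in list_tables(conn)
    }


# Stand-in database; "demo_packages" is a hypothetical table, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_packages (name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO demo_packages VALUES (?, ?)",
    [("pkgA", "1.0.0"), ("pkgB", "0.2.1")],
)

table_counts = row_counts(conn)
print(table_counts)
```

To inspect a downloaded release instead, replace the stand-in with sqlite3.connect("observatory.db"), assuming the file sits in the working directory.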

How It Works

Collect

Automated pipelines scrape CRAN's FTP server, query package listings, and pull download logs on a schedule.

Process

Raw data is normalized, deduplicated, and enriched with computed fields like submission status changes and version diffs.
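
One way to picture the deduplication and status-change step is shown below. It is a sketch under stated assumptions, not the site's implementation: queue records are modeled as hypothetical (snapshot_time, package, folder) tuples with ISO-style timestamps, and a status change is taken to mean that a package's folder differs from the one in its previous snapshot. The folder names are illustrative.

```python
def status_changes(records):
    """Deduplicate queue records and report folder changes per package.

    records: iterable of (snapshot_time, package, folder) tuples, where
    snapshot_time is an ISO-style string that sorts chronologically.
    """
    # set() drops exact duplicates; sorted() orders by time, then package.
    unique = sorted(set(records))
    last_folder = {}
    changes = []
    for snapshot_time, package, folder in unique:
        previous = last_folder.get(package)
        # Record a change only when the package was seen before elsewhere.
        if previous is not None and previous != folder:
            changes.append((snapshot_time, package, previous, folder))
        last_folder[package] = folder
    return changes


# Hypothetical queue records for illustration only.
records = [
    ("2024-01-01T10:00", "pkgA", "pretest"),
    ("2024-01-01T10:00", "pkgA", "pretest"),  # exact duplicate, dropped
    ("2024-01-01T11:00", "pkgA", "inspect"),
    ("2024-01-01T11:00", "pkgB", "pretest"),
    ("2024-01-01T12:00", "pkgB", "pretest"),  # same folder, no change
]

detected = status_changes(records)
print(detected)
```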

Publish

Processed data is stored in SQLite databases and served through this site with search, filtering, and visualizations.

Hosting and Advertising

This site is hosted on personal infrastructure. Because the database is updated throughout the day and serves dynamic queries, a static site is not feasible. Ads from Google AdSense help offset the hosting costs. Google Analytics is used to understand how people use the site. All underlying data is freely available through the GitHub repositories listed above.

Open Source

The data pipelines and this site are open source, available under the r-observatory GitHub organization. Contributions, bug reports, and feature requests are welcome.

Links

GitHub Organization

Source code for all data pipelines and this site.

Data Releases

Download the latest observatory.db SQLite database.

CRAN

The Comprehensive R Archive Network, source of all package data.

R Project

The R Project for Statistical Computing.

Related Projects

R Mailing List Archives

R-Universe

CRANberries

cransays Dashboard
