blocking

Various Blocking Methods for Entity Resolution

v1.0.3 · Jun 30, 2026 · GPL-3

Description

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
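
The shingle-based representation described above can be illustrated with a small, language-agnostic sketch. It is written in Python and is not the package's R API: the helper names (`shingles`, `cosine_distance`, `nearest_record`), the toy records, and the choice of 2-character shingles with cosine distance are all assumptions made for illustration. An exact brute-force search stands in for the ANN backends the package actually uses.

```python
from collections import Counter
from math import sqrt


def shingles(text, k=2):
    """Return the multiset of overlapping k-character shingles of text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def cosine_distance(a, b):
    """Cosine distance between two sparse shingle-count vectors."""
    dot = sum(count * b[s] for s, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # strings shorter than k have no shingles
    return 1.0 - dot / (norm_a * norm_b)


def nearest_record(query, candidates):
    """Exact brute-force search; a stand-in for the ANN step, not an ANN itself."""
    q = shingles(query)
    dists = [cosine_distance(q, shingles(c)) for c in candidates]
    best = min(range(len(candidates)), key=dists.__getitem__)
    return best, dists[best]


# Hypothetical toy records, not taken from the package's data.
register = ["montypython", "jankowalski"]
queries = ["pythonmonty", "kowalskijan", "montypyton"]

# Each query is paired with its nearest register record; records that share
# a nearest neighbour would end up in the same block.
pairs = [(q, register[nearest_record(q, register)[0]]) for q in queries]
```

Word order and small typos barely change the set of shingles, which is why "pythonmonty" and "montypyton" both land next to "montypython" here. Brute force costs one distance computation per record pair, so for large files an approximate index takes its place.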

Downloads

CRAN

507 last 30 days (7403rd) · 1.6K last 90 days · 4.4K last year

Trend: -11.4% (30d vs prior 30d)

r2u CRAN

20 last 30 days · 68 last 90 days · 191 last year

Trend: -23.1% (30d vs prior 30d)

autoCRAN

23 last 7 days · 99 last 30 days · 1 all-time

autoCRAN-only: this name is served only by autoCRAN, so the count is exact.

CRAN Check Status

1 ERROR · 12 OK

Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 ERROR
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (1 non-OK)
ERROR r-oldrel-macos-arm64

tests

Running ‘tinytest.R’ [2s/3s]
Running the tests in ‘tests/tinytest.R’ failed.
Complete output:
  > 
  > if ( requireNamespace("tinytest", quietly=TRUE) ){
  +   tinytest::test_package("blocking")
  + }
  
  test_annoy.R..................    0 tests    
  test_annoy.R..................    0 tests    
  test_annoy.R..................    1 tests OK 
  test_annoy.R..................    2 tests OK 
  test_annoy.R..................    3 tests OK 
  test_annoy.R..................    4 tests OK Reallocating to 1 nodes: old_address=0x0, new_address=0x600002d20c80
  Reallocating to 2 nodes: old_address=0x600002d20c80, new_address=0x600003f18c00
  Reallocating to 3 nodes: old_address=0x600003f18c00, new_address=0x11fe22e70
  Reallocating to 5 nodes: old_address=0x11fe22e70, new_address=0x11fe83dd0
  Reallocating to 7 nodes: old_address=0x11fe83dd0, new_address=0x11fe0c000
  Reallocating to 10 nodes: old_address=0x11fe0c000, new_address=0x119679c00
  pass 0...
  pass 1...
  pass 2...
  Reallocating
...[truncated]...
s = 10, ef_c = 10))), structure(list(result = structure(list(x = c(1, 
   call| -->    1, 1, 1, 2, 2, 2, 2), y = c(5, 6, 7, 8, 1, 2, 3, 4), block = c(2, 
   call| -->    2, 2, 2, 1, 1, 1, 1), dist = c(1.19209289550781e-07, 0.0425729155540466, 
   call| -->    1.19209289550781e-07, 0.278312206268311, 0.0513166785240173, 
   call| -->    -1.19209289550781e-07, 0.0513166785240173, 0.225403368473053)), 
   call| -->    row.names = c(NA, -8), class = c("data.table", "data.frame")), 
   call| -->    method = "hnsw", deduplication = FALSE, representation = "custom_matrix", 
   call| -->    metrics = NULL, confusion = NULL, colnames = c("al", "an", 
   call| -->        "ho", "ij", "ja", "ki", "ko", "ls", "mo", "nt", "ow", 
   call| -->        "py", "sk", "ty", "wa", "yp", "yt", "on", "th"), graph = NULL), 
   call| -->    class = "blocking", n_x = 3, n_y = 8))
   diff| Component "result": Column 'dist': Mean relative difference: 1.837035e-07
  Error: 2 out of 95 tests failed
  Execution halted
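
The size of the reported difference suggests single-precision rounding and not a logic error: the near-zero `dist` values in the log are ±1.19209289550781e-07, which is 2^-23, the float32 machine epsilon. The sketch below is in Python and is not the package's test code. It computes the statistic that R's `all.equal` reports as "Mean relative difference" and compares it against `all.equal`'s default tolerance of 1.5e-8 and against a looser bound. The `current` vector and the 1e-6 tolerance are assumptions chosen for illustration; the log does not show which entry actually differed.

```python
FLOAT32_EPS = 2.0 ** -23  # single-precision machine epsilon, about 1.1920929e-07


def mean_relative_difference(target, current):
    """sum(|target - current|) / sum(|target|), as reported by R's all.equal."""
    abs_diff = sum(abs(t - c) for t, c in zip(target, current))
    scale = sum(abs(t) for t in target)
    return abs_diff / scale


def nearly_equal(target, current, tolerance):
    """True when the mean relative difference stays below the tolerance."""
    return mean_relative_difference(target, current) < tolerance


# dist column copied from the check log above.
target = [1.19209289550781e-07, 0.0425729155540466, 1.19209289550781e-07,
          0.278312206268311, 0.0513166785240173, -1.19209289550781e-07,
          0.0513166785240173, 0.225403368473053]

# Hypothetical result on another platform: one near-zero distance is exactly 0,
# i.e. it differs from the stored value by a single float32 epsilon.
current = list(target)
current[5] = 0.0

mrd = mean_relative_difference(target, current)            # approximately 1.837e-07
strict = nearly_equal(target, current, tolerance=1.5e-8)   # all.equal's default
relaxed = nearly_equal(target, current, tolerance=1e-6)    # assumed looser bound
```

Under these assumptions a single entry that is off by one float32 epsilon is enough to fail the strict comparison while passing the relaxed one, which is consistent with a failure that shows up on one platform only.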

Additional Issues

linux-arm64

Check History

ERROR · 12 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE · Jul 1, 2026
ERROR r-oldrel-macos-arm64

tests

OK · 13 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE · Jun 9, 2026
ERROR · 12 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE · Jun 8, 2026
ERROR r-devel-linux-x86_64-debian-gcc

whether package can be installed

Installation failed.
See ‘/home/hornik/tmp/R.check/r-devel-gcc/Work/PKGS/blocking.Rcheck/00install.out’ for details.
OK · 14 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE · Mar 10, 2026

Reverse Dependencies (1)

imports

Dependency Network

Dependencies: text2vec, tokenizers, RcppHNSW, RcppAnnoy, mlpack, rnndescent, igraph, data.table, readr, Matrix

Reverse dependencies: automatedRecLin

Version History

5 tracked
updated 1.0.3 ← 1.0.2 · Jun 30, 2026
updated 1.0.2 ← 1.0.1 · Mar 11, 2026
new 1.0.1 · Mar 10, 2026
updated 1.0.1 ← 1.0.0 · Jun 17, 2025
new 1.0.0 · Jun 12, 2025