blocking
Various Blocking Methods for Entity Resolution
Description
The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
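To make the shingling step in the description concrete, here is a minimal base R sketch of turning strings into overlapping 2-character shingles and a binary record-by-shingle matrix. It does not call the 'blocking' package. The helper names `char_shingles` and `jaccard` and the toy records are hypothetical, chosen only for illustration, and the package's own implementation may differ.

```r
# Hypothetical helper, not part of the 'blocking' API: split one string into
# its distinct overlapping character n-grams (shingles) using base R only.
char_shingles <- function(x, n = 2L) {
  stopifnot(is.character(x), length(x) == 1L, n >= 1L)
  len <- nchar(x)
  if (len < n) return(character(0))
  starts <- seq_len(len - n + 1L)
  unique(substring(x, starts, starts + n - 1L))
}

# Hypothetical helper: Jaccard similarity between two shingle sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Toy records (made up for this sketch).
records <- c("kowalski", "kowalsky", "montypython")
shingle_sets <- lapply(records, char_shingles)

# Binary matrix: one row per record, one column per shingle in the vocabulary.
vocab <- sort(unique(unlist(shingle_sets)))
shingle_matrix <- t(vapply(
  shingle_sets,
  function(s) as.integer(vocab %in% s),
  integer(length(vocab))
))
dimnames(shingle_matrix) <- list(records, vocab)

# Similar strings share most of their shingles.
sim_close <- jaccard(shingle_sets[[1]], shingle_sets[[2]])
sim_far <- jaccard(shingle_sets[[1]], shingle_sets[[3]])
```

A matrix of this shape is the kind of input an approximate nearest neighbour search can index, so that records with many shared shingles land in the same block.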
Downloads
Last 30 days: 507 (7403rd)
Last 90 days: 1.6K
Last year: 4.4K
Trend: -11.4% (30d vs prior 30d)

Last 30 days: 20
Last 90 days: 68
Last year: 191
Trend: -23.1% (30d vs prior 30d)

Last 7 days: 23
Last 30 days: 99
All-time: 1
autoCRAN-only: this name is served only by autoCRAN, so the count is exact.
CRAN Check Status
| Flavor | Status |
|---|---|
| r-devel-linux-x86_64-debian-clang | OK |
| r-devel-linux-x86_64-debian-gcc | OK |
| r-devel-linux-x86_64-fedora-clang | OK |
| r-devel-linux-x86_64-fedora-gcc | OK |
| r-devel-windows-x86_64 | OK |
| r-oldrel-macos-arm64 | ERROR |
| r-oldrel-macos-x86_64 | OK |
| r-oldrel-windows-x86_64 | OK |
| r-patched-linux-x86_64 | OK |
| r-release-linux-x86_64 | OK |
| r-release-macos-arm64 | OK |
| r-release-macos-x86_64 | OK |
| r-release-windows-x86_64 | OK |
Check details (1 non-OK)
tests
Running ‘tinytest.R’ [2s/3s]
Running the tests in ‘tests/tinytest.R’ failed.
Complete output:
>
> if ( requireNamespace("tinytest", quietly=TRUE) ){
+ tinytest::test_package("blocking")
+ }
test_annoy.R.................. 0 tests
test_annoy.R.................. 0 tests
test_annoy.R.................. 1 tests OK
test_annoy.R.................. 2 tests OK
test_annoy.R.................. 3 tests OK
test_annoy.R.................. 4 tests OK Reallocating to 1 nodes: old_address=0x0, new_address=0x600002d20c80
Reallocating to 2 nodes: old_address=0x600002d20c80, new_address=0x600003f18c00
Reallocating to 3 nodes: old_address=0x600003f18c00, new_address=0x11fe22e70
Reallocating to 5 nodes: old_address=0x11fe22e70, new_address=0x11fe83dd0
Reallocating to 7 nodes: old_address=0x11fe83dd0, new_address=0x11fe0c000
Reallocating to 10 nodes: old_address=0x11fe0c000, new_address=0x119679c00
pass 0...
pass 1...
pass 2...
Reallocating
...[truncated]...
s = 10, ef_c = 10))), structure(list(result = structure(list(x = c(1,
call| --> 1, 1, 1, 2, 2, 2, 2), y = c(5, 6, 7, 8, 1, 2, 3, 4), block = c(2,
call| --> 2, 2, 2, 1, 1, 1, 1), dist = c(1.19209289550781e-07, 0.0425729155540466,
call| --> 1.19209289550781e-07, 0.278312206268311, 0.0513166785240173,
call| --> -1.19209289550781e-07, 0.0513166785240173, 0.225403368473053)),
call| --> row.names = c(NA, -8), class = c("data.table", "data.frame")),
call| --> method = "hnsw", deduplication = FALSE, representation = "custom_matrix",
call| --> metrics = NULL, confusion = NULL, colnames = c("al", "an",
call| --> "ho", "ij", "ja", "ki", "ko", "ls", "mo", "nt", "ow",
call| --> "py", "sk", "ty", "wa", "yp", "yt", "on", "th"), graph = NULL),
call| --> class = "blocking", n_x = 3, n_y = 8))
diff| Component "result": Column 'dist': Mean relative difference: 1.837035e-07
Error: 2 out of 95 tests failed
Execution halted
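One plausible reading of this failure is a floating-point tolerance issue rather than a logic error. The reported mean relative difference (1.837035e-07) is larger than the default tolerance of `all.equal()`, which is `sqrt(.Machine$double.eps)`, about 1.5e-8. Several logged distances are also on the order of 2^-23, the single-precision machine epsilon. The sketch below uses three distances copied from the log. The `reference` vector is hypothetical, since the expected values are truncated in the output, and is built only to show how a perturbation of that size behaves under the default and a relaxed tolerance.

```r
# Distances copied from the check log above.
logged <- c(0.0425729155540466, 0.278312206268311, 0.0513166785240173)

# Hypothetical reference values: the logged ones shifted by one
# single-precision epsilon (2^-23), standing in for platform-dependent rounding.
float_eps <- 2^-23
reference <- logged + c(1, -1, 1) * float_eps

# Default tolerance used by all.equal() for numeric comparisons.
default_tol <- sqrt(.Machine$double.eps)

# Default tolerance: returns a character message, not TRUE.
strict <- all.equal(reference, logged)

# Relaxed tolerance (an illustrative choice, not the package's setting): TRUE.
relaxed <- all.equal(reference, logged, tolerance = 1e-5)
```

Whether loosening the tolerance is the right fix depends on the package's tests, which are not shown here.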
Additional Issues
Check History
Jul 1, 2026: ERROR (12 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE)
tests
Running the tests in ‘tests/tinytest.R’ failed with the same output as under Check details (2 out of 95 tests failed).
Jun 9, 2026: OK (13 OK · 0 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE)
Jun 8, 2026: ERROR (12 OK · 0 NOTE · 0 WARNING · 1 ERROR · 0 FAILURE)
whether package can be installed
Installation failed. See ‘/home/hornik/tmp/R.check/r-devel-gcc/Work/PKGS/blocking.Rcheck/00install.out’ for details.