In this example we use the same datasets as in the "Blocking records for record linkage" vignette.
reclin2 package

The package provides the function pair_ann(), which integrates
with the reclin2 package. It works as follows.
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE) |>
  head()

| .x | .y | block |
|---|---|---|
| 204 | 1 | 1 |
| 204 | 176 | 1 |
| 204 | 375 | 1 |
| 204 | 391 | 1 |
| 204 | 405 | 1 |
| 204 | 424 | 1 |
The result lists the candidate pairs and the block each pair belongs to,
and therefore also gives the total number of pairs. It can be passed
directly into the reclin2 pipeline (note that this time we use a
different ANN algorithm, hnsw).
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE,
         ann = "hnsw") |>
  # compare the candidate pairs on the same variables
  compare_pairs(on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
                comparators = list(cmp_jarowinkler())) |>
  # combine the comparison results into a single score
  score_simple("score",
               on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc")) |>
  # flag pairs whose score passes the threshold
  select_threshold("threshold", score = "score", threshold = 6) |>
  # build the linked dataset from the flagged pairs
  link(selection = "threshold") |>
  head()

| .y | .x | person_id.x | pername1.x | pername2.x | sex.x | dob_day.x | dob_mon.x | dob_year.x | hse_num | enumcap.x | enumpc.x | str_nam | cap_add | census_id | x | person_id.y | pername1.y | pername2.y | sex.y | dob_day.y | dob_mon.y | dob_year.y | enumcap.y | enumpc.y | cis_id | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 945 | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 | 1995 | 39 | 39 SPRINGFIELD ROAD | DE256NG | Springfield Road | 39, Springfield Road | CENSDE256NG039003 | 945 | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 |  | 39 SPRINGFIELD ROAD | DE256NG | CISDE256NG039003 | 11 |
| 71 | 427 | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 | 1973 | 62 | 62 CHURCH ROAD | DE159QA | Church Road | 62, Church Road | CENSDE159QA062001 | 427 | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 |  | 62 CHURCH ROAD | DE159QA | CISDE159QA062001 | 71 |
| 83 | 720 | DE237GG025002 | IMOGEN | DARIS | F | 6 | 4 | 1968 | 25 | 25 WOODLANDS ROAD | DE237GG | Woodlands Road | 25, Woodlands Road | CENSDE237GG025002 | 720 | DE237GG025002 | IMOGEW | DAVIS | F | 6 | 4 |  | 25 WOODLANDS ROAD | DE237GG | CISDE237GG025002 | 83 |
| 99 | 136 | DE125LU022001 | DANIEC | MICCER | M | 21 | 4 | 1947 | 22 | 22 PARK LANE | DE125LU | Park Lane | 22, Park Lane | CENSDE125LU022001 | 136 | DE125LU022001 | DAMIEL | HILLER | M | 21 | 4 |  | 22 PARK LANE | DE125LU | CISDE125LU022001 | 99 |
| 154 | 949 | DE256NG040002 | CHLOE | WILSON | F | 5 | 7 | 1978 | 40 | 40 SPRINGFIELD ROAD | DE256NG | Springfield Road | 40, Springfield Road | CENSDE256NG040002 | 949 | DE256NG040002 | CHLOE | WILSOM | F | 5 | 7 |  | 40 SPRINGFIELD ROAD | DE256NG | CISDE256NG040002 | 154 |
| 156 | 549 | DE159QY035002 | AVA | KING | F | 7 | 7 | 1969 | 35 | 35 CHURCH ROAD | DE159QY | Church Road | 35, Church Road | CENSDE159QY035002 | 549 | DE159QY035002 | AVA | KING | F | 7 | 7 |  | 35 CHURCH ROAD | DE159QY | CISDE159QY035002 | 156 |
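
To make the scoring and selection steps more concrete, here is a minimal base R sketch of what a simple sum score followed by a threshold does. The similarity values are invented for illustration and are not produced by compare_pairs(); the sketch also assumes that a pair is kept when its score is strictly above the threshold.

```r
# Variables compared in the pipeline above
on <- c("pername1", "pername2", "sex", "dob_day", "dob_mon",
        "dob_year", "enumcap", "enumpc")

# Hypothetical per-variable similarities (0 to 1) for three candidate pairs;
# these numbers are made up for illustration only
sims <- matrix(
  c(1.0, 1.0, 1, 1, 1, 0, 1.0, 1,   # agreement on almost everything
    0.9, 0.8, 1, 1, 1, 0, 1.0, 1,   # small typos in both names
    0.5, 0.4, 0, 1, 0, 0, 0.6, 0),  # mostly disagreeing
  nrow = 3, byrow = TRUE, dimnames = list(NULL, on)
)

# Simple score: sum of the similarities over all compared variables
score <- rowSums(sims)

# Keep pairs whose score exceeds the threshold (6, as in the pipeline above)
threshold <- 6
selected <- score > threshold

data.frame(score = score, threshold = selected)
```

With eight variables the score can be at most 8, so a threshold of 6 tolerates a missing or disagreeing field plus a few small spelling differences.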
fastLink package

Use the block column as the blocking variable in the function
fastLink::blockData(). The result is a list
of blocked records for further processing.
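
As a rough sketch of this step, the block identifiers can first be copied from the candidate pairs onto both datasets and then used as the blocking variable. The toy data frames and the pairs table below are invented for illustration, the sketch assumes that each record belongs to exactly one block, and the blockData() call only runs when fastLink is installed.

```r
# Toy data and candidate pairs (hypothetical; the columns of `pairs`
# mirror the .x, .y and block columns shown earlier)
df_a <- data.frame(name = c("ANNA", "JOHN", "MARK"), stringsAsFactors = FALSE)
df_b <- data.frame(name = c("ANA", "JON", "MARC"), stringsAsFactors = FALSE)
pairs <- data.frame(.x = c(1, 2, 3), .y = c(1, 2, 3), block = c(1, 2, 2))

# Copy the block id onto each dataset; rows without a candidate pair stay NA
df_a$block <- NA
df_b$block <- NA
df_a$block[pairs$.x] <- pairs$block
df_b$block[pairs$.y] <- pairs$block

# Block on that column (assumes fastLink is installed)
if (requireNamespace("fastLink", quietly = TRUE)) {
  blocks <- fastLink::blockData(df_a, df_b, varnames = "block", n.cores = 1)
  length(blocks)  # number of blocks returned
}
```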
RecordLinkage package

Pass the block column to the
blockfld argument of the compare.dedup() or
compare.linkage() function. Please note that for the
RecordLinkage package the block column
should be stored as a character vector, not a
numeric or integer vector.
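
A minimal sketch of this conversion, using invented toy data, might look as follows. The column positions are looked up by name and passed as indices, and the compare.linkage() call only runs when RecordLinkage is installed.

```r
# Toy data with a numeric block column (hypothetical values)
df_a <- data.frame(name = c("ANNA", "JOHN", "MARK"), block = c(1, 2, 2),
                   stringsAsFactors = FALSE)
df_b <- data.frame(name = c("ANA", "JON", "MARC"), block = c(1, 2, 2),
                   stringsAsFactors = FALSE)

# Store the block identifier as character, as noted above
df_a$block <- as.character(df_a$block)
df_b$block <- as.character(df_b$block)

# Column indices of the blocking field and of the string-compared field
block_col <- which(names(df_a) == "block")
name_col <- which(names(df_a) == "name")

# Use the block column as the blocking field (assumes RecordLinkage is installed)
if (requireNamespace("RecordLinkage", quietly = TRUE)) {
  rl_pairs <- RecordLinkage::compare.linkage(df_a, df_b,
                                             blockfld = block_col,
                                             strcmp = name_col)
  head(rl_pairs$pairs)
}
```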