The Fellegi and Sunter method is a probabilistic approach to solving the record linkage problem, based on a decision model. Records in data sources are assumed to represent observations of entities taken from a particular population (individuals, companies, enterprises, farms, geographic regions, families, households).
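The decision model can be illustrated with a minimal sketch. It assumes conditional independence between fields and uses hypothetical field names, m and u probabilities, and thresholds, none of which come from the source. Each record pair gets a likelihood ratio, which is compared against an upper and a lower threshold to decide between a link, a non-link, and a possible link left for clerical review.

```python
def likelihood_ratio(agreements, m, u):
    """Multiply per-field ratios, assuming fields are conditionally independent.

    agreements: dict mapping field name -> True if the pair agrees on it
    m: dict of P(agree | true match) per field
    u: dict of P(agree | true non-match) per field
    """
    ratio = 1.0
    for field, agrees in agreements.items():
        if agrees:
            ratio *= m[field] / u[field]
        else:
            ratio *= (1 - m[field]) / (1 - u[field])
    return ratio


def classify_pair(ratio, upper, lower):
    """Three-way decision: link, non-link, or possible link."""
    if ratio >= upper:
        return "link"
    if ratio <= lower:
        return "non-link"
    return "possible link"


# Hypothetical parameters, chosen only for illustration.
m = {"surname": 0.9, "birth_year": 0.95}
u = {"surname": 0.01, "birth_year": 0.05}
UPPER, LOWER = 100.0, 0.1

ratio_all = likelihood_ratio({"surname": True, "birth_year": True}, m, u)
ratio_mixed = likelihood_ratio({"surname": True, "birth_year": False}, m, u)
ratio_none = likelihood_ratio({"surname": False, "birth_year": False}, m, u)

for ratio in (ratio_all, ratio_mixed, ratio_none):
    print(round(ratio, 4), classify_pair(ratio, UPPER, LOWER))
```

In practice the thresholds would be chosen to bound the acceptable false-link and false-non-link rates; the values here are placeholders.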
data integration, probabilistic record linkage, string comparators, blocking/sorting/indexing, deduplication, open source software. Contact name: Luca Valentino. Software and documentation. SOFTWARE DEPENDENCIES: Java SE Development Kit (version 13); R (version 3.4.0); R packages ROI, ROI.plugin.clp, slam,
The team therefore set about developing a record linking package called Splink.

4. Introducing Splink

Splink is a PySpark package that implements the Fellegi-Sunter model of record linking,
splink is a Python package for probabilistic record linkage (entity resolution). Its key features are: It is extremely fast. It is capable of linking a million records on a laptop in around a minute. It is
for each linkage variable, the pre-specified rates of missing values and errors determined the probability that a record pair agrees on a linkage variable given that the pair is a true match (the m_i probability); the pre-specified discriminative power determined the probability that a record pair agrees on a variable given that the pair is a true non-match (
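As a rough illustration of how these two conditional probabilities feed into a match score, the sketch below turns one variable's m and u probabilities into log2 agreement and disagreement weights. The mapping from simulation parameters to m and u is an assumption made here for illustration (m as one minus a hypothetical error rate, u as the chance of coincidental agreement among equally likely values) and ignores missingness; the source does not show its exact formulas.

```python
from math import log2


def match_weights(m, u):
    """Return (agreement weight, disagreement weight) in log2 units.

    m: P(pair agrees on the variable | true match)
    u: P(pair agrees on the variable | true non-match)
    """
    agree = log2(m / u)
    disagree = log2((1 - m) / (1 - u))
    return agree, disagree


# Hypothetical variable: month of birth, 12 equally likely values,
# recorded with a 5% error rate. Both numbers are illustrative assumptions.
error_rate = 0.05
n_values = 12

m = 1 - error_rate   # assumed: true matches agree unless an error occurred
u = 1 / n_values     # assumed: non-matches agree only by chance

agree_w, disagree_w = match_weights(m, u)
print(f"agreement weight:    {agree_w:.2f}")     # about 3.51
print(f"disagreement weight: {disagree_w:.2f}")  # about -4.20
```

A variable with more distinct values has a smaller u and therefore a larger agreement weight, which is one way to read "discriminative power".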
Coding example for the question: Probabilistic record linkage (matching) in PostgreSQL and Python-postgresql.
In general there are two broad types of record linkage methods: (i) deterministic and (ii) probabilistic. Deterministic record linkage is the process of linking information by one or more uniquely shared keys. Records are matched if the linkage fields agree and unmatched if they disagree.
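For contrast with the probabilistic approach, deterministic linkage can be sketched as an exact join on the shared key fields. The record layout, field names, and the `deterministic_link` helper below are hypothetical, chosen only for illustration.

```python
def deterministic_link(records_a, records_b, key_fields):
    """Pair records whose key fields agree exactly.

    Records with a missing (None) key value are never matched.
    Returns a list of (id_a, id_b) tuples.
    """
    def key_of(rec):
        key = tuple(rec.get(f) for f in key_fields)
        return None if any(v is None for v in key) else key

    # Index the second source by key so each lookup is a dict access.
    index = {}
    for rec in records_b:
        key = key_of(rec)
        if key is not None:
            index.setdefault(key, []).append(rec)

    pairs = []
    for rec in records_a:
        key = key_of(rec)
        if key is None:
            continue
        for match in index.get(key, []):
            pairs.append((rec["id"], match["id"]))
    return pairs


# Hypothetical data: "nat_id" plays the role of the uniquely shared key.
source_a = [
    {"id": 1, "nat_id": "A100", "name": "Ann Lee"},
    {"id": 2, "nat_id": "B200", "name": "Bob Ray"},
    {"id": 3, "nat_id": None, "name": "Cy Doe"},
]
source_b = [
    {"id": "x", "nat_id": "A100", "name": "Anne Lee"},
    {"id": "y", "nat_id": "C300", "name": "Bob Ray"},
]

pairs = deterministic_link(source_a, source_b, ["nat_id"])
print(pairs)
```

Only the pair sharing `nat_id` "A100" is linked, even though the names differ slightly, while the two "Bob Ray" records stay unlinked because their keys disagree. That sensitivity to key errors is what probabilistic methods aim to relax.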
Probabilistic record linkage

These pages present some introductory training material on probabilistic record linkage using the Fellegi-Sunter model. Many of the articles are interactive. This material presents a simplified version of the model used by Splink, a piece of probabilistic linkage software for which I'm lead developer.