Impact of the Naïve Versus the Nonnaïve Strategy on the Classification of Record Pairs in Scenario 2 (Empirical Datasets) and Scenario 3 (Simulated Datasets), Both Using 5 Linking Variables
| | Scenario 2 (Empirical Datasets): Naïve | Scenario 2 (Empirical Datasets): Nonnaïve | Scenario 3 (Simulated Datasets): Naïve | Scenario 3 (Simulated Datasets): Nonnaïve | Scenario 3 (Simulated Datasets): Truth |
|---|---|---|---|---|---|
| Dataset 1 | 129,576 | | 40,000 | | |
| Dataset 2 | 116,390 | | 40,000 | | |
| Number of pairs | 15,081,350,640 | | 1,600,000,000 | | |
| Estimated prevalence | 8.30E-05 | 4.37E-06 | 7.07E-05 | 4.37E-06 | 4.38E-06 |
| Number of estimated matches | 1,251,752 | 65,951 | 113,069 | 6,998 | 7,000 |
| Number of links | 1,226,322 | 65,639 | 112,988 | 6,983 | 7,000 |
| Number of false-positive links | NA | NA | 106,009 | 51 | 0 |
| Number of false-negative links | NA | NA | 20 | 68 | 0 |

NA = not applicable. The Dataset 1, Dataset 2, and Number of pairs values are reported once per scenario and do not depend on the strategy.
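
The arithmetic behind the first rows of the table can be checked with a short sketch. It assumes, consistently with the reported figures but not stated in the table itself, that the number of pairs is the product of the two dataset sizes and that the estimated prevalence is the number of estimated matches divided by the number of pairs. The function names `number_of_pairs` and `prevalence` are hypothetical helpers introduced only for illustration.

```python
# Values copied from the table; the two formulas below are assumptions
# that happen to reproduce the reported figures.
scenarios = {
    "Scenario 2 (empirical)": {
        "n1": 129_576,
        "n2": 116_390,
        "matches": {"naive": 1_251_752, "nonnaive": 65_951},
    },
    "Scenario 3 (simulated)": {
        "n1": 40_000,
        "n2": 40_000,
        "matches": {"naive": 113_069, "nonnaive": 6_998, "truth": 7_000},
    },
}


def number_of_pairs(n1: int, n2: int) -> int:
    """All cross-dataset record pairs (assumed: full Cartesian product)."""
    return n1 * n2


def prevalence(matches: int, pairs: int) -> float:
    """Share of record pairs that are (estimated) matches."""
    return matches / pairs


for name, s in scenarios.items():
    pairs = number_of_pairs(s["n1"], s["n2"])
    print(f"{name}: {pairs:,} pairs")
    for strategy, m in s["matches"].items():
        print(f"  {strategy}: prevalence = {prevalence(m, pairs):.2E}")
```

Under these assumptions, the products reproduce the pair counts in the table (15,081,350,640 and 1,600,000,000), and the ratios agree with the reported prevalences to the precision shown.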









