JAMIA 2009;16:738-745 doi:10.1197/jamia.M3186
  • Original Investigation
  • Research Paper

An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling

  1. Vivienne J Zhu, MD, MS (a, b)
  2. Marc J Overhage, MD, PhD (a, b)
  3. James Egg (a)
  4. Stephen M Downs, MD, MS (a, b)
  5. Shaun J Grannis, MD, MS (a, b)

  a. Regenstrief Institute, Inc, Indianapolis, IN
  b. Indiana University School of Medicine, Indianapolis, IN

  Correspondence: Dr. Shaun J. Grannis, Regenstrief Institute, Inc., 410 West 10th Street, Suite 2000, Indianapolis, IN 46202-3012 (Email: sgrannis@regenstrief.org).
  • Received 16 February 2009
  • Accepted 2 June 2009

Abstract

Objective To incorporate value-based weight scaling into the Fellegi-Sunter (F–S) maximum likelihood linkage algorithm and evaluate the performance of the modified algorithm.

Background Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. In theory, accounting for the relative frequency of specific field values can enhance the F–S method, for example by reducing false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F–S method has been implemented and specifically evaluated using real-world clinical data.

Methods The authors implemented a value-based weight scaling modification using an information-theoretic model, and formally evaluated the effectiveness of this modification by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, the authors examined the effect of selectively scaling only common or only uncommon field-specific values.
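
The abstract does not give the scaling formula, so the following is only a minimal sketch of one textbook way to make F–S agreement weights depend on value frequency; it is not necessarily the authors' model. It assumes a hypothetical m-probability and a toy list of last names, and it approximates the value-specific agreement weight as log2(m / p_v), where p_v is the relative frequency of value v. All function names and data are illustrative assumptions.

```python
import math
from collections import Counter


def value_frequencies(values):
    """Relative frequency p_v of each distinct value in a field."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def field_agreement_weight(m, u):
    """Standard F-S agreement weight for a field: log2(m / u)."""
    return math.log2(m / u)


def value_agreement_weight(m, freqs, value):
    """Assumed value-specific weight: log2(m / p_v).

    Agreement on a rare value (small p_v) earns a larger weight than
    agreement on a common value.
    """
    return math.log2(m / freqs[value])


# Toy, made-up data (not from the study).
last_names = ["SMITH"] * 6 + ["JONES"] * 3 + ["ZHU"]
freqs = value_frequencies(last_names)

m = 0.95                                  # hypothetical P(agree | true match)
u = sum(p * p for p in freqs.values())    # chance agreement among non-matches

w_field = field_agreement_weight(m, u)
w_rare = value_agreement_weight(m, freqs, "ZHU")
w_common = value_agreement_weight(m, freqs, "SMITH")

print(f"unscaled field weight: {w_field:.3f}")
print(f"rare value (ZHU):      {w_rare:.3f}")
print(f"common value (SMITH):  {w_common:.3f}")
```

Under these assumptions, agreement on an uncommon value counts for more than the field-level weight, while agreement on a common value counts for less. Lowering the weight for common values is one way false-positive matches can be reduced.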

Results When weight scaling was applied to all field-specific values, the sensitivity, specificity, and positive predictive value were 95.4%, 98.8%, and 99.9%, respectively. Compared with the unscaled algorithm, the modified F–S algorithm demonstrated a 10% increase in specificity with a 3% decrease in sensitivity.
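
For readers less familiar with the three metrics reported above, the sketch below computes them from the counts of a linkage evaluation against a gold standard. The counts are hypothetical placeholders chosen for easy arithmetic, not the study's data, and the function name is an assumption.

```python
def linkage_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and positive predictive value.

    tp: true matches correctly linked
    fp: non-matches incorrectly linked
    tn: non-matches correctly left unlinked
    fn: true matches that were missed
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return sensitivity, specificity, ppv


# Hypothetical counts for illustration only.
sens, spec, ppv = linkage_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ppv={ppv:.3f}")
```

Removing false-positive links raises specificity and positive predictive value. Any true matches lost in the process lower sensitivity, which is the trade-off the results describe.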

Conclusion By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F–S method with minimal decrease in sensitivity.

Footnotes


    The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.