J Am Med Inform Assoc 2008;15:627-637 doi:10.1197/jamia.M2716
  • Original Investigation
  • Research Paper

Protecting Privacy Using k-Anonymity

  1. Khaled El Emam (a, b)
  2. Fida Kamal Dankar (a)

  a. Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  b. Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada

Correspondence: Khaled El Emam, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario K1J 8L1, Canada (Email: kelemam@uottawa.ca)
  • Received 9 January 2008
  • Accepted 21 May 2008

Abstract

Objective: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate these concerns, the data can be anonymized before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.

Design: Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets.

Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric.

Results: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, with this over-anonymization being most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.

Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
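
The abstract names k-anonymity and the discernability metric without defining them. As an illustration only, the sketch below assumes the usual formulations from the literature: a data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and the discernability metric charges each record in an equivalence class of size n >= k a penalty of n, and each record in a smaller (suppressed) class a penalty equal to the data set size. The function names, the field names `age` and `sex`, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def discernability(records, quasi_identifiers, k):
    """Assumed formulation: n^2 for classes of size n >= k,
    and (data set size) * n for classes smaller than k (treated as suppressed)."""
    total = len(records)
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return sum(n * n if n >= k else total * n for n in sizes.values())


# Hypothetical toy data: three equivalence classes of sizes 3, 2 and 1.
records = [
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "40-49", "sex": "M"},
    {"age": "40-49", "sex": "M"},
    {"age": "50-59", "sex": "F"},
]
quasi_identifiers = ["age", "sex"]

print(is_k_anonymous(records, quasi_identifiers, k=2))  # False: one class has a single record
print(discernability(records, quasi_identifiers, k=2))  # 3*3 + 2*2 + 6*1 = 19
```

Under this formulation, coarser generalization produces larger equivalence classes and therefore a higher metric value, which is the sense in which over-anonymization increases information loss.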

Footnotes

  • The author(s) declare that they have no competing interests.
