J Am Med Inform Assoc 16:670-682 doi:10.1197/jamia.M3144
  • Original Investigation
  • Research Paper

A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

  1. Khaled El Emam, PhD (a, b)
  2. Fida Kamal Dankar, PhD (a)
  3. Romeo Issa, MS (d)
  4. Elizabeth Jonker, BA (a)
  5. Daniel Amyot, PhD (c)
  6. Elise Cogo, ND (a)
  7. Jean-Pierre Corriveau, PhD (d)
  8. Mark Walker, MS, MD (e)
  9. Sadrul Chowdhury, MS (c)
  10. Regis Vaillancourt, BPharm, PharmD (a)
  11. Tyson Roffey, BA (a)
  12. Jim Bottomley, BScH, MHA (a)
  1. (a) Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  2. (b) Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
  3. (c) School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada
  4. (d) School of Computer Science, Carleton University, Ottawa, Ontario, Canada
  5. (e) Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  1. Correspondence: Khaled El Emam, CHEO Research Institute, 401 Smyth Road, Ott, ON K1H 8L1, Canada (Email: kelemam{at}uottawa.ca).
  • Received 19 January 2009
  • Accepted 2 June 2009

Abstract

Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified.

Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets.
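
As a minimal illustration of the k-anonymity criterion named above (not of the authors' algorithm), the sketch below checks whether every combination of quasi-identifier values occurs in at least k records. The function name `is_k_anonymous`, the column names, and the sample records are hypothetical and chosen only for this example.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    # Group records into equivalence classes keyed by their quasi-identifier values.
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in class_sizes.values())

# Hypothetical, already-generalized records (assumed for illustration only).
records = [
    {"age": "30-39", "postal": "K1H", "dx": "asthma"},
    {"age": "30-39", "postal": "K1H", "dx": "flu"},
    {"age": "40-49", "postal": "K1H", "dx": "flu"},
]
```

With these sample records, the pair (`age`, `postal`) fails the check for k = 2 because the single `40-49` record forms an equivalence class of size one; generalizing or suppressing that record would be needed to satisfy the criterion.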

Design: The authors empirically compared Optimal Lattice Anonymization (OLA) with three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets, for different values of k and suppression limits.

Measurement: Three information loss metrics were used for the comparison: precision, the discernability metric, and non-uniform entropy. Each algorithm's execution speed was also evaluated.
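
To make one of these metrics concrete, the sketch below computes the discernability metric in its commonly cited formulation (due to Bayardo and Agrawal): each equivalence class of size at least k contributes the square of its size, and each record in a smaller class, which would be suppressed, is charged the size of the whole dataset. This abstract does not give the formula, so the exact variant used in the study may differ; the function name and sample records are hypothetical.

```python
from collections import Counter

def discernability_metric(records, quasi_identifiers, k):
    """Discernability metric, assuming the common Bayardo-Agrawal formulation."""
    n = len(records)
    # Sizes of the equivalence classes induced by the quasi-identifiers.
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    ).values()
    # Classes of size >= k cost size^2; smaller classes are treated as
    # suppressed, so each of their records costs n.
    return sum(size * size if size >= k else n * size for size in class_sizes)

# Hypothetical, already-generalized records (assumed for illustration only).
sample = [
    {"age": "30-39", "postal": "K1H"},
    {"age": "30-39", "postal": "K1H"},
    {"age": "40-49", "postal": "K1H"},
]
```

Lower values indicate less information loss under this metric: a dataset split into many small but sufficiently large classes scores better than one collapsed into a single large class.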

Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution.

Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
