Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule
- 1Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- 2Department of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, Tennessee, USA
- 3Deptartment of Medicine, School of Medicine, Vanderbilt University, Nashville, Tennessee, USA
- Correspondence to Dr Bradley Malin, Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA;
- Received 2 April 2010
- Accepted 15 October 2010
Objective Healthcare organizations must de-identify patient records before sharing data. Many organizations rely on the Safe Harbor Standard of the HIPAA Privacy Rule, which enumerates 18 identifiers that must be suppressed (eg, ages over 89). An alternative model in the Privacy Rule, known as the Statistical Standard, can facilitate the sharing of more detailed data, but is rarely applied because of a lack of published methodologies. The authors propose an intuitive approach to de-identifying patient demographics in accordance with the Statistical Standard.
Design The authors conduct an analysis of the demographics of patient cohorts in five medical centers developed for the NIH-sponsored Electronic Medical Records and Genomics network, with respect to the US census. They report the re-identification risk of patient demographics disclosed according to the Safe Harbor policy and the relative risk rate for sharing such information via alternative policies.
Measurements The re-identification risk of Safe Harbor demographics ranged from 0.01% to 0.19%. The findings show alternative de-identification models can be created with risks no greater than Safe Harbor. The authors illustrate that the disclosure of patient ages over the age of 89 is possible when other features are reduced in granularity.
Limitations The de-identification approach described in this paper was evaluated with demographic data only and should be evaluated with other potential identifiers.
Conclusion Alternative de-identification policies to the Safe Harbor model can be derived for patient demographics to enable the disclosure of values that were previously suppressed. The method is generalizable to any environment in which population statistics are available.
Funding This was work was supported by grant 1U01HG00460301 from the National Human Genome Research Institute and 1R01LM009989 from the National Library of Medicine. eMERGE network members involved in the review of the manuscript include Eric Larson, MD (Group Health Cooperation), Cathy McCarty, PhD (Marshfield Clinic), Christopher Chute, MD, DrPH (Mayo Clinic), Rex Chisholm, PhD (Northwestern University), and Daniel Roden (Vanderbilt University).
Ethics approval This study was conducted with the approval of the Vanderbilt University, Group Health Cooperation, Mayo Clinic, Marshfield Clinic, Northwestern University.
Provenance and peer review Not commissioned; externally peer reviewed.