JAMIA 2006;13:160-165 doi:10.1197/jamia.M1920
  • The Practice of Informatics
  • Application of Information Technology

A Context-sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection

  1. Christopher A Cassa,
  2. Shaun J Grannis,
  3. J Marc Overhage,
  4. Kenneth D Mandl
  • Affiliations of the authors: Children's Hospital Informatics Program, Children's Hospital Boston, Boston, MA (CAC, KDM); Clinical Decision Making Group, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA (CAC); Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA (CAC, KDM); Indiana University School of Medicine, Indianapolis, IN (SJG, JMO); The Regenstrief Institute, Inc., Indianapolis, IN (SJG, JMO); Harvard Medical School, Boston, MA (KDM)
  • Correspondence and reprints: Christopher A. Cassa, Children's Hospital Boston, Informatics Program–Mandl Group, 1 Autumn Street, #721, Boston, MA 02215-5362; e-mail: <cassa{at}mit.edu>
  • Received 27 July 2005
  • Accepted 28 November 2005

Abstract

Objective The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. We also measure how the resulting spatial skew affects the detection of spatial clustering by a spatial scan statistic.
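
The abstract does not state the displacement rule itself. As a minimal sketch, assume coordinates are already projected to kilometres and that each case's local population density (residents per km²) is known. The hypothetical helper `blur_sigma_km` then picks a Gaussian standard deviation σ so that a disk of radius σ holds roughly `k` residents, and the hypothetical `blur_point` applies an isotropic 2-D Gaussian displacement. The σ = √(k / (π·density)) rule is an illustrative assumption, not the authors' published formula.

```python
import math
import random
from typing import Tuple


def blur_sigma_km(density_per_km2: float, k: int) -> float:
    """Hypothetical rule: choose sigma (km) so that a disk of radius sigma
    contains roughly k residents at the given local population density."""
    if density_per_km2 <= 0 or k < 1:
        raise ValueError("density must be positive and k >= 1")
    # pi * sigma^2 * density = k  =>  sigma = sqrt(k / (pi * density))
    return math.sqrt(k / (math.pi * density_per_km2))


def blur_point(x_km: float, y_km: float, density_per_km2: float, k: int,
               rng: random.Random) -> Tuple[float, float]:
    """Displace a projected point by an isotropic 2-D Gaussian whose spread
    shrinks where population is dense and grows where it is sparse."""
    sigma = blur_sigma_km(density_per_km2, k)
    return x_km + rng.gauss(0.0, sigma), y_km + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(42)
    for label, density in [("urban", 4000.0), ("rural", 50.0)]:
        print(label, f"sigma = {blur_sigma_km(density, 20):.3f} km",
              blur_point(10.0, 5.0, density, 20, rng))
```

With k = 20, this rule gives σ ≈ 0.040 km at 4,000 residents/km² and σ ≈ 0.357 km at 50 residents/km². Sparse areas therefore receive larger displacements, so each blurred point still hides among a comparable number of neighbours.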

Design Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters that varied in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density.
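
Semisynthetic evaluation of this kind adds artificial cases to real baseline weeks. The sketch below shows one hypothetical way to do it: `make_cluster` draws points from a rotated elliptical Gaussian so that the number of cases (magnitude), the two axis spreads and the rotation angle (shape), and the center (location) can each be varied, and `inject_cluster` appends them to a baseline week. The elliptical-Gaussian cluster model and all names are assumptions; the abstract does not say how the clusters were generated.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def make_cluster(n_cases: int, center: Point, sd_major: float, sd_minor: float,
                 angle_rad: float, rng: random.Random) -> List[Point]:
    """Draw n_cases synthetic points from a rotated elliptical Gaussian.
    n_cases sets magnitude, sd_major/sd_minor/angle_rad set shape,
    and center sets location."""
    cx, cy = center
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    cluster = []
    for _ in range(n_cases):
        u = rng.gauss(0.0, sd_major)  # offset along the major axis
        v = rng.gauss(0.0, sd_minor)  # offset along the minor axis
        # rotate (u, v) by angle_rad, then translate to the cluster center
        cluster.append((cx + u * cos_a - v * sin_a, cy + u * sin_a + v * cos_a))
    return cluster


def inject_cluster(baseline: Sequence[Point], cluster: Sequence[Point]) -> List[Point]:
    """Return one semisynthetic week: real baseline cases plus injected ones."""
    return list(baseline) + list(cluster)
```

For example, `inject_cluster(baseline_week, make_cluster(30, (2.0, 3.0), 0.8, 0.2, math.pi / 4, random.Random(7)))` would add 30 cases in an elongated cluster tilted at 45 degrees; sweeping these arguments produces the range of magnitudes, shapes, and locations the Design paragraph describes.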

Measurements A total of 12,600 separate weeks of case data containing artificially created clusters were combined with control data, and the effect of anonymization on the detection of spatial clustering by a spatial scan statistic was measured.
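
Because the case weeks were paired with control data, one natural fit is Kulldorff's Bernoulli-model spatial scan statistic, although the abstract does not name the model or software used. The sketch below scores circular windows centred on observed locations with the Bernoulli log-likelihood ratio and returns the highest-scoring window. It omits the Monte Carlo replication that scan-statistic software uses to assign a p-value, so it only locates the most likely cluster; `bernoulli_llr` and `most_likely_cluster` are hypothetical names.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c: int, n: int, total_cases: int, total_points: int) -> float:
    """Bernoulli log-likelihood ratio for one window holding c cases among
    n points (cases + controls), out of total_cases among total_points."""
    out_c, out_n = total_cases - c, total_points - n
    if n == 0 or out_n == 0 or c / n <= out_c / out_n:
        return 0.0  # empty window, whole region, or no elevated risk inside
    ll_alt = (_xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
              + _xlogy(out_c, out_c / out_n)
              + _xlogy(out_n - out_c, (out_n - out_c) / out_n))
    ll_null = (_xlogy(total_cases, total_cases / total_points)
               + _xlogy(total_points - total_cases,
                        (total_points - total_cases) / total_points))
    return ll_alt - ll_null


def most_likely_cluster(cases: Sequence[Point], controls: Sequence[Point],
                        radii: Sequence[float]
                        ) -> Tuple[float, Optional[Point], Optional[float]]:
    """Scan circles centred on every observed point; return (LLR, center, radius)."""
    labelled = [(p, 1) for p in cases] + [(p, 0) for p in controls]
    total_cases, total_points = len(cases), len(labelled)
    best: Tuple[float, Optional[Point], Optional[float]] = (0.0, None, None)
    for center, _ in labelled:
        for r in radii:
            inside = [label for q, label in labelled if math.dist(center, q) <= r]
            llr = bernoulli_llr(sum(inside), len(inside), total_cases, total_points)
            if llr > best[0]:
                best = (llr, center, r)
    return best
```

Running `most_likely_cluster` on the same week before and after blurring, and checking whether the returned window still overlaps the injected cluster, is one way to operationalize the before/after comparison described above.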

Results The anonymization algorithm produced the expected skew of cases, which yielded high data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers spatial cluster detection sensitivity by less than 4% and lowers detection specificity by less than 1%.
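
The abstract reports k-anonymity, sensitivity, and specificity without defining how each was computed. As a hedged illustration, the hypothetical `spatial_k` below uses one plausible proximity-based definition: the number of residences lying at least as close to the published point as the true location does, so an adversary who re-identifies by proximity must choose among at least that many candidates. `dataset_k` summarises the per-record values as a minimum and a median, and `sensitivity_specificity` applies the conventional definitions. The residence list, the function names, and the choice of summary are assumptions.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float]


def spatial_k(true_pt: Point, blurred_pt: Point, residences: Sequence[Point]) -> int:
    """Hypothetical proximity-based k: residences at least as close to the
    published point as the true location is (the true home counts itself)."""
    radius = math.dist(true_pt, blurred_pt)
    return sum(1 for home in residences if math.dist(home, blurred_pt) <= radius)


def dataset_k(pairs: Sequence[Tuple[Point, Point]],
              residences: Sequence[Point]) -> Tuple[int, float]:
    """Summarise per-record k over (true, blurred) pairs as (minimum, median)."""
    ks = [spatial_k(true_pt, blurred_pt, residences) for true_pt, blurred_pt in pairs]
    return min(ks), statistics.median(ks)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Conventional definitions: TP / (TP + FN) and TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

Under this definition, k grows with both the displacement distance and the local density of residences, which is why a density-aware displacement can keep k high without moving urban points far.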

Conclusion Population-density–based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly degrading the performance of a standard outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.

Footnotes

  • The work was supported by R01LM007970-01 from the National Library of Medicine, National Institutes of Health.

  • The authors thank Dr. Karen Olson for input on creating semisynthetic data sets.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.