A Context-sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection
- Affiliations of the authors: Children's Hospital Informatics Program, Children's Hospital Boston, Boston, MA (CAC, KDM); Clinical Decision Making Group, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA (CAC); Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA (CAC, KDM); Indiana University School of Medicine, Indianapolis, IN (SJG, JMO); The Regenstrief Institute, Inc., Indianapolis, IN (SJG, JMO); Harvard Medical School, Boston, MA (KDM)
- Correspondence and reprints: Christopher A. Cassa, Children's Hospital Boston, Informatics Program–Mandl Group, 1 Autumn Street, #721, Boston, MA 02215-5362; e-mail: <cassa{at}mit.edu>
- Received 27 July 2005
- Accepted 28 November 2005
Abstract
Objective The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we measure how the resulting spatial skew affects the detection of spatial clusters by a spatial scan statistic.
Design Cases were emergency department (ED) visits for respiratory illness. Artificially created clusters varying in magnitude, shape, and location were injected into baseline ED visit data. The geocoded case locations were then transformed using a de-identification algorithm that accounts for the local underlying population density.
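The abstract does not give the exact transformation, so the following is only a minimal sketch of a density-informed Gaussian blur under stated assumptions. It assumes coordinates already projected to a planar grid in kilometres, a known local density in residents per km², and a hypothetical rule that picks the per-axis standard deviation so that a disc of radius sigma around a case holds about `k` residents. The names `blur_sigma_km`, `blur_point`, and the parameter `k` are illustrative, not the authors' own.

```python
import math
import random

# Hypothetical sketch, not the published algorithm: coordinates are assumed to
# be projected to a planar grid in kilometres; density is residents per km^2.

def blur_sigma_km(density_per_km2: float, k: int) -> float:
    """Assumed rule: choose sigma so that a disc of radius sigma around a case
    is expected to hold about k residents, i.e. k = density * pi * sigma**2."""
    if density_per_km2 <= 0 or k <= 0:
        raise ValueError("density and k must be positive")
    return math.sqrt(k / (math.pi * density_per_km2))


def blur_point(x_km: float, y_km: float, density_per_km2: float, k: int,
               rng: random.Random) -> tuple[float, float]:
    """Displace one case by independent Gaussian noise on each axis; the
    spread shrinks in dense areas and grows in sparse ones."""
    sigma = blur_sigma_km(density_per_km2, k)
    return x_km + rng.gauss(0.0, sigma), y_km + rng.gauss(0.0, sigma)


rng = random.Random(42)
urban_case = blur_point(0.0, 0.0, density_per_km2=5000.0, k=50, rng=rng)
rural_case = blur_point(0.0, 0.0, density_per_km2=50.0, k=50, rng=rng)
```

With `k = 50`, this rule gives sigma of about 0.056 km for a case in an area with 5,000 residents per km² and about 0.56 km for one with 50 residents per km², which illustrates why a density-aware blur moves rural cases farther than urban ones.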
Measurements A total of 12,600 separate weeks of case data containing artificially created clusters were combined with control data, and the effect of de-identification on the detection of spatial clusters by a spatial scan statistic was measured.
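The abstract names a spatial scan statistic but not its configuration. As an illustration only, the sketch below scores circular windows with Kulldorff's Poisson log-likelihood ratio, taking expected counts in a window to be proportional to the share of hypothetical baseline (historical) visits that fall inside it. The circle centres, radii, and the function names `poisson_llr` and `naive_circular_scan` are assumptions, and the Monte Carlo significance testing a real scan would use is omitted.

```python
import math
from typing import Optional

Point = tuple[float, float]


def poisson_llr(c: int, expected: float, total: int) -> float:
    """Kulldorff's Poisson log-likelihood ratio for one window.
    Returns 0 unless the window has more cases than expected."""
    if expected <= 0 or expected >= total:
        raise ValueError("expected must lie strictly between 0 and total")
    if c <= expected:
        return 0.0
    inside = c * math.log(c / expected)
    outside_count = total - c
    # Convention 0 * log(0) = 0 when every case falls inside the window.
    outside = 0.0 if outside_count == 0 else outside_count * math.log(
        outside_count / (total - expected))
    return inside + outside


def naive_circular_scan(cases: list[Point], baseline: list[Point],
                        radii: list[float]) -> tuple[float, Optional[Point], Optional[float]]:
    """Brute-force scan over circles centred on case locations (illustrative).
    Expected cases in a circle = total cases * baseline share inside it."""
    total = len(cases)
    best: tuple[float, Optional[Point], Optional[float]] = (0.0, None, None)
    for center in cases:
        for r in radii:
            c = sum(1 for p in cases if math.dist(p, center) <= r)
            b_in = sum(1 for p in baseline if math.dist(p, center) <= r)
            if not 0 < b_in < len(baseline):
                continue  # window empty of baseline, or covering all of it
            expected = total * b_in / len(baseline)
            llr = poisson_llr(c, expected, total)
            if llr > best[0]:
                best = (llr, center, r)
    return best
```

Running the same scan on original and on blurred case locations, and checking whether the top window still covers the injected cluster, is one simple way to see how displacement could erode sensitivity.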
Results The anonymization algorithm produced the expected skew of case locations, which resulted in high data set k-anonymity values. De-identification that moves points an average distance of 0.25 km lowers spatial cluster detection sensitivity by less than 4% and lowers detection specificity by less than 1%.
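To relate the reported average displacement to a blur parameter, note that if every point received independent N(0, sigma²) noise on each axis with a single fixed sigma, the distance moved would follow a Rayleigh distribution with mean sigma·√(π/2). The published method varies sigma with density, so this is only a rough guide. The abstract also does not define how k-anonymity was computed; `approx_k` below is a hypothetical proxy (residents within a disc whose radius equals the displacement, assuming locally uniform density), not the authors' measure.

```python
import math
import random


def mean_displacement_km(sigma_km: float) -> float:
    """Mean of a Rayleigh distribution: expected distance moved when each
    axis receives independent N(0, sigma^2) noise."""
    return sigma_km * math.sqrt(math.pi / 2)


def sigma_for_mean_displacement(mean_km: float) -> float:
    """Invert the Rayleigh-mean relation to get a per-axis sigma."""
    return mean_km / math.sqrt(math.pi / 2)


def approx_k(displacement_km: float, density_per_km2: float) -> float:
    """Hypothetical k proxy: expected residents inside a disc whose radius is
    the displacement, assuming uniform local population density."""
    return density_per_km2 * math.pi * displacement_km ** 2


sigma = sigma_for_mean_displacement(0.25)  # about 0.1995 km per axis

# Monte Carlo check that this sigma yields a mean displacement near 0.25 km.
rng = random.Random(0)
n = 100_000
empirical = sum(math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
                for _ in range(n)) / n
```

Under this proxy, a case moved 0.25 km in an area with 2,000 residents per km² would have k of about 393.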
Conclusion Population-density–based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a widely used outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.
Footnotes
- The work was supported by R01LM007970-01 from the National Library of Medicine, National Institutes of Health.
- The authors thank Dr. Karen Olson for input on creating semisynthetic data sets.