A secure protocol for protecting the identity of providers when disclosing data for disease surveillance
- Khaled El Emam1,2,
- Jun Hu3,
- Jay Mercer4,
- Liam Peyton3,
- Murat Kantarcioglu5,
- Bradley Malin6,
- David Buckeridge7,
- Saeed Samet1,
- Craig Earle8
- 1Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
- 2Paediatrics, University of Ottawa, Ottawa, Ontario, Canada
- 3School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada
- 4Family Medicine, University of Ottawa, Ottawa, Ontario, Canada
- 5Computer Science, University of Texas at Dallas, Dallas, Texas, USA
- 6Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- 7Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada
- 8Institute for Clinical Evaluative Sciences and the Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Correspondence to Khaled El Emam, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON K1H 8L1, Canada
- Received 16 January 2011
- Accepted 3 February 2011
Background Providers have been reluctant to disclose patient data for public-health purposes. Even if patient privacy is ensured, the desire to protect provider confidentiality has been an important driver of this reluctance.
Methods Six requirements for a surveillance protocol were defined that satisfy the confidentiality needs of providers and ensure utility to public health. The authors developed a secure multi-party computation protocol using the Paillier cryptosystem to allow the disclosure of stratified case counts and denominators in a way that meets these requirements. The authors evaluated the protocol in a simulated environment for its computational performance and its ability to detect disease outbreak clusters.
Results Theoretical and empirical assessments demonstrate that all requirements are met by the protocol. A system implementing the protocol scales linearly in terms of computation time as the number of providers is increased. The absolute time to perform the computations was 12.5 s for data from 3000 practices. This is acceptable performance, given that the reporting would normally be done at 24 h intervals. The accuracy of detecting disease outbreak clusters was unchanged compared with a non-secure distributed surveillance protocol, with an F-score higher than 0.92 for outbreaks involving 500 or more cases.
Conclusion The protocol and associated software provide a practical method for providers to disclose patient data for sentinel, syndromic or other indicator-based surveillance while protecting patient privacy and the identity of individual providers.
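As an illustration of the property the Methods summary relies on, the sketch below shows how Paillier encryption lets an aggregator add case counts without seeing any single provider's value. It is a minimal toy, not the authors' protocol: the function names (`generate_keypair`, `encrypt`, `decrypt`, `add_encrypted`), the tiny primes and the example counts are all assumptions made for illustration. The published protocol also covers stratification, denominators and the division of roles among parties, none of which is modelled here.

```python
import math
import secrets

def generate_keypair(p, q):
    """Build a Paillier key pair from two primes (toy sizes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # common simplifying choice of generator
    mu = pow(lam, -1, n)       # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt a non-negative integer m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:                # pick random r in [1, n) that is coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Hypothetical example: four providers each encrypt a local case count.
pub, priv = generate_keypair(1789, 1861)   # toy primes; real keys are far larger
counts = [3, 0, 7, 12]                     # made-up counts, not data from the study
ciphertexts = [encrypt(pub, c) for c in counts]

# The aggregator combines ciphertexts without decrypting any single one.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(pub, total, c)

print(decrypt(pub, priv, total))           # the sum of the counts
```

Only the holder of the private key can decrypt, and in this sketch it decrypts only the combined ciphertext, so the total is revealed without attributing any count to a provider. The sum must stay below `n`, which is why practical key sizes are much larger than the toy primes used here.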
- integration across care settings (inter- and intraenterprise)
- computational methods
- advanced algorithms
- personal health records and self-care systems
- assuring information system security and personal privacy
- other methods for security and policy enforcement
- data exchange
- ethical study methods
- statistical analysis of large datasets
- methods for integration of information from disparate sources
- distributed systems
- software engineering: architecture
- detecting disease outbreaks and biological threats
- simulation of complex systems (at all levels: molecules to work groups to organizations)
- monitoring the health of populations
- privacy and security
- machine learning
- health data standards
- scientific information and health data policy
- consumer health/patient education information
- information retrieval
- public-health informatics
- clinical trials
- syndromic surveillance
- secure computation
Funding This work was partially funded by the Canadian Institutes of Health Research, the GeoConnections program of Natural Resources Canada, the Ontario Institute for Cancer Research, the Natural Sciences and Engineering Research Council and grant number R01-LM009989 from the National Library of Medicine, National Institutes of Health.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.