Syndromic Surveillance Using Ambulatory Electronic Health Records
- George Hripcsak (a, b),
- Nicholas D Soulakis (b),
- Li Li (a),
- Frances P Morrison (a),
- Albert M Lai (a),
- Carol Friedman (a),
- Neil S Calman (c),
- Farzad Mostashari (b)
- (a) Department of Biomedical Informatics, Columbia University, New York, NY
- (b) New York City Department of Health and Mental Hygiene, New York, NY
- (c) Institute for Family Health, New York, NY
- Correspondence: George Hripcsak, MD, MS, Department of Biomedical Informatics, 622 W 168 St, VC5, New York, NY, 10032
- Received 11 July 2008
- Accepted 30 January 2009
Objective: To assess the performance of electronic health record data for syndromic surveillance and to assess the feasibility of broadly distributed surveillance.
Design: Two systems were developed to identify influenza-like illness and gastrointestinal infectious disease in ambulatory electronic health record data from a network of community health centers. The first system used queries on structured data and was designed for this specific electronic health record. The second used natural language processing of narrative data, but its queries were developed independently from this health record. Both were compared to influenza isolates and to a verified emergency department chief complaint surveillance system.
Measurements: Lagged cross-correlation and graphs of the three time series.
Results: For influenza-like illness, both the structured and narrative data correlated well with the influenza isolates and with the emergency department data, achieving cross-correlations of 0.89 (structured) and 0.84 (narrative) for isolates and 0.93 and 0.89 for emergency department data, and having similar peaks during influenza season. For gastrointestinal infectious disease, the structured data correlated fairly well with the emergency department data (0.81) with a similar peak, but the narrative data correlated less well (0.47).
Conclusions: It is feasible to use electronic health records for syndromic surveillance. The structured data performed best but required knowledge engineering to match the health record data to the queries. The narrative data illustrated the potential performance of a broadly disseminated system and achieved mixed results.
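The lagged cross-correlation named under Measurements can be illustrated with a minimal sketch. The abstract does not say which estimator, lag range, or software the authors used, so everything here is an assumption for illustration: the function names (`pearson`, `lagged_cross_correlation`, `best_lag`) are hypothetical, the Pearson coefficient over the overlapping window is one common choice among several, and the two weekly count series are made-up toy data, not study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_cross_correlation(x, y, max_lag):
    """Return {lag: r}, where r correlates x[t] with y[t + lag].

    A positive lag means y trails x by `lag` periods. Only the
    overlapping part of the two series is used at each lag.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(x, y, max_lag):
    """Return (lag, r) for the lag with the highest correlation."""
    ccf = lagged_cross_correlation(x, y, max_lag)
    lag = max(ccf, key=ccf.get)
    return lag, ccf[lag]


# Toy weekly counts (invented for illustration only).
# `reference` is `candidate` delayed by two weeks.
candidate = [1, 2, 4, 8, 16, 8, 4, 2, 1, 1, 1, 1]
reference = [1, 1, 1, 2, 4, 8, 16, 8, 4, 2, 1, 1]

lag, r = best_lag(candidate, reference, max_lag=3)
print(f"best lag: {lag} weeks, correlation: {r:.2f}")
```

Because the toy reference series is an exact two-week delayed copy of the candidate series, the highest correlation falls at a lag of 2. With real surveillance counts, the lag at which the correlation peaks would suggest how far one signal leads or trails the other, and the peak value is the kind of figure reported under Results.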
Supported by NLM grants R01 LM06910, R01 LM07659, and R01 LM08635, and Centers for Disease Control and Prevention grant P01 HK000029.