J Am Med Inform Assoc doi:10.1136/amiajnl-2012-000896
  • Research and applications

Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network

  1. Joshua C Denny⁹
  1. Group Health Research Institute, Seattle, Washington, USA
  2. Department of Biomedical Informatics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA
  3. Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
  4. Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  5. Office of Personalized Medicine, Vanderbilt University, Nashville, Tennessee, USA
  6. Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  7. Division of Cardiovascular Diseases, Mayo Clinic, Rochester, Minnesota, USA
  8. Office of Population Genomics, National Human Genome Research Institute, Bethesda, Maryland, USA
  9. Departments of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, Tennessee, USA
  1. Correspondence to Dr Katherine M Newton, Group Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101, USA; newton.k{at}
  • Received 12 February 2012
  • Accepted 5 March 2013
  • Published Online First 26 March 2013


Background Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.

Objective To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.

Materials and methods The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.

Results By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables reduces validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.

Conclusions Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
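
As an illustration of lessons (3) and (4), the following minimal Python sketch shows one way to reduce repeated measures to a single value within an explicitly defined time period. It is not taken from the eMERGE algorithms: the function name `representative_value`, the choice of the median as the default "most meaningful value", and the sample readings are all assumptions made for this example.

```python
from datetime import date
from statistics import median


def representative_value(measurements, window_start, window_end, summarize=median):
    """Summarize the measurements dated within [window_start, window_end].

    measurements: iterable of (date, float) pairs.
    summarize: function applied to the in-window values (assumed default: median).
    Returns None when no measurement falls inside the window.
    """
    in_window = [value for when, value in measurements
                 if window_start <= when <= window_end]
    return summarize(in_window) if in_window else None


# Hypothetical repeated lab values for one patient (illustrative data only).
readings = [
    (date(2009, 12, 15), 190.0),
    (date(2010, 3, 1), 210.0),
    (date(2010, 9, 20), 230.0),
    (date(2011, 2, 2), 250.0),
]

# Only the two 2010 readings fall inside the review window.
value_2010 = representative_value(readings, date(2010, 1, 1), date(2010, 12, 31))
print(value_2010)
```

Passing a different `summarize` function (for example `max`, or a function returning the most recent value) changes which value is studied. Making both the window and the summary rule explicit parameters keeps those two decisions visible to reviewers at each site.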
