J Am Med Inform Assoc 20:e341-e348 doi:10.1136/amiajnl-2013-001939
  • Research and applications

Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium

Editor's Choice
  1. Christopher G Chute1
  1. Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  2. Department of Linguistics, University of Colorado, Boulder, Colorado, USA
  3. Group Health Research Institute, Seattle, Washington, USA
  4. Boston Children's Hospital, Harvard University, Boston, Massachusetts, USA
  5. Homer Warner Center for Informatics Research, Intermountain Healthcare, Salt Lake City, Utah, USA
  6. Agilex Technologies, Chantilly, Virginia, USA
  7. School of Biomedical Informatics, University of Texas Health Sciences Center, Houston, Texas, USA
  1. Correspondence to Dr Jyotishman Pathak, Mayo Clinic College of Medicine, Rochester, MN 55902, USA; Pathak.Jyotishman{at}
  • Received 17 April 2013
  • Revised 7 October 2013
  • Accepted 11 October 2013
  • Published Online First 4 November 2013


Research objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction.

Materials and methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems (Mayo Clinic and Intermountain Healthcare) were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine.
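To make the normalization step concrete, the following is a minimal sketch of mapping site-specific lab codes onto one shared concept code. It is not the SHARPn implementation: the site names, local codes, the `STD:LDL-C` identifier, and the `NormalizedLab` record are all hypothetical placeholders, and a production system would resolve codes through a terminology service and a CEM-conformant model rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical local-to-standard code map (illustrative values only).
# A real pipeline would query a terminology service instead.
LOCAL_TO_STANDARD = {
    ("site_a", "LDLC"): "STD:LDL-C",
    ("site_b", "CHOL_LDL"): "STD:LDL-C",
}


@dataclass(frozen=True)
class NormalizedLab:
    """Simplified stand-in for a normalized lab observation."""
    patient_id: str
    standard_code: str
    value: float
    unit: str


def normalize_lab(site: str, local_code: str, patient_id: str,
                  value, unit: str) -> Optional[NormalizedLab]:
    """Return a normalized record, or None when the local code is unmapped."""
    standard = LOCAL_TO_STANDARD.get((site, local_code))
    if standard is None:
        return None
    return NormalizedLab(patient_id, standard, float(value), unit)
```

Returning `None` for unmapped codes keeps unknown data out of downstream queries instead of guessing a mapping; whether to drop, queue, or flag such records is a design choice this sketch leaves open.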

Results: Using CEMs and open-source natural language processing and terminology services engines, namely Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2), we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform on a randomly selected cohort of 273 Mayo Clinic patients by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria.
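The logic of the quality measure described above can be sketched in a few lines. This is an illustrative approximation, not the QDM artifact or the rules-engine code used in the study: the `Patient` and `LdlResult` types are hypothetical, the age range is assumed to be inclusive, and the measurement year is assumed to be a calendar year.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class LdlResult:
    taken_on: date
    mg_per_dl: float


@dataclass
class Patient:
    patient_id: str
    age: int
    has_diabetes: bool
    ldl_results: List[LdlResult]


def most_recent_ldl(patient: Patient, year: int) -> Optional[LdlResult]:
    """Latest LDL result within the measurement year, or None if absent."""
    in_year = [r for r in patient.ldl_results if r.taken_on.year == year]
    return max(in_year, key=lambda r: r.taken_on, default=None)


def evaluate_measure(patients: List[Patient], year: int) -> Tuple[int, int]:
    """Return (denominator, numerator) counts for the measure."""
    # Denominator: patients with diabetes aged 18 to 75 (assumed inclusive).
    denominator = [p for p in patients
                   if p.has_diabetes and 18 <= p.age <= 75]
    numerator = 0
    for p in denominator:
        latest = most_recent_ldl(p, year)
        # Numerator: most recent in-year LDL result below 100 mg/dL.
        if latest is not None and latest.mg_per_dl < 100:
            numerator += 1
    return len(denominator), numerator
```

Only the most recent in-year result counts, so an earlier value below 100 mg/dL does not place a patient in the numerator if a later result is higher. With the counts reported above (18 of 21), the measure rate works out to roughly 86%.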

Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into a standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.
