Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): Architecture
- Kenneth D Mandl1,2,3,4,
- Isaac S Kohane1,2,3,4,
- Douglas McFadden2,3,
- Griffin M Weber2,5,
- Marc Natter1,4,
- Joshua Mandel1,4,
- Sebastian Schneeweiss6,
- Sarah Weiler3,
- Jeffrey G Klann7,
- Jonathan Bickel1,4,8,
- William G Adams9,10,
- Yaorong Ge11,
- Xiaobo Zhou12,
- James Perkins13,14,
- Keith Marsolo15,
- Elmer Bernstam16,
- John Showalter17,
- Alexander Quarshie18,
- Elizabeth Ofili19,
- George Hripcsak20,
- Shawn N Murphy7,21
- 1Children's Hospital Informatics Program at Harvard–MIT Health Sciences and Technology, Boston, Massachusetts, USA
- 2Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- 3Harvard Catalyst, Harvard Medical School, Boston, Massachusetts, USA
- 4Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- 5Biomedical Research Informatics Core, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- 6Division of Pharmacoepidemiology, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- 7Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA
- 8Information Services Department, Boston Children's Hospital, Boston, Massachusetts, USA
- 9Boston University School of Medicine/Boston Medical Center, Boston, Massachusetts, USA
- 10Boston University Clinical and Translational Sciences Institute, Boston, Massachusetts, USA
- 11College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- 12Department of Radiology, Center for Bioinformatics & Systems Biology, Wake Forest University Health Science, Winston-Salem, North Carolina, USA
- 13Clark Atlanta University, Atlanta, Georgia, USA
- 14Research Centers in Minority Institutions Translational Research Network, Data Coordinating Center, Jackson State University, Jackson, Mississippi, USA
- 15Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- 16Division of Biomedical Informatics, Biomedical Informatics and Department of Internal Medicine, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- 17University of Mississippi Medical Center, Jackson, Mississippi, USA
- 18Department of Internal Medicine, Community Health and Preventive Medicine and Clinical Research Center, Morehouse School of Medicine, Atlanta, Georgia, USA
- 19Department of Internal Medicine, Clinical Research Center, Morehouse School of Medicine, Atlanta, Georgia, USA
- 20Department of Biomedical Informatics, Columbia University, New York, New York, USA
- 21Partners HealthCare Systems, Information Systems, Charlestown, Massachusetts, USA
- Correspondence to Dr Kenneth D Mandl, Boston Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
- Received 18 February 2014
- Accepted 8 March 2014
- Published Online First 12 May 2014
We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network. SCILHS leverages the $48 billion federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients. The network plugs universally into the point of care, generates evidence and discovery, and thereby enables clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is the development of innovative ‘apps’ to improve patient-centered outcomes research (PCOR) methods and to support point of care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open source, free, modular components.
- Electronic Health Record
- Learning Health System
- Clinical Trials
- Patient Engagement
- Distributed Computing
The Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, pronounced ‘skills’) is one of 11 clinical data research networks (CDRNs) funded by the Patient Centered Outcomes Research Institute (PCORI) in 2014. PCORI, a non-governmental organization created under the Patient Protection and Affordable Care Act, seeks to build an information technology (IT) backbone to support comparative effectiveness research at a national scale across both CDRNs and patient powered research networks (PPRNs).
SCILHS engages patients, clinicians, health systems leadership, and key healthcare stakeholders as collaborators to build on an existing network of hospitals and health systems that have already adopted a common clinical and translational research IT and regulatory framework. SCILHS, comprising 10 health systems (box 1), is a step toward answering the Institute of Medicine's call for a learning healthcare system (LHS)1,2 to ‘generate and apply the best evidence for the collaborative healthcare choices of each patient and provider; to drive the process of discovery as a natural outgrowth of patient care; and to ensure innovation, quality, safety, and value in health care’.
Box 1 Alphabetical list of Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) sites
Beth Israel Deaconess Medical Center
Boston Children's Hospital
Boston Health Net (Boston Medical Center and Community Health Centers)
Cincinnati Children's Hospital Medical Center
Columbia University Medical Center and New York Presbyterian Hospital
Morehouse School of Medicine/Grady Memorial Hospital (Research Centers in Minority Institutions)
Partners HealthCare System (includes Massachusetts General and Brigham & Women's Hospital)
University of Mississippi Medical Center
The University of Texas Health Science Center at Houston
Wake Forest Baptist Medical Center
Fifteen years ago, SCILHS informatics leaders began a quest to develop informatics infrastructure and regulatory innovation that would convert the emerging electronic health record (EHR) into a research tool for improving patient outcomes. All of our work and open source toolkits have been supported by grants from the National Institutes of Health, Centers for Disease Control and Prevention, and Office of the National Coordinator for Health Information Technology (ONC). First, we built Indivo,3,4 the first personally controlled health record, which gave patients their data, and apps to make those data useful. Then, i2b2 (Informatics for Integrating Biology and the Bedside)5–7 provided an open source analytic platform alongside the EHR to fuse and analyze data produced by the delivery system and to identify research cohorts. i2b2’s flexible common semantic data model readily accommodates a variety of clinical data. Our next advance was SHRINE (Shared Health Research Information Network),8–10 a tool enabling investigators to query i2b2 nodes in real time across multiple sites for collaborative population research. i2b2 has been successfully implemented at more than 100 sites across the USA, thereby enabling investigators to use delivery system data to identify patients with specific illnesses and clinical characteristics. A recent PCORI survey of all PCORnet sites revealed that 37% of the existing CDRN nodes and 31% of the PPRN nodes already used i2b2. Finally, we built SMART (Substitutable Medical Applications, Reusable Technologies), a platform to enable any developer to contribute to an ‘App Store for Health and Research’ compatible with i2b2-SHRINE instances or compliant EHRs.11–13
These informatics tools and associated research policy advances have already contributed to transformation in the clinical research enterprise: real-time, collaborative population health research is now enabled across SHRINE member sites distributed nationally. They have yet, however, to yield substantial improvements in the health of our patients. Now, in establishing PCORnet, PCORI has catalyzed a new national research dialog to answer patient-oriented questions and improve human health. We directly address this challenge via a strategy intended to avoid prior mistakes of large-scale, top-down, costly software infrastructure efforts that failed to scale (eg, caBIG14). Instead, we are building SCILHS with open source, free, modular components5,15 that are backed by vibrant user and software developer communities and have already spread virally to scale across heterogeneous health systems.
Here, we detail the informatics approaches taken by SCILHS to identify large cohorts of patients and engage them for research. Our technology strategy proceeds in lockstep with processes for regulatory innovation, the development of robust governance constructs and policies, and local adoption by hospital leadership and institutional review boards.
The sidecar approach
SCILHS adopts and extends a strategy of establishing a freely accessible health data ‘sidecar’ warehouse to the EHR, effectively leveraging existing data collected by EHRs during routine care while avoiding costly, time-consuming EHR integrations (figure 1). Developed intensively over the past 5 years at Harvard Medical School, this approach employs vendor agnostic, free, open source, scalable, and interoperable technologies to produce the only research-based, shared repository of EHR data that can be queried in real-time. Of already proven value in the research ecosystem, these components support a cost-effective and sustainable research network of >8 million patients.
We consider the heterogeneity of collaborating institutions to be a key measure of success; via adoption of the sidecar approach, we enable any institution to join our SCILHS network. Specifically, a primary goal is inclusion of diverse populations within our CDRN, thereby enabling capture of the genetic, genomic, and socioeconomic variation that exists beyond insured populations in managed care settings alone. Further, by freely sharing the processes and software that have been developed and supported by Harvard, we hope to catalyze the formation of many other new networks across heterogeneous health systems and institutions, and involve new partners in improving our core components, common data models, and ontologies.
The sidecar infrastructure is composed of the following:
- i2b2 (Informatics for Integrating Biology and the Bedside). Data analytic platform employed for EHR data analytics and clinical research at >100 academic medical centers worldwide (NIH funded).
- SHRINE (Shared Health Research Information Network). Federated query and response system that enables investigators to discover EHR data housed in i2b2 nodes across multiple independent institutions (NIH CTSA funded).
- SMART Platforms. First described in the New England Journal of Medicine,12 SMART has programmatic interfaces and applications that transform both EHRs and their sidecars into platforms that run substitutable iPhone-like apps.11 SMART enables a national scale ‘App Store’ for PCOR for rapid cycle innovation of PCOR methods (ONC funded).
- Indivo. The original personally controlled health record3,4,16,17 links patients to clinical and research settings. Used by hundreds of thousands of employees of Dossia's founding companies (Wal-Mart, Intel, and AT&T), Indivo was also the initial software codebase for Microsoft's HealthVault platform (NIH, CDC, and ONC funded).
Data models and ontologies
SCILHS will combine EHR data with payer claims to facilitate tracking of patients over time and across sites of care. The sidecar approach provides the capability to implement new data models without transforming all of the stored source data, a key element in the scalability and interoperability of our platform (table 1). By enabling well-designed, cross-mapped ontologies that support a PCORnet common data model, this approach incorporates otherwise disparate clinical data sources into an easily queried system that stores data in a flexible format. Data are stored in i2b2 using an entity–attribute–value model,20,21 employing a central ‘fact’ table based upon Kimball's Star Schema22 wherein each row stores a flexibly defined, atomic ‘fact’ or observation for a patient.5 Much of i2b2's versatility arises from its focus on a semantic definition of patient observations that can represent various existing and newly defined data elements: claims, EHR, genetic, and imaging data, as well as patient-reported outcomes and demographics. Analogous to a capacious warehouse with adjustable shelves and bins, i2b2 accommodates various nomenclatures for data elements, and supports robust tagging of facts with associated modifiers and values. This approach enables database indexing of facts and observations to support high performance execution of expressive queries and filters.
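As a rough illustration of the entity–attribute–value pattern described above, the following sketch builds a drastically simplified fact table in SQLite and runs a cohort-style query against it. The table and column names loosely echo i2b2 naming but are assumptions made for this example and not the actual i2b2 schema; the rows are invented sample data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one narrow 'fact' table.

    Hypothetical, simplified schema: each row is one atomic observation
    (entity = patient, attribute = concept code, value = text or number).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE observation_fact (
            patient_num INTEGER NOT NULL,
            concept_cd  TEXT    NOT NULL,
            start_date  TEXT    NOT NULL,
            modifier_cd TEXT,
            tval_char   TEXT,
            nval_num    REAL
        )
        """
    )
    # Indexing on the concept code keeps concept-driven queries fast.
    conn.execute(
        "CREATE INDEX idx_fact_concept "
        "ON observation_fact (concept_cd, patient_num)"
    )
    rows = [
        # Diagnoses, labs, and a patient-reported item share one table.
        (1, "ICD9:250.00", "2013-01-05", None, None, None),
        (1, "LOINC:4548-4", "2013-01-05", None, None, 8.1),
        (1, "PRO:PAIN_SCORE", "2013-02-01", None, None, 3.0),
        (2, "ICD9:250.00", "2013-03-10", None, None, None),
        (2, "LOINC:4548-4", "2013-03-10", None, None, 6.4),
        (3, "ICD9:401.9", "2013-04-22", None, None, None),
    ]
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def patients_with(conn, concept_cd, min_value=None):
    """Return sorted patient numbers having a fact for the given concept.

    If min_value is given, only facts whose numeric value is at least
    min_value are counted.
    """
    sql = "SELECT DISTINCT patient_num FROM observation_fact WHERE concept_cd = ?"
    params = [concept_cd]
    if min_value is not None:
        sql += " AND nval_num >= ?"
        params.append(min_value)
    sql += " ORDER BY patient_num"
    return [row[0] for row in conn.execute(sql, params)]
```

In this sketch, a new kind of observation (for example, another patient-reported measure) is added by inserting rows with a new concept code; no table alteration is needed, which is the property the text attributes to the flexible fact table.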
i2b2 employs an ontology-based approach that supports flexible, on-the-fly incorporation of new data elements and coding systems. Terminologies such as ICD, NDC, and LOINC may be pre-loaded as hierarchical concept trees; new or ad-hoc terminologies including patient-reported outcome measures or locally defined data dictionaries readily coexist and may be cross-mapped in i2b2. Concepts may be grouped using simple hierarchies and then optionally re-mapped into other reference coding systems and data models (eg, Observational Medical Outcomes Partnership (OMOP) data model).23 In this way, i2b2 accommodates diverse real-world coding systems while maintaining a straightforward query interface for its users.
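The hierarchical concept trees and cross-mapping described above can be sketched with a plain dictionary keyed by concept path. This is only a toy model under stated assumptions: the paths, the `LOCAL:` and `REF:` codes, and the helper names are all hypothetical and do not reproduce i2b2's ontology tables.

```python
# Hypothetical concept tree: each path ends in a leaf that carries a code.
ONTOLOGY = {
    "/Diagnoses/Endocrine/Diabetes mellitus/Type 1/": "ICD9:250.01",
    "/Diagnoses/Endocrine/Diabetes mellitus/Type 2/": "ICD9:250.00",
    "/Diagnoses/Circulatory/Hypertension/": "ICD9:401.9",
    "/Patient reported/Pain score/": "LOCAL:PAIN_SCORE",
}

# Hypothetical cross-map from a locally defined code to a reference code.
CROSS_MAP = {"LOCAL:PAIN_SCORE": "REF:PAIN_0_10"}


def codes_under(ontology, folder):
    """Return the sorted codes of every concept below a folder in the tree."""
    prefix = folder if folder.endswith("/") else folder + "/"
    return sorted(
        code for path, code in ontology.items() if path.startswith(prefix)
    )


def add_concept(ontology, path, code):
    """Register a new concept at run time without changing existing entries."""
    if path in ontology:
        raise ValueError(f"path already defined: {path}")
    ontology[path] = code


def remap(codes, cross_map):
    """Translate codes through a cross-map; unmapped codes pass through."""
    return [cross_map.get(code, code) for code in codes]
```

Selecting a folder such as `/Diagnoses/Endocrine` expands to all codes beneath it, so a user can query at any level of the hierarchy while the stored facts keep their original codes.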
The SHRINE Adaptor Cell maps local i2b2 terminologies into a common, standards-based SHRINE ontology. This enables a common shared ontology for federated queries while allowing individual i2b2 instances within institutions to retain local hierarchies and terminologies. The Adaptor transforms a federated SHRINE query into a query that runs on the local i2b2 database. The Adaptor then converts the result of that query back into the common SHRINE message format, using well-maintained standards including RxNorm, ICD-9, and LOINC. In addition, SHRINE includes tools for ontology mapping and ontology-based data mining. Simple SHRINE customizations enable use of other query systems; for example, the QueryHealth distributed query system (ONC) uses PopMedNet to query i2b2.24,25
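The three adaptor steps described above (translate the shared query into local terms, run it locally, and convert the result back into a common format) can be sketched as follows. This is not SHRINE's code or message format: the term names, the mapping table, the result dictionary, and the choice to reject unmapped terms are all assumptions made for illustration.

```python
# Hypothetical mapping from shared network terms to one site's local codes.
SHARED_TO_LOCAL = {
    "SHARED:DIABETES_T2": ["LOCAL:DX_1042", "LOCAL:DX_1043"],
    "SHARED:HBA1C": ["LOCAL:LAB_A1C"],
}

# Invented local facts as (patient_id, local_code) pairs.
LOCAL_FACTS = [
    (1, "LOCAL:DX_1042"), (1, "LOCAL:LAB_A1C"),
    (2, "LOCAL:DX_1043"), (2, "LOCAL:LAB_A1C"),
    (3, "LOCAL:DX_1042"),
    (4, "LOCAL:LAB_A1C"),
]


def translate_query(shared_terms, mapping):
    """Step 1: expand each shared term into its group of local codes."""
    groups = []
    for term in shared_terms:
        if term not in mapping:
            # Design choice of this sketch: fail loudly on unmapped terms.
            raise ValueError(f"no local mapping for {term}")
        groups.append(set(mapping[term]))
    return groups


def run_local_query(local_groups, facts):
    """Step 2: patients matching every group (OR within a group, AND across)."""
    matching = None
    for group in local_groups:
        hits = {pid for pid, code in facts if code in group}
        matching = hits if matching is None else matching & hits
    return matching or set()


def to_network_result(site, patient_ids):
    """Step 3: return only an aggregate count in a site-neutral shape."""
    return {"site": site, "patient_count": len(patient_ids)}


def adaptor(site, shared_terms, mapping, facts):
    """Run the three steps end to end for one site."""
    local_groups = translate_query(shared_terms, mapping)
    return to_network_result(site, run_local_query(local_groups, facts))
```

Because only the mapping table is site specific, each institution in this sketch can keep its own local codes while answering the same shared query.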
Success to date
SHRINE and i2b2-based research includes characterization of rare morbidities of common diseases,26 studies of very rare diseases such as peripartum cardiomyopathy (discovered in SHRINE and published in Nature27), detection of drug–drug interactions,28 and measures of quality and clinical efficacy across self-organized SHRINE networks in Europe, the University of California healthcare systems, and a just-in-time network to study the prevalence of complications of type 1 and type 2 diabetes in hospitals across the USA. Others have used SHRINE to characterize and track the rising incidence of colorectal cancer,29 and to identify practice variation in inflammatory bowel disease (IBD) and intervene to optimize that practice.30 i2b2 and SHRINE have been implemented as the base infrastructure for a variety of enhanced chronic disease registry-based research efforts.31 The Childhood Arthritis and Rheumatology Research Alliance uses the SHRINE/i2b2 registry framework to federate clinical care data and patient-reported data from 62 academic medical centers in the USA and Canada32,33 and is currently piloting consensus treatment protocol trials.34–37 The Harvard IBD Longitudinal Data Repository employs the same infrastructure.31 ImproveCareNow38 uses i2b2 as its centralized data warehouse for IBD-related quality improvement at 50 centers.
The health systems that have joined SCILHS reflect the American demographic—an essential requirement for reaching statistically valid, clinically meaningful, and patient-centric conclusions about therapies across the diverse spectrum of all healthcare consumers. In order to achieve the comprehensive, patient-centered outcomes infrastructure called for by PCORI, we introduce a new, patient-centric platform (mySCILHS) based on the Indivo system and incorporating the REDCap electronic data capture tool.
mySCILHS will support the Blue Button REST API for standards-based interactions with PPRNs and other patient-selected tools. This API exposes up-to-date, structured clinical summary data for each participating patient. Via a consumer-friendly workflow based on web standards including OAuth2, patients can authorize third-party apps and services, including PPRNs, to access their clinical data.
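The patient authorization step mentioned above can be illustrated with the standard OAuth2 authorization code flow. The sketch below only constructs and parses the messages and makes no network calls. The endpoint URL, client identifier, redirect URI, and scope name are hypothetical placeholders, not documented mySCILHS or Blue Button values; the parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`) are those defined by the OAuth2 specification.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical authorization endpoint; not a real mySCILHS URL.
AUTHORIZE_URL = "https://myscilhs.example.org/oauth2/authorize"


def build_authorization_url(client_id, redirect_uri, scope, state=None):
    """Build the URL a patient visits to approve a third-party app."""
    state = state or secrets.token_urlsafe(16)  # ties the reply to this request
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def parse_redirect(redirect_url, expected_state):
    """Extract the authorization code after the patient approves or denies."""
    params = parse_qs(urlparse(redirect_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; possible forged response")
    if "error" in params:
        raise PermissionError(params["error"][0])  # e.g. patient declined
    return params["code"][0]


def token_request_form(code, client_id, redirect_uri):
    """Form fields the app would POST to exchange the code for a token."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    }
```

A PPRN app in this sketch would redirect the patient to the built URL, receive the code at its redirect URI, and then exchange it for an access token that it presents when requesting the patient's clinical summary.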
Figure 2 shows the workflow from an initial query through the analytic phase in a comparative effectiveness study. Each node in the network maintains an instance of i2b2 containing claims and de-identified electronic medical record data. SCILHS is a true peer-to-peer network, meaning that any SHRINE-based node can initiate a query, using a common ontology, that aggregates results from all participating sites. After the initial query, the investigator can automatically pass the query to each site where duly authorized local site investigators may review individual subject data for study eligibility using i2b2 SMART apps (figure 3). The final patient list is transmitted to the mySCILHS patient-facing software. The mySCILHS research contact management module links de-identified i2b2 records to patient demographics and contact information. Patients are engaged by web survey, telephony, or SMART apps; patient-reported data are returned to i2b2 and are then transferred into a secure comparative effectiveness (CE) study environment for analyses. In the CE environment, further transformations may occur, supporting many other analytic tools and processes. We anticipate that PCORnet-level queries, which may launch against the full complement of 11 CDRNs and 18 PPRNs, will be initiated at the PCORnet adapter.
We anticipate that natural language processing (NLP) of provider notes will play an important role for adding complete longitudinal coded data to the hospital-based record.39 Early findings demonstrate that NLP of hospital-based EHR notes provides quite complete longitudinal data even when compared with Centers for Medicare and Medicaid Services claims data (personal communication, Katherine Liao, Brigham and Women's Hospital, 2014). Using NLP on hospital and clinic notes will complement our strategy of concatenating EHR data with external sources such as claims and pharmacy data.
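The peer-to-peer query step in the workflow described above, in which any node can initiate a query and aggregate counts from all sites, can be sketched as follows. The class and method names are hypothetical, the facts are invented, and real deployments exchange messages over a network rather than calling peers in process.

```python
class SiteNode:
    """One network node holding its own de-identified facts."""

    def __init__(self, name, facts):
        self.name = name
        self._facts = facts  # list of (local_patient_id, concept) pairs

    def local_count(self, concept):
        """Count distinct local patients with the concept; no IDs leave the site."""
        return len({pid for pid, c in self._facts if c == concept})

    def federated_count(self, peers, concept):
        """Initiate a query from this node and aggregate every site's count."""
        per_site = {self.name: self.local_count(concept)}
        for peer in peers:
            per_site[peer.name] = peer.local_count(concept)
        return {
            "initiated_by": self.name,
            "per_site": per_site,
            "total": sum(per_site.values()),
        }
```

Every node exposes the same two methods, so any of them can act as the initiator. Note that a total formed by summing per-site counts, as in this sketch, may count a patient more than once if that patient receives care at several sites.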
Implementing and scaling
SCILHS includes 10 legally and financially independent institutions whose CEO or equivalent senior institutional official has committed to active participation in governance, policy development, data sharing, and sustainability planning. Each member has pledged to invest additional personnel and resources to ensure the network meets local patient and clinical stakeholder needs. By harmonizing informatics infrastructure, data models, regulatory processes and policies, and patient participation within and across member institutions, we anticipate that SCILHS will become and remain a successful model for inter-institutional PCOR. By using the SCILHS sidecar IT approach to EHR access, we minimize local informatics burden, further enabling a sustainable and adaptable PCOR infrastructure.
We acknowledge the invaluable contributions of the many SCILHS investigators, leaders, and supporters and specifically call out those most involved in designing the network in the early phases: Barbara Bierer, Susan Edgeman-Levitan, Jonathan Finkelstein, Alison Goldfine, Jennifer Haas, John Halamka, Manny Hernandez, John Hutton, Ann Klibanski, David Ludwig, Joshua Metlay, Mary Mullen, Lee Marshall Nadler, Andrew Nierenberg, Harry Orf, Patricia O'Rourke, Eric Peraksilis, Lee Schwamm, Daniel Solomon, Herman Taylor, Patrick Taylor, Aaron Waxman, Laura Weisel, and James Wilson.
Collaborators SCILHS Network.
Contributors The authors all: made substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; and drafted the work or revised it critically for important intellectual content; and gave final approval of the version to be published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This work was supported by the National Institutes of Health: National Library of Medicine R01LM011185 and U54 LM008748; National Institute of General Medical Sciences R01GM104303; National Center for Advancing Translational Sciences 1KL2TR001100, UL1TR000454; National Institute on Minority Health and Health Disparities U54 MD007588; the Office of the National Coordinator for Health Information Technology SHARP Program Contract 90TR0001; and by Contract CDRN-1306-04608 from the Patient Centered Outcomes Research Institute (PCORI).
Competing interests SS is consultant to WHISCON, LLC and to Aetion, Inc., a software manufacturer in which he also owns shares.
Provenance and peer review Commissioned; internally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/