Development of a Clinical Data Warehouse for Hospital Infection Control
- Mary F Wisniewski, MSN,
- Piotr Kieszkowski, BS,
- Brandon M Zagorski, MS,
- William E Trick, MD,
- Michael Sommers, BA,
- Robert A Weinstein, MD
- Affiliations of the authors: Department of Medicine, Cook County Hospital, Chicago, Illinois (MFW, PK, BMZ, RAW); Division of Healthcare Quality Promotion, National Center for Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia (WET); Department of Hospital Information Services, Cook County Bureau of Health Services, Chicago, Illinois (MS); Rush Medical College, Chicago, Illinois (RAW)
- Correspondence and reprints: Mary Wisniewski, MSN, Division of Infectious Diseases, Cook County Hospital, 1901 West Harrison Street, Suite 124 Durand, Chicago, IL 60612; e-mail: < >.
- Received 27 November 2002
- Accepted 19 April 2003
Existing data stored in a hospital's transactional servers have enormous potential to improve performance measurement and health care quality. Accessing, organizing, and using these data to support research and quality improvement projects are evolving challenges for hospital systems. The authors report development of a clinical data warehouse that they created by importing data from the information systems of three affiliated public hospitals. They describe their methodology; difficulties encountered; responses from administrators, computer specialists, and clinicians; and the steps taken to capture and store patient-level data. The authors provide examples of their use of the clinical data warehouse to monitor antimicrobial resistance, to measure antimicrobial use, to detect hospital-acquired bloodstream infections, to measure the cost of infections, and to detect antimicrobial prescribing errors. In addition, they estimate the amount of time and money saved and the increased precision achieved through the practical application of the data warehouse.
Antimicrobial resistance among pathogens that cause hospital-acquired infections increases health care costs and patient morbidity and mortality.1 Strategies to control antimicrobial resistance in hospitals have been outlined by a multidisciplinary group of experts.2 If implemented, these strategies are likely to significantly reduce resistance rates. One of the key goals is development of a hospital information system that could recognize and report trends in antimicrobial use and resistance.
Currently, most researchers and quality promotion professionals abstract patient level data manually from patients' medical records and reenter the data into databases. This process is labor-intensive, which limits the population size evaluated, and is error-prone.3 4 5 Current technology makes electronic transfer of data elements into a clinical data warehouse feasible, efficient, and accurate.6 We describe how our Infectious Diseases Division developed a relational clinical data warehouse using existing data from our health care system's clinical computer applications (laboratory, pharmacy, radiology, admission/discharge). We report our successful use of the clinical data warehouse to automate measurement of performance indicators and our initial experience with electronic surveillance for infection control. We outline the steps involved, describe barriers encountered, report our responses (countermeasures) to those barriers, and provide examples of how we have used the data to facilitate performance improvement activities.
Beginning in 1998, we undertook a five-year hospital-based demonstration project focusing on control of antimicrobial resistance, the Chicago Antimicrobial Resistance Project (CARP), under a cooperative agreement with the Centers for Disease Control and Prevention.7 CARP undertook a series of discrete quality improvement projects to reduce unnecessary antibiotic prescribing and to reduce the transmission of antimicrobial-resistant pathogens among hospitalized patients within the Cook County Bureau of Health Services, a three-hospital public health care system. Essential (core) measures for CARP interventions included secular trends of hospital-acquired infections, antimicrobial resistance, and antimicrobial utilization. An additional goal of CARP was to determine the costs of treating and preventing hospital-acquired antibiotic-resistant infections.
An essential infrastructure requirement was an information management system designed to detect, track, and report the occurrence of antimicrobial-resistant organisms and to quantify antimicrobial use for the entire inpatient population of 48,000 admissions per year. As is common in health care, a system-wide relational database linking laboratory, pharmacy, and administrative data did not exist.4 To manage data for the large number of inpatients being evaluated, we built a clinical data warehouse in client/server architecture. The clinical data warehouse was designed to store data collected from nonelectronic sources (e.g., manually abstracted data from patient medical records and surveys scanned using optical character recognition) and to store imported electronic data from existing hospital information systems. To facilitate communication among investigators, we developed a local area network and an intranet Web site that linked investigators and information (Fig. 1).
The development of the CARP electronic Infectious Disease (eID) clinical data warehouse required several steps (Table 1). Initial activities included translating the CARP objectives into core measures of performance, seeking administrative and technical support, determining which data fields (data elements) were needed, and learning where and how data were stored in each hospital's information systems (Table 2). We learned that each clinical department (e.g., pharmacy, laboratory, medical records) stored data in a separate database server and that each of these servers was interfaced to a single proprietary database that was housed and managed in Pennsylvania as part of the hospitals' information system. This two-tiered configuration enabled us to access data locally from each department's server (Table 3). This was necessary because the hospitals' information system report writers lacked an electronic export function. Obtaining written information about each database's architecture, such as documentation about the database tables' relationships and a data dictionary of encoded elements, was essential for understanding the data and for extracting relevant variables into our own clinical data warehouse.
The eID clinical data warehouse is a relational database built with:
Relational database management system: Microsoft SQL 7.0 Server (Microsoft Inc., Redmond, WA)* which includes the following applications: object linking and embedding database (OLE DB), Data Transformation Services (DTS), online analytical processing (OLAP), and data mining.7 9 10
Software applications used to display information: Visual Basic 611 (Visual Basics, Inc., Cape Town, South Africa), Web enabled Crystal Reports12 (Crystal Decision, Inc., Edison, NJ), and Microsoft Office Professional.
Hardware: Two Dell Power Edge 4300 Redundant Base 400 MHz dual Intel Pentium II Servers (Dell Computer, Inc., Round Rock, TX); three Dell Dimension XPS R450MHz Pentium II Minitowers; two Dell Inspiron R450MHz Pentium II Notebooks.
Operating System: Microsoft Windows NT Server 4.13
Hard disk space: 180 GB redundant array of independent disks (RAID) and external Dell PowerVault 220.
Tape Backup: Dell Power Vault 110T.
Power Backup: Smart UPS (uninterruptible power supply) 1400 (APC, Inc., Kingston, RI).
Statistical Software: SAS (SAS Institute, Cary, NC) and SPSS (SPSS Inc., Chicago, IL).
Scanning: AutoData Systems (Auto Data Systems Design, Inc., San Angelo, TX).
The Data Transformation Services (DTS)8 function of the SQL server enabled open database connectivity (ODBC) linkages to different database platforms (e.g., application databases from Pharmacy and Laboratory) via the Active X scripting engine. This function was programmed to automate data extraction from each primary server to our eID clinical data warehouse. To minimize any potential impact on the primary transactional server, the extraction step was scheduled once every twenty-four hours, between 1:00 am and 5:00 am. Extraction of one day's data takes about two hours. The selected data elements are stored in Microsoft SQL tables. Although extraction could occur more frequently than once a day, no CARP studies currently require this.
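The daily extraction step can be illustrated with a minimal sketch. This is not the DTS package described above: it uses Python's built-in sqlite3 module with in-memory databases standing in for a departmental server and the warehouse, and the table names (orders, pharmacy_orders), column names, and sample rows are hypothetical.

```python
import sqlite3


def extract_day(source, warehouse, day):
    """Copy one day's pharmacy orders from a source database into the warehouse.

    Hypothetical schema: orders(order_id, patient_id, drug, order_date).
    Returns the number of rows read from the source for that day.
    """
    rows = source.execute(
        "SELECT order_id, patient_id, drug, order_date "
        "FROM orders WHERE order_date = ?",
        (day,),
    ).fetchall()
    # INSERT OR REPLACE keyed on order_id makes a rerun of the same day harmless.
    warehouse.executemany(
        "INSERT OR REPLACE INTO pharmacy_orders "
        "(order_id, patient_id, drug, order_date) VALUES (?, ?, ?, ?)",
        rows,
    )
    warehouse.commit()
    return len(rows)


def make_demo_databases():
    """Build two in-memory databases with invented sample rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "patient_id TEXT, drug TEXT, order_date TEXT)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "P1", "cefazolin", "2001-03-01"),
            (2, "P1", "vancomycin", "2001-03-02"),
            (3, "P2", "ciprofloxacin", "2001-03-02"),
        ],
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE pharmacy_orders (order_id INTEGER PRIMARY KEY, "
        "patient_id TEXT, drug TEXT, order_date TEXT)"
    )
    return source, warehouse
```

Selecting by date keeps each run small, and keying the warehouse table on the source identifier means a failed or repeated nightly run does not create duplicate rows.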
We imported data from 13 different computing and operating systems—pharmacy (3), laboratory (2), radiology (1), medical records (6), and emergency department (1)—in the three CARP hospitals. The clinical data warehouse also stores data generated by CARP studies or surveillance programs (e.g., infection control surveillance). After three years of operation during 1999 to 2002, the database contained 50 GB of information encompassing approximately 130,000 admission/discharge, 22,000,000 laboratory, 6,000,000 pharmacy, 2,000,000 radiology, 1,000,000 emergency department, and 500,000 procedure/diagnosis records (Tables 2 and 3).
Electronic data sources do not encompass all the data necessary for analysis. Much of the primary clinical data, such as vital signs (e.g., blood pressure or temperature) or clinical assessments (e.g., coma scores, urinary output) are collected manually for specific studies. Additional data, such as outcome evaluations (e.g., infection control surveillance) and investigator determination about appropriateness of antimicrobial therapy, also must be transformed into an electronic format for analysis. We have used an optical character recognition (OCR) program for this transfer. In our studies, there has been ≥99% agreement between data collection forms and scanned data. In general, we have used scanning methods rather than keystroke data entries when greater than 3,000 data fields (e.g., 30 data fields × 100 surveys) require input. In our experience, the amount of time for keystroke data entry for more than 3,000 data fields exceeds the effort needed to develop a scan form and to input and verify data.
Validating electronic data for completeness, continuity, and accuracy requires a substantial investment of time and is an ongoing process. Missing or incorrect data from the primary sources, the hospitals' transactional databases, can result from errors in data entry by care providers, hospital clerks, pharmacy technicians, or laboratory workers.5 14 Also, corruption of data can occur during data transmission, storage, or retrieval. By abstracting a sample of our data and manually comparing the data set with the source data, we were able to find missing data and revise the abstraction programming code for each data set. We repeated this process until there was 100% agreement between data sets from our sample. A final check on validating our abstraction and data management for any new report was to compare patient-specific data from three systems: our data warehouse, the hospital information system vendor's application used to view patient results, and the patient medical record. Data from the hospital information system and from our clinical data warehouse matched, but the paper record missed one or more laboratory results 80% of the time. As an ongoing validation, we periodically compare reports programmed from our data warehouse with existing hospital reports (e.g., admission data, pharmacy expenditures, or shoe-leather surveillance programs). This process is performed by the personnel who use the information and by the programmers who developed the applications.
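The sample-comparison step described above can be sketched as follows. This is an illustration under simple assumptions, not our abstraction code: each record is reduced to a key and a value, and the function name compare_samples and the record layout are hypothetical.

```python
def compare_samples(source_records, warehouse_records):
    """Compare a sample of source records with the same sample in the warehouse.

    Both arguments are dicts mapping a record key (for example, an accession
    number) to the stored value. Returns the keys that are missing from the
    warehouse, the keys whose values differ, and the fraction in agreement.
    """
    missing = sorted(k for k in source_records if k not in warehouse_records)
    mismatched = sorted(
        k
        for k in source_records
        if k in warehouse_records and warehouse_records[k] != source_records[k]
    )
    total = len(source_records)
    agreeing = total - len(missing) - len(mismatched)
    agreement = 1.0 if total == 0 else agreeing / total
    return {"missing": missing, "mismatched": mismatched, "agreement": agreement}
```

Repeating the comparison after each revision of the extraction code, until the agreement value reaches 1.0 for the sample, mirrors the iterative process described in the text.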
The development of the clinical data warehouse took two years and approximately 4,000 hours. The development was performed by one director-developer (20% of time) who also served as chairperson of CARP's Informatics Sub-committee, one database administrator (70% of time), and one part-time system analyst (20% of time). A physician-administrator serving as the principal investigator of CARP led the staff.
Difficulties Encountered and Lessons Learned
Politics and Regulatory Issues
Two of the biggest challenges in the planning process were accommodating the security and confidentiality mandates of regulatory agencies and obtaining institutional approvals. Our first contact was with our own system's chief information officer, who required that we obtain secondary approval from the chief operating officer at each of the three hospitals and from each of the clinical department administrators (Table 4).
Keepers of data—often self-viewed as “owners” of the data—have a fiduciary responsibility to maintain data in a confidential and secure manner and to ensure that data are interpreted accurately. Allowing a third party direct access to computer servers may be perceived as compromising data-keeper responsibilities. Our request to create a copy of patient level data for CARP projects challenged the paradigm of data “ownership.” The introduction of the Health Insurance Portability and Accountability Act of 1996 (HIPAA),15 16 which addresses confidentiality of patient level data and the safeguards required for security of electronic data transfer, further complicated our task by introducing another level of scrutiny. HIPAA does not specify how patient data should be protected but does note that institutions and individuals will be held accountable if confidentiality is breached.
To overcome the barriers associated with ownership and confidentiality concerns, we obtained senior administrative-level endorsement of the project. In addition, we informed administrators and data keepers that use of the information by the infectious diseases and infection control divisions of our system's hospitals would help meet regulatory mandates for infection control and improve patient care. Administrative acceptance included approval and endorsement from the chief of the Cook County Bureau of Health Services, hospital attorneys, and department directors. We implemented a security program that included policies, procedures, and technological safeguards to protect patient-level data and to assure compliance with our institutional review board (IRB) requirements. Establishment of the data warehouse was determined to be exempt from IRB review.
The administration of the hospitals' information system data repository is a contracted service through an extramural vendor. This operational practice limited the development of technical expertise in database management within the hospitals' information services staff and restricted their ability to assist us in establishing a connection to the application servers. We learned that staff from each clinical department (e.g., pharmacy, laboratory, medical records, radiology) assisted with the design and management of data in its own server. Understanding the design of the information system software architecture and determining data flow of the two-tiered server configuration were critical to successfully build our clinical data warehouse.
To create links to local servers, we found it essential to use established, or to develop new, relationships with the clinical department directors and staff who managed each local server. In exchange for the time provided by these staffers, we helped them access data. For example, we abstracted pharmacy data about unit-level drug dispensing practices, information the pharmacy needed to plan implementation of automated drug dispensing machines.
Database Content Knowledge
Selecting data elements from each database required knowledge of the database model and the data dictionary and clinical content expertise. Ideally, we would have had the documentation supporting each application, but often this was not available. Once we could view the database contents, clinically trained staff helped interpret the data variables and relational table design. The pharmacy database was relatively simple in design and required abstracting drug data from only one table. In addition, one of the pharmacy directors was able to provide important assistance by directing us to tables containing census information that we needed to report the antimicrobial use per patient days. In contrast, the laboratory database was complicated, and we required assistance from the software vendor's application specialist. The microbiology laboratory database had over 600 tables, only a fraction of which contained data elements relevant to the CARP study. We created one microbiology table by linking ten different tables; some of these tables contained only two variables (e.g., tables that contained encoded data descriptors). After data aggregation and validation, we learned that some laboratory results were archived to other data tables. Much effort was required to review the large number of tables, to determine the meaning of encoded variables, to locate stored variables, and to learn how to link tables and to abstract complete data sets. In addition to the complicated table structure, the microbiology data were partitioned into time-dependent preliminary and final results, and each of the final pathogens had a panel of 1 to ≥10 antimicrobial susceptibility results.
Usability of Data
Few published studies report on the accuracy or usability of clinical warehouse data. Because data accuracy is highly variable,14 methods should be applied for continuous monitoring. In addition, data may be accurate but not in a usable format for computation. For example, in our system's pharmacy tables, the field containing the dose of a medication could be expressed in three different ways: the dose field could be a number representing the actual dosage (e.g., 250 mg of medication); the dose field could represent the volume of fluid to be infused to administer one dose (e.g., 10 mL of medication); or the dose field could indicate the number of units to be administered (e.g., two pills). To address this problem, we had to create a new dose field by having clinical staff manually transform the free text into uniformly comparable data. Once the initial mapping of the 3,247 unique antibiotic dosing combinations in our pharmacies' formularies was completed (approximately 80 person-hours of labor), the ongoing update of new lines of data was minimal but repeated every three months. Another usability problem occurred when the clinical data warehouse encountered nonstandardized data formats. For example, in our pharmacy tables, at times, the drug name also included the preparation strength, which made simple queries by medication name nearly impossible. A third set of usability issues resulted from storage of data as text. The microbiology table contained the results of infrequently performed tests in a free-text comment field that was not abstracted easily. Also, the radiology text-based report files were compressed, and access to them required proprietary extraction software and natural language processing software to interpret these reports.17
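One way to apply a manually curated dose mapping of this kind is a simple lookup table, sketched below. The product descriptions, raw dose values, and milligram amounts are invented for illustration and are not taken from our formularies; the function names are hypothetical.

```python
# Hypothetical mapping built by clinical staff:
# (product description, raw dose field) -> dose in milligrams.
DOSE_MAP = {
    ("CIPROFLOXACIN TAB", "250"): 250.0,        # dose field already in mg
    ("CEFAZOLIN 1 G/10 ML INJ", "10"): 1000.0,  # dose field held a volume in mL
    ("AMOXICILLIN 250 MG CAP", "2"): 500.0,     # dose field held a unit count
}


def normalized_dose_mg(product, raw_dose, dose_map=DOSE_MAP):
    """Return the dose in mg, or None when the combination is not yet mapped."""
    return dose_map.get((product.strip().upper(), raw_dose.strip()))


def unmapped(rows, dose_map=DOSE_MAP):
    """List the distinct (product, raw dose) pairs that still need clinical review."""
    pending = set()
    for product, raw_dose in rows:
        if normalized_dose_mg(product, raw_dose, dose_map) is None:
            pending.add((product.strip().upper(), raw_dose.strip()))
    return sorted(pending)
```

Returning None for an unmapped combination, rather than guessing, leaves new lines of data visible for the periodic manual update.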
Changes in application software and vendors are inevitable, and this requires clinical data warehouse developers to be continually in “development and refinement” mode. Table 2 provides an overview of our hospitals' information systems, the data sets that we have been able to obtain, and the time frame for new systems. In 2002, the Cook County Bureau of Health Services contracted with a new clinical information system vendor. This evolution in the information systems will require that CARP repeat the data acquisition process (Tables 1 and 3) or access these data from a data repository function planned by the Cook County Bureau of Health Services Information Systems Department.
Having information stored in a database and analyzing these data to answer hypotheses are two separate domains. Although programming simple queries to provide descriptive statistics was performed readily, the desire for applying business intelligence models or analyzing and displaying patient level data in a complex arrangement of variables required more advanced programming and analytic skills. We considered purchasing an Application Service Provider, i.e., a software program developed, marketed, and sold for specific applications, but we found none to meet our specific objectives. We developed business specifications outlining our requirements and solicited bids from four consulting firms. The four responses provided plans to use our existing clinical data warehouse and to program the data into information about the care received by patients with infectious diseases. We assessed these proposals and determined that purchasing analytical services was expensive, that intellectual property would be jointly owned, that the final product (programming code) would be proprietary, and that any product modifications or enhancements would require an ongoing contractual relationship. Entering into a long-term contractual agreement was deemed too expansive for the scope of the project and would have required us to assume that the consultant could ensure data integrity and provide ongoing product support. In addition, the consultant required the study investigators with content expertise to work intensely with contractors in the design of the application. As an alternate approach, we chose to develop internal resources by employing a statistician and recruiting a master's-level graduate student.
Status Report: Use of the eID Clinical Data Warehouse—Surveillance and Measurement
Through timely access to reports about antimicrobial use and resistance trends and the rapid tracking of microorganisms in individual patients, the clinical data warehouse has facilitated assessment of, and actions to promote, quality health care; has automated the examination of resource utilization; and has provided the ground work to replace labor-intensive “shoe-leather” surveillance with electronic processes. The surveillance, quality improvement, and cost accounting uses of the clinical data warehouse are summarized in Table 5, and four sets of examples are described here.
Because antibiotic combinations with redundant antimicrobial spectra are a potentially remediable source of excessive antimicrobial use, we wrote a computer program to detect combinations of antibiotics with overlapping spectrum of activity (i.e., two antibiotics that target the same or very similar bacteria).21 Using Web-based access to the computer program, a clinical pharmacist was able to query the database using drop-down menus, select a specific day, and obtain a list of patients who were receiving two or more antibiotics with overlapping activity. The patient list, sorted by unit location, noted all antimicrobials prescribed and highlighted the redundant combinations. The patient information could be printed with an attached data collection instrument to record follow-up observations. During a five-week test period, 71% of eligible cases flagged by the computer were judged to be truly redundant based on a clinical review, physicians readily accepted this unsolicited advice in 98% of these cases, and drug cost savings were $4,500.21 Before the clinical data warehouse, surveillance for this type of medication error was not performed routinely. During the development for this intervention, a manual review of one day's antimicrobial use in our hospital required 10 person hours (PH) just for case identification. Automated real-time identification of this potential medication error allowed the clinical pharmacist to shift his focus from reviewing patient charts for errant medication orders to evaluating the appropriateness of therapy and conferring with the prescribing physician.
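The detection logic can be sketched in a few lines. The rules used in the actual program are not reproduced here; the redundant pair below is a placeholder for illustration only, not a clinical rule set, and the order layout (patient, unit, drug, start date, stop date) and function name are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder rule set for illustration only; not clinical guidance.
REDUNDANT_PAIRS = {
    frozenset(("clindamycin", "metronidazole")),
}


def flag_redundant(orders, day, redundant_pairs=REDUNDANT_PAIRS):
    """List patients receiving a redundant antibiotic pair on a given day.

    orders: iterable of (patient_id, unit, drug, start_date, stop_date),
    with dates as ISO strings (YYYY-MM-DD), which compare correctly as text.
    The result is sorted by unit, then patient.
    """
    active = defaultdict(set)
    for patient_id, unit, drug, start, stop in orders:
        if start <= day <= stop:
            active[(unit, patient_id)].add(drug.lower())
    flagged = []
    for (unit, patient_id), drugs in sorted(active.items()):
        pairs = [
            pair
            for pair in combinations(sorted(drugs), 2)
            if frozenset(pair) in redundant_pairs
        ]
        if pairs:
            flagged.append(
                {
                    "unit": unit,
                    "patient_id": patient_id,
                    "drugs": sorted(drugs),
                    "redundant": pairs,
                }
            )
    return flagged
```

Each flagged entry lists every active antimicrobial as well as the redundant pair, which corresponds to the patient list described above; the flags still require clinical review, since not every flagged case proved truly redundant.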
The measurement of antimicrobial utilization was programmed to trend specific drugs by predefined daily dose, duration of therapy in days, and number of courses of antimicrobial therapy.20 Utilization was stratified by administration route (intravenous or oral), patient location, and time (monthly aggregation). Using the inpatient census data, rates of use were normalized (e.g., rates per 1,000 patient-days) for comparison. Automated trend reports of antimicrobial utilization are available at the investigators' desktop computers, through the study's intranet Web site. Without the clinical data warehouse, 0.37 full-time equivalent (FTE) or 770 PH per year would be required to quantify antimicrobial use; with the clinical data warehouse, after the initial development of programming code (160 PH), clinical staff have “as needed” access to reports offering additional levels of unit-based analysis. Previous improvement activities undertaken by study investigators required manual tabulation of antimicrobial utilization.24
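The normalization and the labor figures reduce to simple arithmetic, sketched below. The constant of 2,080 hours per FTE-year (40 hours for 52 weeks) is an assumption; it is not stated in the text, but it reproduces the paired FTE and person-hour figure reported above (0.37 FTE, about 770 PH). The function names are hypothetical.

```python
HOURS_PER_FTE_YEAR = 2080  # assumption: 40 hours per week for 52 weeks


def rate_per_1000_patient_days(count, patient_days):
    """Normalize a count (doses, courses, isolates) to 1,000 patient-days."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return 1000.0 * count / patient_days


def fte_to_person_hours(fte, hours_per_fte_year=HOURS_PER_FTE_YEAR):
    """Convert a fraction of a full-time equivalent to person-hours per year."""
    return fte * hours_per_fte_year
```

For example, 45 courses of therapy over 9,000 patient-days corresponds to 5 courses per 1,000 patient-days, which allows units or months with different census levels to be compared directly.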
Surveillance for Infections and Trending Antibiotic-resistant Organisms
Use of the clinical data warehouse has automated the daily identification of patients with new positive cultures. Infection control personnel use a Web-based program to select a particular day (or days) and create a line listing of patients with positive cultures. This method has replaced the 0.44 FTE (915 PH/yr) previously required to manually review printed laboratory reports and to transcribe cases to a traditional report log and has detected approximately 6% more cases.
Our clinical data warehouse also has been used to create “unit-based antibiograms” (i.e., bacterial susceptibility patterns for infections occurring on a specific patient care unit) that investigators can use to guide selection of empiric antimicrobial therapy and that CARP staff use to evaluate the effectiveness of their interventions aimed to reduce rates of antibiotic resistance. Before the clinical data warehouse, 33 PH per year were used to compile the annual hospitalwide antibiograms; with the clinical data warehouse, we needed a one-time expenditure of 30 PH to write code and use 4 PH for each annual update. Moreover, using the clinical data warehouse, we are able to combine microbiology results with administrative data to create antibiograms by hospital unit, length of stay, or other variables. The clinical data warehouse also allows application of programming rules to simplify removal of “duplicate” isolates (repeated cultures obtained from a single patient).
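A unit-based antibiogram with duplicate removal can be sketched as follows. The duplicate rule used here (keep only the first isolate of each organism per patient) is a common convention and an assumption on our part for this example; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def unit_antibiogram(isolates):
    """Percent susceptible by (unit, organism, drug), first isolate per patient.

    isolates: iterable of dicts with keys patient_id, unit, organism,
    collected (ISO date string), and results (dict of drug -> "S", "I", or "R").
    """
    seen = set()
    tallies = defaultdict(lambda: [0, 0])  # key -> [susceptible, tested]
    for iso in sorted(isolates, key=lambda i: i["collected"]):
        patient_organism = (iso["patient_id"], iso["organism"])
        if patient_organism in seen:
            continue  # repeated culture from the same patient: skip as duplicate
        seen.add(patient_organism)
        for drug, result in iso["results"].items():
            tally = tallies[(iso["unit"], iso["organism"], drug)]
            tally[1] += 1
            if result == "S":
                tally[0] += 1
    return {
        key: round(100.0 * susceptible / tested, 1)
        for key, (susceptible, tested) in tallies.items()
    }
```

Grouping by a different field, such as length of stay category instead of unit, only requires changing the key used in the tally.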
Programming the clinical data warehouse also has allowed quantification and trending of resistant infections by anatomic site (e.g., positive cultures of sputum or urine) and by bacterial species detected. We display normalized rates (e.g., number of isolates per 1,000 patient days). Before the clinical data warehouse, 0.2 FTE (416 PH/yr) were required to trend the five most common antibiotic resistant organisms; with the clinical data warehouse, after the initial development of programming code (40 PH), clinical staff have on-demand access to reports. This level of retrieval and detail was not available to the clinical staff before development of the clinical data warehouse.
Surveillance for Hospital-acquired Infections
Determining whether positive culture results reflect infections acquired before hospitalization or in the hospital is an important aspect of infection control programs and traditionally requires time-consuming manual data collection. We computed bloodstream infection rates electronically at our health care facilities by developing computer algorithms and applying them to our clinical data warehouse. Before the clinical data warehouse, 0.22 FTE (458 PH/yr) were required to perform chart review of patients with positive blood cultures and to determine when and where the infection was acquired; with the clinical data warehouse, the computer algorithm performs this task rapidly, and the results are well correlated with rates determined by the infection control professional.18
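The core timing rule of such an algorithm can be sketched as shown below. The published algorithm is not reproduced here: the threshold of two or more days after admission is an assumption chosen for illustration, the handling of likely contaminants is omitted, and the function names are hypothetical.

```python
from datetime import date


def classify_culture(admit_date, culture_date, min_days_after_admission=2):
    """Label a positive blood culture by when it was collected.

    The threshold is an illustrative assumption, not the study's rule.
    """
    days = (culture_date - admit_date).days
    if days < 0:
        raise ValueError("culture collected before admission")
    if days >= min_days_after_admission:
        return "hospital-acquired"
    return "community-onset"


def bloodstream_infection_rate(episodes, patient_days, min_days_after_admission=2):
    """Hospital-acquired bloodstream infections per 1,000 patient-days.

    episodes: iterable of (admit_date, culture_date) pairs, one per episode.
    """
    acquired = sum(
        1
        for admit, culture in episodes
        if classify_culture(admit, culture, min_days_after_admission)
        == "hospital-acquired"
    )
    return 1000.0 * acquired / patient_days
```

Keeping the threshold as a parameter makes it straightforward to compare the electronic classification with the infection control professional's determinations under different definitions.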
We believe that a relational data warehouse should be a component of every hospital information system. Because our vendor did not provide an easily accessed clinical data warehouse for research and quality improvement purposes, we developed our own. The computer hardware and software technology for developing clinical data warehouses are available and relatively affordable. However, clinical and technical expertise, personnel time, administrative support, and substantial work are required to develop and implement a clinical data warehouse, especially across a network of hospitals. CARP, through this demonstration project, was able to successfully access microbiology, pharmacy, and related data; to overcome the barriers to create a clinical data warehouse; and to build an infection control information system that led to savings of time and money and that allowed personnel to redirect their efforts from acquiring data to implementing infection control interventions.
The availability of information systems–based data for our entire inpatient population has provided close to real-time, desktop access to information for our clinicians and investigators. With this information, we measure performance, monitor infection rates and antimicrobial use, and calculate costs of patient care.18 19 20 21 22 23 24 25 The next steps are to expand our electronically facilitated patient care interventions,18 19 to address new infection control challenges,27 28 to further automate disease surveillance and electronic reporting to public health agencies, to assess the value of data mining techniques18 26 for infection control risk factor identification, to explore additional sources and types of data as our health care system begins to develop an electronic patient medical record, and to evaluate the ability of our system to meet challenges of bioterrorism surveillance.29 The health care information system industry should add this level of function—electronic surveillance, data transfer, and reporting—into its hospital applications to remain competitive and to provide information that will give health care purchasers a greater return on their investment.
This work was supported by the Centers for Disease Control and Prevention cooperative agreement # U50/CCU515853. The Cook County Bureau of Health Services Institutional Review Board approved this project on December 1, 1998.
* Use of trade names and commercial sources is for identification only and does not imply endorsement by the Public Health Service of the U.S. Department of Health and Human Services.