J Am Med Inform Assoc 1997;4:313-321 doi:10.1136/jamia.1997.0040313
  • Original Investigation
  • Research Paper

Using Computer-based Medical Records to Predict Mortality Risk for Inner-city Patients with Reactive Airways Disease

  1. William M Tierney,
  2. Michael D Murray,
  3. Denise L Gaskins,
  4. Xiao-Hua Zhou
Affiliations of the authors: Wishard Memorial Hospital (WMT, MDM); Regenstrief Institute for Health Care (WMT, MDM, DLG, XHZ); Indiana University School of Medicine (WMT, XHZ); Richard L. Roudebush Veterans Affairs Medical Center (WMT, MDM); Purdue School of Pharmacy (MDM), Indianapolis, IN
Correspondence and reprints: William M. Tierney, MD, Regenstrief Institute for Health Care, Sixth Floor, RHC, 1001 West Tenth Street, Indianapolis, IN 46202. E-mail: btierney@vax1.iupui.edu
  • Received 17 January 1997
  • Accepted 24 February 1997

Abstract

Objective To use routine data from a comprehensive electronic medical record system to predict death among patients with reactive airways disease.

Design Retrospective cohort study conducted in an academic primary care internal medicine practice. Subjects were 1,536 adults with reactive airways disease: 542 with asthma and 994 with chronic obstructive pulmonary disease (COPD).

Measurements The dependent variable was death from any cause within 3 years following patients' first primary care appointment in 1992. Multivariable logistic regression was used to identify independent predictors of 3-year mortality, with half of the patients used to derive the predictive model and the other half used to assess its predictability.

Results Of the 1,536 study patients, 191 (12%) died in the 3-year follow-up period. From information available on or before patients' first primary care visit in 1992, multivariable predictors of 3-year mortality were coincidental heart failure, male sex, presence of COPD, lower weight, low serum albumin concentration, and a prior arterial PO2 of less than 60 mmHg; use of an inhaled corticosteroid was protective. The c-statistic (ROC curve area) in the validation cohort was 0.76, indicating good discrimination, and goodness of fit was excellent by Hosmer-Lemeshow chi-square (P > 0.5). Only 24% of the patients in the validation cohort were designated at high risk (estimated ≥15% 3-year mortality), but this group contained more than half of the deaths within 3 years in that cohort.

Conclusions Data generated during routine care and stored in a comprehensive electronic medical record can accurately predict mortality among patients with reactive airways disease. Such technology can be used by practices to control for severity of illness when assessing clinical practice and to identify high-risk patients for interventions to improve prognosis.

In recent years, there has been a disturbing trend toward increasing prevalence and severity of reactive airways disease1 2 3 4 5 6 7 and associated mortality,1 4 5 8 9 especially among inner-city residents.9 10 11 If patients with reactive airways diseases at higher mortality risk could be identified, interventions to reduce this risk could be made more cost effective, targeting those most likely to benefit.

Predicting adverse outcomes depends on having a source of predictive information. One could prospectively collect such information by repeatedly examining and testing patients with reactive airways disease, but this would be expensive and time consuming. Routine clinical information, gathered during the process of delivering care, may predict utilization12 13 and adverse outcomes,14 15 16 17 yet such information is not usually accessible. Comprehensive electronic medical record systems could be one source for such predictive information,18 yet such systems are not currently widely available. If data in such systems could be shown to aid the delivery of cost-effective preventive care, this would be one more benefit to weigh against the cost of establishing and maintaining such systems.

Using a state-of-the-art comprehensive electronic medical information system,19 we tested the hypothesis that routine clinical data obtained from electronic medical records can accurately predict morbid outcomes in adults with reactive airways disease.

Methods

This study was performed in the General Medicine Practice (GMP), an academic primary care general internal medicine practice affiliated with an inner-city teaching hospital.20 Approximately 13,000 patients visit this practice more than 50,000 times per year and are cared for by approximately 150 physicians, two thirds of whom are residents (predominantly in categorical internal medicine and medicine-pediatrics programs). This health care system is served by the Regenstrief Medical Record System (RMRS),21 a comprehensive information system that handles all laboratory, pharmacy, and appointment information for a network of inner-city facilities, including a 340-bed hospital, an adjacent outpatient facility with more than 65 offices and clinics, and more than 20 primary care neighborhood health centers and public health clinics.19 21

The RMRS contains more than 150 million observations on more than 1.5 million separate patients cared for over the past 20 years. All data are stored permanently, with the exception of repeated inpatient clinical laboratory tests, for which the RMRS stores the first test value obtained upon admission, the last value before discharge, and the highest and lowest value for each week of the hospital stay. Death data are obtained from discharge case-abstracts, dictated death summaries, and the Indiana State Department of Health's computerized death certificate files. Names of patients in the death certificate files are matched with names in the RMRS using a validated matching algorithm.22

Patients eligible for this study were those more than 14 years old who visited the GMP at least once in 1992 and had any one or more of the following indicators of reactive airways disease: any diagnosis of asthma, chronic obstructive pulmonary disease (COPD), emphysema, or chronic bronchitis recorded at any inpatient, emergent, urgent, or outpatient visit; any chest radiograph read as showing COPD; any prior prescription for inhaled beta-adrenergic agonists, cromolyn, parasympathetic blockers, or corticosteroids; or any prescription for oral beta-adrenergic agonists or theophylline. This definition was purposefully broad to be more sensitive (rather than specific) for patients with reactive airways disease in order to include a full spectrum of such patients.

For descriptive purposes, we arbitrarily assigned the diagnosis of “asthma” to patients who had ever had that specific diagnosis recorded from any site of care (and had never had the diagnosis of COPD recorded from any site) and to patients who had never had either specific diagnosis but had first taken one of the above-mentioned medications before 45 years of age. All other patients were deemed to have COPD. (We recognize that patients who never smoked should be considered asthmatic regardless of their age of onset of reactive airways disease. However, because our smoking information is not complete, and because of the blurring of the boundaries between asthma and chronic airways obstruction, we opted to arbitrarily separate these two groups of patients on the basis of their diagnoses and age at first treatment.)
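The labeling rule described above can be written as a small decision function. The following is only a sketch: the function name, the argument names, and the use of `None` for patients with no recorded airway medication are illustrative assumptions, not part of the RMRS or of the study's actual code.

```python
from typing import Optional


def classify_airways_disease(
    has_asthma_dx: bool,
    has_copd_dx: bool,
    age_at_first_medication: Optional[float],
) -> str:
    """Return the descriptive label 'asthma' or 'COPD' (hypothetical helper)."""
    # Rule 1: an asthma diagnosis on record and never a COPD diagnosis.
    if has_asthma_dx and not has_copd_dx:
        return "asthma"
    # Rule 2: neither diagnosis recorded, but a qualifying medication
    # was first taken before 45 years of age.
    if (
        not has_asthma_dx
        and not has_copd_dx
        and age_at_first_medication is not None
        and age_at_first_medication < 45
    ):
        return "asthma"
    # All other patients are deemed to have COPD.
    return "COPD"


print(classify_airways_disease(True, False, None))   # asthma
print(classify_airways_disease(False, False, 60.0))  # COPD
```

Under this reading, a patient with both diagnoses on record, or one who qualified only through a chest radiograph, falls into the COPD group.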

We excluded patients who had never visited the GMP before their first visit in 1992 (who had too few data to analyze) as well as patients not known to have died within 3 years and who never returned to any inpatient, emergent, urgent, or outpatient site after their first GMP visit in 1992. All remaining eligible patients were included in the analysis.

We then extracted potential predictor variables from data stored in patients' electronic medical records on or before their first GMP visits in 1992. These variables fell into the following general categories: demographic data, vital signs, diagnoses from clinical and billing problem lists, diagnostic test results (clinical laboratory test results as well as coded results for imaging studies), drugs prescribed from any outpatient, emergency, or urgent care site, and clinical activity (visits to outpatient, emergent, urgent, or inpatient sites both before and after the first diagnosis of reactive airways disease). Because of the high prevalence of comorbid disease in this population, we assessed utilization (e.g., days hospitalized) both before and after the first evidence of reactive airways disease.

The dependent variable of interest was death from any cause after study patients' first GMP visit in 1992.

We chose to study all-cause mortality because of the known inaccuracy of causal information on death certificates23 24 and the possibility that deaths attributed to infectious or cardiac diseases may have had reactive airways disease as an inciting or contributing cause. Moreover, to patients, their families, and their physicians, the fact of death is more important than the immediate cause of death.

Using a random number generator in the Statistical Analysis System (SAS Institute, Cary, NC), we randomly selected half of our cohort for deriving the predictive model (the derivation cohort). The predictive power of this model was then assessed using the other half of the patients (the validation cohort). Because of the large number of potential predictor variables, we employed a two-stage data reduction process. The predictive power of individual variables among the derivation cohort was assessed using univariable logistic regression. All univariably significant predictors (p < 0.10) were then entered into multivariable logistic regression analysis (with backward removal of non-significant predictors). Selker25 has recommended that the number of potential predictor variables should not exceed one tenth of the number of outcomes being predicted to avoid overfitting the model to the data. However, rather than arbitrarily excluding univariably significant predictors, we opted to report the model's performance only in the validation cohort, using the c-statistic (a measure of discrimination equal to the area under the Receiver Operating Characteristic, or ROC, curve)26 and the Hosmer-Lemeshow chi-square statistic (a measure of model calibration or goodness-of-fit). Consistent with our prior modeling efforts,14 27 before we began the analyses we established a c-statistic (ROC curve area) of ≥0.70 as acceptable discrimination28 and a Hosmer-Lemeshow chi-square p-value of ≥0.10 as acceptable calibration.
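To make the split-sample design and the discrimination measure concrete, here is a minimal sketch in Python rather than SAS. The function names and the seed are assumptions for illustration. The split below produces exact halves, which may differ from the assignment mechanism actually used in the study. The c-statistic is computed by its pairwise definition: the proportion of (died, survived) patient pairs in which the patient who died had the higher predicted probability, with ties counted as one half.

```python
import random


def split_cohort(patient_ids, seed=0):
    """Randomly split patients into derivation and validation halves."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # seed is an arbitrary illustrative choice
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def c_statistic(predicted, died):
    """Area under the ROC curve via pairwise concordance."""
    cases = [p for p, d in zip(predicted, died) if d]
    controls = [p for p, d in zip(predicted, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0   # model ranked the death higher
            elif p_case == p_control:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


derivation, validation = split_cohort(range(10))
print(len(derivation), len(validation))
print(c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why a prespecified threshold of 0.70 is a meaningful bar.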

The resulting logistic regression model can be used to calculate a probability of dying in 3 years for each patient by using the following standard formula:

$$P = \frac{1}{1 + e^{-(B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_n X_n)}}$$

where P is the calculated probability, B_0 is the intercept (constant), B_n is the nonstandardized parameter estimate for the nth variable, and X_n is the raw value of the nth predictor variable. All indicator variables (e.g., diagnoses, findings on imaging studies) were coded as 1 if they were found in a patient's electronic medical record on or before the patient's first GMP visit in 1992; otherwise, they were coded as 0. (This means that if a condition or finding was not mentioned in the patient's record or the imaging test had not been performed, the condition was assumed to be absent.) Therefore, there were no missing data for these variables. Counting variables (e.g., number of prior emergency department visits) were likewise coded as 0 if none was noted in a patient's electronic record. Thus, the only variables that were potentially missing were clinical laboratory tests and vital signs. As in our previous modeling efforts,14 27 we arbitrarily excluded from analysis all such variables missing in more than 15% of subjects. We then dealt with missing data for the remaining variables by substituting the sample mean for the missing values. We repeated the analysis by excluding any patient whose record was missing any univariably significant laboratory test or vital sign. In a few instances, most notably arterial blood gases, we assumed that a patient who never had the test performed had normal results. We then created indicator variables for patients who had significant prior abnormalities on these tests (e.g., an arterial pH < 7.30 or an arterial PO2 < 60 mmHg).
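The probability calculation and the missing-data conventions described above might be sketched as follows. All names and all numeric values in the usage lines are hypothetical placeholders chosen for illustration; they are not the parameter estimates from Table 2. Variables absent from a patient's record default to the sample mean when one is supplied (laboratory tests and vital signs) and to 0 otherwise (indicator and counting variables).

```python
import math


def predicted_probability(intercept, coefficients, values, sample_means):
    """Logistic-model probability: 1 / (1 + exp(-(B0 + sum(Bn * Xn))))."""
    linear = intercept
    for name, beta in coefficients.items():
        x = values.get(name)
        if x is None:
            # Missing lab or vital sign: substitute the sample mean.
            # Missing indicator or count: treated as 0 (absent).
            x = sample_means.get(name, 0.0)
        linear += beta * x
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, NOT the published estimates.
coefficients = {"heart_failure": 1.1, "albumin_g_dl": -0.8}
sample_means = {"albumin_g_dl": 3.8}
patient = {"heart_failure": 1}  # albumin never measured
print(predicted_probability(-1.0, coefficients, patient, sample_means))
```

Keeping the imputation inside the scoring function means the same conventions apply whether the model is being validated retrospectively or evaluated for a single patient.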

Results

In 1992, 1,557 patients visited the GMP who had prior evidence of reactive airways disease and had visited the GMP at least once before. Of these patients, 21 (1.3%) were excluded from analysis because they had no subsequent inpatient, emergent, urgent, or outpatient encounters and were not known to have died. Of the 1,536 remaining patients, 542 (35%) had asthma, while the other 994 (65%) had COPD by our definitions. The mean length of time that these 1,536 patients had received care in the GMP before their first visit in 1992 was 8.1 ± 5.2 (SD) years (range 0-16). The length of time followed in any facilities served by the RMRS prior to 1992 was 9.6 ± 5.2 years (range 0-16.5). The total time that these 1,536 patients were treated for reactive airways disease in this system was 6.9 ± 4.6 years (range 0-17).

By 3 years after their first GMP visits in 1992, 191 patients (12% of the entire study cohort) had died. Table 1 compares the baseline characteristics (i.e., data available on subjects' first GMP visits in 1992) of patients who died with those who survived. As expected, patients who died were significantly older, more often had COPD, less often had asthma, and had a greater burden of comorbid disease. Patients who died were more often receiving ipratropium and home oxygen, but their use of inhaled and oral corticosteroids, beta-adrenergic agonists, theophylline, and cromolyn did not differ from that of survivors. Patients who died had significantly lower weight and serum albumin concentrations than patients who survived. Patients who died had more prior hospitalizations and outpatient visits than patients who survived, but the two groups did not differ in emergency department visits. Hospitalizations specifically for reactive airways disease and pneumonia were also more frequent among patients who died. Finally, patients who died within 3 years more frequently had prior abnormal arterial blood gas values (PO2 and pH) than patients alive 3 years after their first GMP visit in 1992.

Table 1

Characteristics of Study Patients Who Died in 3 Years Compared With Those Who Survived

There were 752 patients (49%) randomly assigned to the model derivation cohort, 93 (12%) of whom died within 3 years of their first GMP visit in 1992. Of the 90 potentially predictive variables assessed, 39 passed the univariable screen (p < 0.10) and were submitted to multivariable logistic regression analysis. Seventy-seven derivation cohort patients (10%) were missing one or more of the candidate predictor variables (predominantly laboratory test results). We performed the multivariable analysis with and without substituting the derivation sample mean for missing variables; the results were almost identical. We therefore present the results for the analysis using the entire dataset with mean substitution for missing values for candidate predictors.

Table 2 shows the eight variables that were retained in the final logistic regression model by virtue of their having a multivariable p-value less than 0.05. By far, the strongest predictor was the presence of heart failure (as evidenced by that diagnosis recorded during any encounter or by evidence of heart failure on cardiac imaging studies). Patients with heart failure had three times the risk of dying in 3 years as did patients without heart failure. The next-strongest predictor was male sex (again, associated with an almost tripling of the risk of death in 3 years), followed by the diagnosis of COPD, lower weight, lower serum albumin concentration, having a prior PO2 of less than 60 mmHg, and the number of prior hospitalizations for pneumonia. Current treatment with an inhaled corticosteroid (at the time of their first GMP visit in 1992) was associated with a 60% reduction in the risk of dying in 3 years.

Table 2

Multivariable Predictors of 3-Year Mortality in the Derivation Cohort

The model's ability to predict 3-year mortality was assessed among the 784 patients (51%) who were randomly assigned to the model validation cohort, 98 (12%) of whom died within 3 years of their first GMP visit in 1992. The c-statistic (equivalent to area under the ROC curve) for the model was 0.76, which indicates good discrimination by our a priori criteria and those of others.28 The Hosmer-Lemeshow goodness-of-fit chi-square statistic was 7.13, with 8 degrees of freedom (p = 0.52), which indicates excellent calibration. Figure 1 shows the calibration curve for the validation cohort, which also demonstrates excellent agreement between predicted and observed deaths and shows that the model does not routinely overestimate or underestimate 3-year mortality.
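For readers who want to see how a calibration statistic of this kind is obtained, the sketch below computes a Hosmer-Lemeshow statistic from predicted probabilities grouped by rank, and converts a chi-square value to a p-value. The function names are assumptions, and the grouping into equal-sized bins is one common convention rather than a confirmed detail of the study. The p-value helper uses the closed-form chi-square tail that holds for even degrees of freedom, which is enough to check the reported figure without a statistics library.

```python
import math


def hosmer_lemeshow(predicted, died, n_groups=10):
    """Sum over rank-ordered groups of (observed - expected)^2 / variance."""
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(d for _, d in group)   # deaths observed in the group
        expected = sum(p for p, _ in group)   # deaths the model predicts
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Reported statistic and degrees of freedom from the text.
print(round(chi_square_sf_even_df(7.13, 8), 2))
```

A large p-value here means the observed deaths in each risk group are close to the number the model predicts, so larger is better, unlike most significance tests.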

Figure 1

Calibration curve comparing predicted versus actual deaths in deciles of the model validation cohort, sorted by the logistic regression model's predicted probability of death.

As an example of how this model could be used to identify low- and high-risk subgroups of patients, we arbitrarily defined as high risk those patients in the validation cohort who had a calculated probability of dying in 3 years of more than 15% (Table 3). Of the 24% of patients in the validation cohort who would be defined as high risk using this criterion, 28.5% died within 3 years compared with only 7.5% mortality among the remaining 76% of validation set patients thus defined as low risk. The high-risk group contains more than half of all patients who died.
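The claim that the high-risk group contains more than half of the deaths can be checked with simple arithmetic from the percentages quoted in the text. This is a back-of-the-envelope sketch: the exact counts in Table 3 are not reproduced here, the inputs are rounded percentages, and the variable names are illustrative, so the implied counts are approximate.

```python
# Figures quoted in the text for the validation cohort.
validation_patients = 784
reported_deaths = 98
high_risk_fraction = 0.24    # share of patients with predicted risk > 0.15
high_risk_mortality = 0.285  # 3-year mortality among high-risk patients
low_risk_mortality = 0.075   # 3-year mortality among low-risk patients

# Approximate group sizes and deaths implied by the rounded percentages.
high_risk_patients = high_risk_fraction * validation_patients
low_risk_patients = validation_patients - high_risk_patients
high_risk_deaths = high_risk_mortality * high_risk_patients
low_risk_deaths = low_risk_mortality * low_risk_patients
implied_deaths = high_risk_deaths + low_risk_deaths

# Fraction of all implied deaths that fall in the high-risk group.
share_of_deaths_in_high_risk = high_risk_deaths / implied_deaths

print(round(high_risk_patients), round(high_risk_deaths), round(low_risk_deaths))
print(f"{share_of_deaths_in_high_risk:.0%}")
```

The implied total is close to the 98 deaths reported for the validation cohort, and the high-risk quarter of patients accounts for somewhat more than half of them.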

Table 3

Example of Using a Threshold of Predicted Probability of Death of >0.15 To Define Patients in the Validation Cohort as High Risk

Discussion

This study demonstrates that information gathered during routine care by a state-of-the-art electronic medical record system can accurately predict 3-year mortality among patients with reactive airways diseases. If risk estimates based on such data were available for quality improvement activities, these activities could be made more cost effective by focusing on higher risk patients (Table 3). As health care resources become increasingly constrained, they and their associated costs can thus be allocated to those patients most likely to benefit from them. Using a predictive equation such as ours (Table 2), higher risk patients could, for example, be referred to pulmonary subspecialists, followed more closely by clinical nurse-specialists, or aggressively pursued for preventive care (such as influenza and pneumococcal vaccination).29 30 Primary care physicians and/or specialists might also treat comorbid conditions more aggressively or emphasize medication and visit compliance among higher risk patients.

Such predictive models can also be used to correct for severity of illness in evaluating the performance of physicians or other health care providers.31 32 33 For example, assessments of our physicians' health care utilization for patients with reactive airways diseases could be adjusted for their patients' severity of illness, giving them “credit” for caring for sicker patients. With increasing frequency, health care is being scrutinized because of its high costs and the unexplained (and often unjustifiable) variability in the processes and outcomes of care observed when comparing physicians, practices, hospitals, and sometimes geographic areas.34 35 36 37 38 If providers are not given credit for caring for sicker patients, the higher risk of adverse events and greater costliness associated with these patients would be a disincentive for caring for them. Unfortunately, systems for adjusting for severity of illness often perform poorly in venues other than those in which they were developed.39 Using one's own electronic medical records to adjust for illness severity avoids such problems with generalizability of others' predictive models.40 However, adjusting for risk using such models is only valid within a health care system served by an electronic medical record. Comparisons between systems would be limited by the inherent differences in patients, data systems, and processes of providing care.

Predictive models can also provide clinicians with risk assessments for individual patients. For example, because all orders written in the GMP are entered by patients' primary care physicians directly into computer workstations,19 41 physicians writing outpatient orders can be shown the risk of each patient's dying in the next 3 years. As part of a larger trial of the effects of computer-based practice guidelines, we are currently using the model in Table 2 and another predicting mortality in patients with heart failure14 to provide mortality estimates during outpatient ordering for appropriate GMP patients.

However, for such information to be useful, physicians must be able to understand and act upon quantitative probabilistic information. We believe they can. In a prior study of our emergency department patients with chest pain, treating physicians' numeric probability assessments were the strongest predictors of myocardial infarction.42 In another study performed using the order-writing workstations in the GMP, physicians shown low calculated probabilities that the tests they were ordering would be abnormal27 ordered significantly fewer tests than did control physicians.43

To have any effect on physician behavior, such predictive models must have face validity. Clearly, patients with heart failure alone are at increased risk of mortality.14 Their risk is even greater when combined with reactive airways diseases: 28% of validation cohort patients with heart failure died in 3 years compared with 8% of those without heart failure. Also, COPD (17% mortality in 3 years) was more morbid than asthma (8% mortality), which was also expected given the chronicity and irreversibility of physiologic dysfunction with COPD. It also makes sense that those with more severe pulmonary disease, as indicated by having a prior arterial PO2 of less than 60 mmHg or prior hospitalizations with pneumonia, were at increased risk. Lower weight and serum albumin levels have been shown by us14 and others44 45 46 47 to be predictive of death. Although both factors can indicate general physical decline, chronic disease, or malnutrition, direct effects causing mortality have been postulated for hypoalbuminemia.47 48 49 50 51 Men may have been at higher risk because, in this cohort, they were older and had a greater burden of comorbidity: they had 50% more coronary artery disease and COPD than women. They also had fewer prior visits for primary care, suggesting that men with reactive airways disease less often received care specifically aimed at forestalling morbidity. In a prior epidemiologic study using RMRS data, Murray et al.52 showed that, especially among adolescents and young adults, men were less likely to receive primary care, more likely to be cared for in the emergency department, and more likely to be hospitalized for reactive airways disease.

Finally, treatment with inhaled corticosteroids (in this case, predominantly beclomethasone) was independently associated with a marked reduction in 3-year mortality. However, further analyses of our data by type of reactive airways disease show that the protective effect of inhaled corticosteroids is seen only among patients with asthma, not patients with COPD. This result is consistent with recent guidelines for treatment of asthma published by the National Institutes of Health, which reinforce the use of inhaled corticosteroids in what is now considered primarily an inflammatory condition.53 This lends strong support for a computer reminder that encourages the use of inhaled corticosteroids among asthmatics.

Selker25 has listed nine criteria upon which to assess predictive models. The model reported herein meets eight of them. The sole unmet criterion (not evaluating more than one independent variable for every ten patients suffering the outcome of interest, in this case death in 3 years) is included by Selker to avoid overfitting the model to the data. However, this point is moot in our study because we have reported our model's predictive power only among the half of the patients who were excluded from the model-building analysis.

This study has limitations. First, the model shown in Table 2 may not be useful in other populations. These inner-city patients are not likely to be representative of most patients with reactive airways diseases. Moreover, the data come from the Regenstrief Medical Record System,19 which is unique, albeit a prototype of future electronic medical records.18 However, even though the model may not perform well in other populations, the methods we employed to derive our local model are certainly generalizable to practices with comprehensive electronic medical record systems. Our results should not only encourage those with sophisticated electronic medical record systems to profile their own patients' risk but also provide those lacking such systems with evidence that the value of these systems goes beyond generating bills and storing diagnostic test results.

This study also included some patients with very weak evidence of reactive airways disease. For example, a patient with one prior bout of acute bronchitis with bronchospasm treated with a single metered-dose inhaler would have been included in this study. However, those wishing to use electronic medical record systems to assess patient risk will also not have the luxury of verifying the prevalence of reactive airways disease in all patients with any evidence of its existence. If anything, including such patients is likely to bias the analysis against finding accurate predictors of mortality. We opted to include patients with any evidence of reactive airways disease rather than attempt to estimate its severity, which would be especially arbitrary among asthma patients, whose disease is by nature often intermittent and characterized by long intervening periods of normality.

We conclude that information stored in a state-of-the-art electronic medical record system can be used to accurately predict mortality among inner-city primary care patients with reactive airways disease. Predictive information was found among demographic data (sex), outpatient and inpatient diagnoses, arterial blood gas values and other diagnostic test results, and drug use data. As electronic medical record systems are installed into health care facilities or are created from existing electronic medical information resources,54 those sufficiently comprehensive to include such data can (and should) be used to identify such patients at high mortality risk. Such prognostic models can be used to target treatment interventions and quantify severity of illness.
