Effects of clinical decision-support systems on practitioner performance and patient outcomes: a synthesis of high-quality systematic review findings
- 1Department of Medical Informatics, Academic Medical Center, Amsterdam, The Netherlands
- 2Department of Quality Assurance & Process Innovation, Academic Medical Center, Amsterdam, The Netherlands
- Correspondence to Dr Monique W M Jaspers, Department of Medical Informatics, Academic Medical Center, PO Box 22700, 1100 DE Amsterdam, The Netherlands;
- Received 15 July 2010
- Accepted 2 January 2011
- Published Online First 21 March 2011
Objective To synthesize the literature on clinical decision-support systems' (CDSS) impact on healthcare practitioner performance and patient outcomes.
Design Literature search on Medline, Embase, Inspec, Cinahl, Cochrane/Dare and analysis of high-quality systematic reviews (SRs) on CDSS in hospital settings. Two-stage inclusion procedure: (1) selection of publications on predefined inclusion criteria; (2) independent methodological assessment of preincluded SRs by the 11-item measurement tool, AMSTAR. Inclusion of SRs with AMSTAR score 9 or above. SRs were thereafter rated on level of evidence. Each stage was performed by two independent reviewers.
Results 17 out of 35 preincluded SRs were of high methodological quality and further analyzed. Evidence that CDSS significantly impacted practitioner performance was found in 52 out of 91 unique studies of the 16 SRs examining this effect (57%). Only 25 out of 82 unique studies of the 16 SRs reported evidence that CDSS positively impacted patient outcomes (30%).
Conclusions Few studies have found any benefits on patient outcomes, though many of these have been too small in sample size or too short in time to reveal clinically important effects. There is significant evidence that CDSS can positively impact healthcare providers' performance with drug ordering and preventive care reminder systems as most clear examples. These outcomes may be explained by the fact that these types of CDSS require a minimum of patient data that are largely available before the advice is (to be) generated: at the time clinicians make the decisions.
In ‘Crossing the quality chasm,’ the Institute of Medicine pointed out the wide variations in healthcare practice, and the inefficiencies, dangers, and inequalities that have resulted from nonoptimal patient care.1 Evidence-based medicine (EBM) aims to reduce practice variation and improve quality of care. It does so by combining the clinical skills and experience of the healthcare professional and preferences of the patient with the best external clinical evidence available in order to make balanced decisions about medical care.2 EBM has, since its introduction in the 1980s, become widespread and has been adopted by international healthcare organizations such as the WHO and the Institute of Medicine. EBM seems a fairly common-sense solution, but it has proved to be far from simple to implement. A report from Grol et al provides a brief overview of strategies for the effective implementation of change in patient care.3 One of the interventions discussed is the use of reminders and computers for the implementation of evidence in daily practice. It is concluded that, among other interventions on the organizational and team level, professional development needs to be built into daily patient care as much as possible. This preferably should take place at the point of care with clinical decision-support tools and real-time patient-specific reminders to help doctors make the best decisions. Clinical decision support is defined as: ‘providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care.’4 Clinical knowledge incorporated in clinical decision-support systems (CDSS), for instance, can be based on available best evidence which is represented in guideline recommendations.
There are many different types of clinical tasks that can be supported by CDSS. A well-known and frequently applied CDSS is the patient-monitoring device (eg, an ECG or pulse oximeter) that warns of changes in a patient's condition. CDSS integrated in Electronic Medical Record systems (EMRs) and computerized physician order entry systems (CPOEs) can send reminders or warnings for deviating laboratory test results, check for drug–drug interactions, dosage errors, and other prescribing contraindications such as a patient's allergies, and generate lists of patients eligible for a particular intervention (eg, immunizations or follow-up visits). When a patient's case is complex or rare, or the healthcare practitioner making the diagnosis is inexperienced, a CDSS can help in formulating likely diagnoses based on (a) patient data and (b) the system's knowledge base of diseases. Subsequently, the CDSS can formulate treatment suggestions based upon treatment guidelines.
Research into the impact of CDSS on healthcare practitioner performance and patient outcomes in hospital settings has increased, and evidence of the effectiveness of CDSS has been synthesized into several systematic reviews (SRs). However, an overview of this evidence based on a critical appraisal of SRs focusing on CDSS impact is not available. Therefore, we set out to provide a synthesis of high-quality SRs examining CDSS interventions in hospital settings. The objective is (a) to summarize their effects on practitioner performance and patient outcome, and (b) to highlight areas where more research is needed.
To find relevant SRs, we developed a search strategy in cooperation with an experienced clinical librarian. The search strategy was first developed for Medline and later adapted to search Embase, Inspec, Cinahl, and Cochrane/Dare. To identify SRs we used a multiple-term search strategy as proposed by Montori et al5 and multiple keywords and Medical Subject Headings (MeSH) terms for the interventions CDSS or CPOE (table 1). No time period or language limitation was applied. Box 1 displays the search strategy for PubMed.
Search strategy in PubMed
(literature review[tiab] OR critical appraisal[tiab] OR meta analysis[pt] OR systematic review[tw] OR medline[tw]) AND (medical order entry systems[mh] OR medical order entry system*[tiab] OR computerized order entry[tiab] OR computerized prescriber order entry[tiab] OR computerized provider order entry[tiab] OR computerized physician order entry[tiab] OR electronic order entry[tiab] OR electronic prescribing[mh] OR electronic prescribing[tiab] OR cpoe[tiab] OR drug-therapy, computer assisted[mh] OR computer assisted drug therapy[tiab] OR decision support systems, clinical[mh] OR decision support system*[tiab] OR reminder system*[tiab] OR decision-making, computer assisted[mh] OR computer assisted decision making[tiab] OR diagnosis, computer assisted[mh] OR computer assisted diagnosis[tiab] OR therapy, computer assisted[mh] OR computer assisted therapy[tiab] OR expert systems[mh] OR expert system*[tiab] OR *CDS*[tiab]).
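The query in Box 1 has the shape (review filter terms) AND (intervention terms), with the terms inside each group joined by OR. The following is a minimal Python sketch of how such a query string could be assembled. The function name `build_query` and the two constants are illustrative assumptions, and the term lists are abbreviated from Box 1 rather than complete.

```python
from typing import Sequence


def build_query(review_filter: Sequence[str], interventions: Sequence[str]) -> str:
    """Join each term list with OR, then require a match in both lists with AND."""
    if not review_filter or not interventions:
        raise ValueError("both term lists must be non-empty")
    return f"({' OR '.join(review_filter)}) AND ({' OR '.join(interventions)})"


# Abbreviated term lists taken from Box 1 (the full strategy has many more terms).
REVIEW_FILTER = ["systematic review[tw]", "meta analysis[pt]", "medline[tw]"]
INTERVENTIONS = [
    "decision support systems, clinical[mh]",
    "reminder system*[tiab]",
    "cpoe[tiab]",
]

query = build_query(REVIEW_FILTER, INTERVENTIONS)
print(query)
```

Keeping the two term lists separate makes it easier to adapt the intervention terms to another database while reusing the review filter.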
Inclusion of relevant studies
To assess whether publications that were found were relevant, we applied the following inclusion criteria: types of studies, intervention, target groups, and outcome measures.
Only SRs were eligible for inclusion. To determine whether a publication was an SR, we used the checklist for assessment of systematic reviews of the Dutch Cochrane Centre.6 For the initial screening of titles and abstracts, we considered a review to be systematic if at least (a) Medline had been searched, and (b) the methodological quality of the included studies had been assessed by the reviewer(s). For the screening based on full text, we added the AMSTAR criteria7 8 as described hereafter.
CDSS combining clinical knowledge with patient characteristics, including CPOE systems with decision-support functionality and CDSS for diagnostic performance.
The CDSS interventions studied should be aimed at healthcare professionals such as physicians, nurses, and other practitioners who are directly responsible for patient care in the hospital setting (in- and outpatient). We excluded CDSS interventions aimed at healthcare professionals who are indirectly involved in patient care at ancillary clinical departments such as laboratories, radiology, pathology, and physiological function departments.
Within the SRs, either practitioner performance or patient outcomes should be measured.
A two-stage inclusion process was applied. In the first stage, titles and abstracts of articles identified by the search strategy were screened by two reviewers independently to assess whether these publications met the inclusion criteria. When the title and/or abstract provided insufficient information to determine relevance, full paper copies of the articles were retrieved in order to determine whether they fulfilled the inclusion criteria. Additionally, a manual search of the reference lists of the selected full-text papers was performed to identify SRs which our search could have missed. Any disagreements between the two reviewers were resolved by discussion and consultation of a third reviewer to reach consensus.
The second stage of inclusion relates to the methodological assessment of the reviews. All reviews that remained after the first stage were assessed with the Assessment of Multiple Systematic Reviews (AMSTAR) tool.7 8 AMSTAR is an 11-item measurement tool for the assessment of multiple systematic reviews that has good reliability and validity.8 9 The AMSTAR items are scored as ‘Yes,’ ‘No,’ ‘Can't answer,’ or ‘Not applicable.’ The AMSTAR criteria comprise: (1) ‘A priori’ design provided; (2) duplicate study selection/data extraction; (3) comprehensive literature search; (4) status of publication used as inclusion criterion; (5) list of studies (included/excluded) provided; (6) characteristics of included studies documented; (7) scientific quality assessed and documented; (8) appropriate formulation of conclusions; (9) appropriate methods of combining studies; (10) assessment of publication bias; and (11) conflict of interest statement.
The maximum score on AMSTAR is 11. Scores of 0–4 indicate that the review is of low quality, 5–8 that it is of moderate quality, and 9–11 that it is of high quality. Data were extracted only from high-quality reviews (ie, those with scores of 9 and above) because low-quality reviews may reach different conclusions than high-quality reviews, and also to avoid false conclusions that are based on low-quality evidence.7
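The score bands and the inclusion cut-off described above can be written as a small lookup. This is a minimal Python sketch: the names `amstar_quality`, `is_included`, and `INCLUSION_THRESHOLD` are illustrative and not part of the AMSTAR tool, and the sketch assumes whole-number totals.

```python
INCLUSION_THRESHOLD = 9  # reviews scoring 9 or above were retained in this synthesis


def amstar_quality(score: int) -> str:
    """Map an AMSTAR total (0-11) to the quality band used in this synthesis."""
    if not 0 <= score <= 11:
        raise ValueError("AMSTAR score must be between 0 and 11")
    if score <= 4:
        return "low"
    if score <= 8:
        return "moderate"
    return "high"


def is_included(score: int) -> bool:
    """A review advances to data extraction only if it falls in the high band."""
    return score >= INCLUSION_THRESHOLD


for total in (3, 8, 9):
    print(total, amstar_quality(total), is_included(total))
```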
Two reviewers undertook independent critical appraisal. Discussion among the two reviewers and a third independent reviewer occurred on all dual-appraised articles to verify appraisal processes, and to resolve disagreements on individual item score allocation.
Data extraction and management
Data extraction was independently performed by two reviewers and verified by a third reviewer. The data abstracted were categorized in outcome measures: (a) practitioner performance and (b) patient outcome. The following data were extracted from each included SR using a structured data collection form: data related to clinical settings and target groups of CDSS implementation, number and type of trials included, and main outcomes and conclusions. Separate summaries were made for practitioner performance and patient outcomes. Within these summaries, a distinction was made between general SRs, disease/therapy-specific SRs and setting/patient population-specific SRs. Results on practitioner performance and patient outcomes were assessed by grading them on the strength of evidence for improvement. The evidence strength is based on the randomized controlled trials (RCTs) included in the SRs. An RCT was defined as an experimental design used for testing the effectiveness of a clinical decision-support tool in which individuals are assigned randomly to the intervention and a control group (standard procedure) and for which the outcomes are compared. Evidence was graded as follows:
strong evidence: results based on RCTs and effect in 50% or more of the studies;
limited evidence: results based on RCTs and effect in 40–50% of the studies;
insufficient evidence: results based on non-randomized studies or effects in less than 40% of the studies.
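The three grades above amount to a simple decision rule. The sketch below is one possible reading in Python, not the authors' own procedure. Because the stated ranges overlap at exactly 50%, it assumes that a share of exactly 50% counts as strong and a share of exactly 40% counts as limited. The function name `grade_evidence` is illustrative.

```python
def grade_evidence(studies_with_effect: int, total_studies: int, based_on_rcts: bool) -> str:
    """Grade the strength of evidence for one SR on one outcome.

    Assumption: boundary values go to the higher grade (50% -> strong, 40% -> limited).
    """
    if total_studies <= 0 or not 0 <= studies_with_effect <= total_studies:
        raise ValueError("counts must satisfy 0 <= studies_with_effect <= total_studies, total > 0")
    if not based_on_rcts:
        return "insufficient"
    # Integer comparisons avoid floating-point surprises at the boundaries.
    if 2 * studies_with_effect >= total_studies:       # share >= 50%
        return "strong"
    if 5 * studies_with_effect >= 2 * total_studies:   # share >= 40%
        return "limited"
    return "insufficient"


print(grade_evidence(6, 10, True))   # strong
print(grade_evidence(4, 10, True))   # limited
print(grade_evidence(9, 10, False))  # insufficient
```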
Selection of studies
After duplicates had been removed, the searches in the different databases resulted in an initial set of 849 references of potential interest. Initial sifting based on title and abstract resulted in exclusion of 703 articles with full agreement between the two reviewers, reducing the first set to 146 references. An additional manual search of the reference list of the selected studies resulted in another 12 potentially relevant references. Full texts of the remaining potential relevant articles (n=158) were assessed independently by two reviewers against the inclusion criteria. Agreement between reviewers in this phase was 94% with subsequent exclusion of 115 articles. Discussion among the three reviewers was needed for eight references, and agreement was subsequently reached (all eight articles excluded). A set of 35 references finally proved to fulfil the inclusion criteria for type and content of study.
In the following stage, two reviewers independently assessed the 35 included reviews on their methodological quality, using the AMSTAR tool.7 8 Initial reviewer agreement on 354 of the 385 (35×11) individual AMSTAR item scores was reached in this phase (92%); disagreements on score allocations were resolved through discussion with a third reviewer.
Further study excluded two SRs as preliminary reports in conference proceedings for which a full SR article was included.10 11 Of the remaining set of 33 SRs, 15 had a mean quality score lower than 9 and were excluded,10–26 and 18 (51%) were high-quality SRs and advanced to the stage of data-extraction and analysis. One of these 18 SRs was excluded because it did not provide results on practitioner performance or patient outcomes. The flow diagram of the inclusion process is shown in figure 1. Table 2 provides the critical appraisal results of the SRs included.
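As a consistency check on the selection process described above, the reported counts can be chained together. This Python sketch uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the text for each step of the selection process.
identified = 849               # references after duplicates were removed
excluded_on_abstract = 703     # excluded on title and abstract
from_reference_lists = 12      # added by manual search of reference lists
excluded_on_full_text = 115    # excluded after independent full-text assessment
excluded_after_discussion = 8  # excluded after discussion among three reviewers
conference_duplicates = 2      # preliminary reports with a full SR already included
below_amstar_threshold = 15    # AMSTAR score lower than 9
no_relevant_outcomes = 1       # no results on performance or patient outcomes

full_text_assessed = identified - excluded_on_abstract + from_reference_lists
preincluded = full_text_assessed - excluded_on_full_text - excluded_after_discussion
high_quality = preincluded - conference_duplicates - below_amstar_threshold
analyzed = high_quality - no_relevant_outcomes

# Item-level agreement on AMSTAR: 354 of the 35 x 11 item scores.
amstar_items = preincluded * 11
item_agreement = round(100 * 354 / amstar_items)

print(full_text_assessed, preincluded, high_quality, analyzed, item_agreement)
# prints: 158 35 18 17 92
```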
Synthesis of evidence
The results of the 17 included SRs are summarized in tables 3 and 4.27–43 The SRs were published between 1994 and 2009. In total these SRs included 411 references, 229 of which represented unique studies, with 188 RCTs. Of these 188 unique RCTs, 108 studied practitioner performance or patient outcome. No increase in SR quality, with regard to fulfillment of the AMSTAR criteria, was visible over the years. Sixteen of the 17 SRs examined the influence of CDSS on practitioner performance: nine general SRs,27–33 40 41 five disease- or therapy-specific SRs34–37 42 and two setting- or patient-population-specific SRs.38 39 Evidence that CDSS significantly impacted practitioner performance was found in 52 out of 91 unique studies of the 16 SRs that examined this effect (57%). Twelve of these 16 SRs found strong evidence that CDSS improved practitioner performance: six general SRs,27–32 four disease- or therapy-specific SRs34–37 and two setting- or patient-population-specific SRs.38 39 Two general SRs found limited evidence,40 41 and one general SR33 and one disease- or therapy-specific SR42 found insufficient evidence that CDSS impacted practitioner performance. Findings were mainly positive for computer reminder systems for preventive care and computer-assisted drug ordering and dosing systems. Preventive care CDSS led to improvements in management of high blood pressure, diabetes care, and asthma care. Drug-prescribing CDSS resulted in improvements in clinicians' ordering patterns of drug dosages and frequencies, decreases in serious medication ordering errors, and adequate drug concentrations in patients. There was insufficient evidence that CDSS improved anticoagulant prescription by physicians.
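Because the same primary study can appear in several SRs, counts such as "52 out of 91 unique studies" require pooling studies across reviews and counting each one once. The following Python sketch illustrates that step with invented review and study identifiers. The function name `unique_study_summary` and all data in the example are hypothetical; only the final line uses figures from the text.

```python
from typing import Dict, Set


def unique_study_summary(included: Dict[str, Set[str]], positive: Set[str]) -> dict:
    """Pool studies across reviews, counting each study only once."""
    references = sum(len(studies) for studies in included.values())
    unique = set().union(*included.values())
    n_positive = len(unique & positive)
    percent = round(100 * n_positive / len(unique)) if unique else 0
    return {
        "references": references,      # total citations across all reviews
        "unique": len(unique),         # distinct primary studies
        "positive": n_positive,        # distinct studies reporting an effect
        "percent_positive": percent,
    }


# Hypothetical example: three reviews with overlapping study sets.
reviews = {
    "SR_A": {"s1", "s2", "s3"},
    "SR_B": {"s2", "s3", "s4"},
    "SR_C": {"s4", "s5"},
}
studies_with_effect = {"s1", "s3", "s4"}

summary = unique_study_summary(reviews, studies_with_effect)
print(summary)

# The reported practitioner-performance figure: 52 of 91 unique studies.
print(round(100 * 52 / 91))  # 57
```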
Sixteen out of the 17 SRs studied the impact of CDSS on patient outcomes. Evidence that CDSS significantly impacted patient outcomes was found in 25 out of 81 unique studies of the 16 SRs that examined this effect (30%). These effects were related to drug ordering and dosing systems and CDSS for preventive care and disease management. Only three of these 16 SRs29 35 37 found strong evidence that CDSS impacted patient outcomes: one general SR29 and two disease-/therapy-specific SRs.35 37 Two general SRs found limited evidence,30 41 and the remaining 11 SRs found insufficient evidence: seven general,27 28 31–33 40 41 two disease-/therapy-specific34 36 and the two setting-/patient-population-specific SRs.38 39
Quality of studies included in the systematic reviews
Although we synthesized only the findings of SRs with an AMSTAR score of 9–11, the methodological quality of the studies included in these SRs was a major discussion point. The SRs included a total of 229 unique studies with 188 RCTs (82%). Non-randomized uncontrolled interventions may provide biased, overestimated effects of CDSS. None of the 10 studies in the SR by Wolfstadt et al,43 for example, was an RCT, which greatly reduced the significance of their findings. Risk of contamination of results was another concern in the RCTs included in at least three SRs.32 37 38 Finally, Liu et al36 reported that the uncontrolled before–after study and the interrupted time series, both of which take limited account of known confounding factors and no account of unknown ones, have been the most popular designs for evaluating CDSS for acute abdominal pain. Walton et al29 concluded that a common bias in many of the studies of their SR was that the same clinicians treated patients allocated to the CDSS intervention and control condition. As a result, the effects of the CDSS may spill over into the control group. Contamination of the control group in this manner would tend to make it more difficult to show a beneficial effect from CDSS.
Regarding the lack of positive findings on patient outcomes, many authors discussed that this may be due to the small sample sizes in the original studies that consequently were underpowered.9 28 38 Furthermore, follow-up periods in most studies were often not long enough to assess long-term differences on patient outcomes related to the computerized interventions (eg, see Shamliyan et al33). Studies with too small sample sizes or too short follow-up periods are at risk of overinterpretation of non-significant results. Fortunately, Hunt et al28 and Garg et al31 indicate that the number and methodological quality of trials have improved over time. An increase in RCTs including a power analysis to calculate the minimum sample size required to show an impact of CDSS was also reported.
Synthesis of the systematic reviews results
It is clear from our synthesis that few SRs have found benefits on patient outcomes, though many of the underlying studies have been too small in sample size or too short in time to reveal clinically important effects related to patients. There is, however, significant evidence that CDSS can positively impact healthcare providers' performance, with preventive care reminder systems and drug prescription systems as the clearest examples. Exceptions are anticoagulant prescription systems, for which the findings thus far are inconclusive. The studies of diagnostic CDSS are likewise less positive.
A first explanation for the findings related to diagnostic CDSS is evidence suggesting that clinicians, based on their clinical experience, are better able to rule out alternative diagnoses than diagnostic CDSS. This may lower the impact of these CDSS in clinical practice. Moreover, the level of specificity of diagnostic advice varies considerably among diagnostic CDSS.36 The specificity level of computer-generated advice is also known to strongly influence the chance that physicians adhere to the advice, with low specificity resulting in computer-advice fatigue and in situations where physicians ignore the advice.12 Second, the diagnostic reasoning models of these CDSS often require input of a large number of patient data (demographic data, data on complaints, symptoms, previous history, physical examination, laboratory, and other tests) to deliver the decision support. As long as these data are not electronically available, for example in an electronic medical record (EMR), clinicians have to enter all these data items manually. The burden of data entry may make them give up and not use the CDSS; as a consequence, they may perform no better than unaided clinicians. There is evidence that arduous data-entry facilities adversely affect clinicians' satisfaction with CDSS and make them abandon the CDSS.12 When data entry is incomplete, the diagnoses generated by CDSS will be less accurate, and this also may reduce their impact in practice. Part of the data that diagnostic CDSS need becomes available only during the clinical process. This delays the moment at which these CDSS can deliver their advice. Advice that shows up too late in the workflow of CDSS users increases the likelihood that they override it. There is indeed evidence that the impact of diagnostic CDSS is lowered if the time at which their output becomes available does not match clinicians' workflow and the output does not reach them in time.36 44
Unlike diagnostic CDSS, most preventive care systems and CDSS drug-prescribing systems require a limited number of patient data items for input to the decision-support facility. Preventive care reminder systems are for the most part applied in routine tasks (blood-pressure tests, Pap smears, vaccinations): they prompt doctors to call patients in for a procedure, or alert them that a procedure is due when the patient is at the physician's office. Most CDSS for drug prescription warn clinicians when there is a drug interaction or an allergy listed in the patient's data file, or when they have ordered an unusual dose or frequency of a certain drug. Only a few drug prescription CDSS can also perform drug–disease and/or drug–lab interaction checking and have the advanced feature of patient-specific dose calculation. So, the majority of these CDSS require patient data that are largely available before the advice is (to be) generated: at the time and place clinicians make the decisions. It has been shown that computer advice improves providers' drug choices and reduces the likelihood of adverse drug events when it is delivered at the time when it is most needed.33 45 Usage patterns of diagnostic CDSS, by contrast, seem to depend on their complexity, the number of additional data items to collect, the ease of data entry, and the extra time needed to work with the CDSS.36 Several studies on diagnostic aids indeed suggest that CDSS was inefficient because it required more time and effort from the user compared to the paper-based situation.31 36 In contrast, preventive care and drug-prescribing CDSS require minimal to no extra data input from the user except the data they already produced in the context of a patient visit or ordering task. As a result, these types of CDSS minimize the interruption of the user's workflow.
Anticoagulant prescribing CDSS share certain features with diagnostic CDSS and are therefore more complex than other drug-prescribing CDSS. Anticoagulant therapy is a course of drug therapy that a clinician must supervise carefully because it carries a number of risks. For example, many drugs can interact dangerously with anticoagulants, and the patient needs to be monitored continuously for complications. Anticoagulation therapy guidelines thus vary by patient and situation, and the clinician must take care to confirm that the course of therapy is appropriate.46 Management of this therapy by CDSS therefore requires a full patient history dataset to learn about the patient's lifestyle and to identify any risk factors which could complicate the therapy. These factors likewise complicate the entry of patient data: data which are needed to deliver the computer advice for support of clinicians in their decision-making and in managing the therapy during the course of its administration. All these aspects may explain the low impact of anticoagulant prescribing CDSS in practice.
The efficacy of CDSS can be improved when the specificity and sensitivity levels of their advice increase, the need for manual input of (extra) patient data is minimized, and the computer advice is given at the time the clinicians make decisions. The future impact of CDSS will therefore depend on (1) the progress in the biomedical-informatics research domain related to knowledge discovery and reasoning, and (2) the development of integrated environments with a merging of EMRs and CDSS. Research progress in knowledge discovery and reasoning has been impressive over the past decade with advanced machine learning, data-mining techniques, and temporal reasoning as a few examples. Even more sophisticated knowledge discovery techniques that permit the integration of clinical expertise with machine-learning methods are under way.47 Less advancement has, however, been achieved in the integration and sharing of this knowledge in EMRs. The current lack of commonly accepted terminologies, ontologies, and standards for intelligent interfacing makes the electronic exchange and interoperation of healthcare data and knowledge hard to achieve. To fully realize the potential of EMRs, future challenges are in defining and reaching agreement on these communication and data-sharing standards, and in realizing complete datasets that are coded according to agreed-upon terminological systems.47 Future CDSS should be integrated with these EMRs and provide their decision support at the right time with minimal interruption of clinicians' workflow. Physicians seem to perform better in circumstances where CDSS automatically prompt them than when they have to initiate the interaction themselves.45 This suggests that these systems should work in the background and continuously monitor and check whether the care (to be) delivered to individual patients is in accordance with applicable guidelines. The CDSS should then only deliver its advice in situations where clinicians do not follow these guideline recommendations or when unforeseen patient outcomes occur.
Advantages and limitations of selection process
This synthesis of high-quality SRs on CDSS has several strengths and weaknesses. First, the literature search was thorough: we did not limit our search to a certain time period, and we screened 849 references for relevance. Second, we critically appraised the quality of the preincluded SRs based on the standardized AMSTAR 11-item measurement tool. Third, we used two independent reviewers for preselection of SRs based on predefined inclusion criteria, for the assessment of SRs' quality and for the final data extraction. Fourth, we manually searched the reference lists of the selected SRs to identify SRs that we could have missed in our literature search.
One limitation of the study is that we excluded SRs with an AMSTAR score below 9. One could argue that not all item scores of the AMSTAR measurement tool should have an equal weight in critical appraisal of the SRs. Furthermore, in certain clinical domains, the quality of candidate studies may be systematically poorer than in other domains. As a consequence, SRs of CDSS studies in these domains would never achieve a score of 9 or above while these same studies may be the benchmark by which clinical guidelines are set.
Another limitation of the study is that 14 of the 17 SRs with an AMSTAR score of 9 or higher did not assess the likelihood of publication bias. As explained earlier, publication bias against studies that failed to show an effect, and that were therefore not included in the SRs, limits the results of this synthesis of evidence on the impact of CDSS on practitioner performance and patient outcomes.
On the one hand, we defined effect in 50% or more of the RCTs as strong evidence, in 40–50% of the RCTs as limited evidence, and in less than 40% of the RCTs or non-randomized studies as insufficient evidence respectively. These ranges, along with the strict inclusion criteria of this synthesis may have underestimated CDSS success rates and their impact on practitioners' performance and patient outcomes. On the other hand, there was a large overlap in studies included in the SRs which may have led to an overestimation of CDSS impact rates. We therefore analyzed the number of unique studies from the total number included in all SRs and provided overall estimates of the evidence that CDSS significantly impacted practitioner performance and patient outcomes. Finally, the conclusions from this synthesis are probably limited in so far as the SRs included in this synthesis described CDSS, some of which were developed more than a decade ago. As discussed, CDSS are constantly evolving; newer generations of CDSS probably have greater capability and usability, and will therefore have a different impact on practitioners' performance and patient outcomes.
We thank A Leenders, clinical librarian, for the development of the search strategy.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed