Standards for reporting randomized controlled trials in medical informatics: a systematic review of CONSORT adherence in RCTs on clinical decision support
- K M Augestad1,2,
- G Berntsen1,
- K Lassen2,4,
- J G Bellika1,3,
- R Wootton1,
- R O Lindsetmo2,4,
- Study Group of Research Quality in Medical Informatics and Decision Support (SQUID)
- 1Department of Telemedicine and Integrated Care, University Hospital North Norway, Tromsø, Norway
- 2Department of Gastrointestinal Surgery, University Hospital North Norway, Tromsø, Norway
- 3Department of Computer Science, University of Tromsø, Tromsø, Norway
- 4Institute of Clinical Medicine, University of Tromsø, Tromsø, Norway
- Correspondence to Dr Knut Magne Augestad, Department of Telemedicine and Integrated Care, University Hospital North Norway, 9037 Breivika, Tromsø, Norway;
Contributors Study concept and design: KMA and GB. Acquisition of data: GB and KMA. Analysis and interpretation of data: KMA, GB, RW, and KL. Drafting of the manuscript: KMA. Critical revision of the manuscript for important intellectual content: KMA, GB, KL, JGB, RW, and ROL. Statistical analysis: KMA and GB. Administrative, technical, and material support: KMA and GB. Study supervision: KMA, GB, JGB, and RW. KMA had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
- Received 31 May 2011
- Accepted 29 June 2011
- Published Online First 29 July 2011
Introduction The Consolidated Standards of Reporting Trials (CONSORT) were published to standardize reporting and improve the quality of clinical trials. The objective of this study is to assess CONSORT adherence in randomized controlled trials (RCTs) of disease specific clinical decision support (CDS).
Methods A systematic search was conducted of the Medline, EMBASE, and Cochrane databases. RCTs on CDS were assessed against CONSORT guidelines and the Jadad score.
Result 32 of 3784 papers identified in the primary search were included in the final review. 181 702 patients and 7315 physicians participated in the selected trials. Most trials were performed in primary care (22), including 897 general practitioner offices. RCTs assessing CDS for asthma (4), diabetes (4), and hyperlipidemia (3) were the most common. Thirteen CDS systems (40%) were implemented in electronic medical records, and 14 (43%) provided automatic alerts. CONSORT and Jadad scores were generally low; the mean CONSORT score was 30.75 (95% CI 27.0 to 34.5), median score 32, range 21–38. Fourteen trials (43%) did not clearly define the study objective, and 11 studies (34%) did not include a sample size calculation. Outcome measures were adequately identified and defined in 23 (71%) trials; adverse events or side effects were not reported in 20 trials (62%). Thirteen trials (40%) were of superior quality according to the Jadad score (≥3 points). Six trials (18%) reported on long-term implementation of CDS.
Conclusion The overall quality of reporting RCTs was low. There is a need to develop standards for reporting RCTs in medical informatics.
- Clinical decision support
- medical informatics
- disease surveillance
- clinical decision support systems
Randomized controlled trials (RCTs) are considered the gold standard for investigating the results of clinical research because they inherently correct for unknown confounders and minimize investigator bias.1–3 The results of these trials can have profound and immediate effects on patient care. When RCTs are reported, it is recommended that the Consolidated Standards of Reporting Trials (CONSORT)4 are followed. CONSORT was first published in 1996 and has been revised several times since.5 The CONSORT statement is widely supported and has been translated into several languages to facilitate awareness and dissemination. An extension of the CONSORT statement was published in 2008, focusing on randomized trials in non-pharmacologic treatment.6 CONSORT consists of a checklist of information to include when reporting on an RCT; however, inadequate reporting remains common among clinicians.6–12 Higher quality reports are likely to improve RCT interpretation, minimize biased conclusions, and facilitate decision making in light of treatment effectiveness.1 Furthermore, there is evidence that studies of lower methodological quality tend to report larger treatment effects than high quality studies.13–15
Research on clinical decision support (CDS) tools has rapidly evolved in the last decade. CDS provides clinicians with patient specific assessment or guidelines to aid clinical decision making16 and improve quality of care and patient outcome.17 18 CDS has been shown to improve prescribing practices,19 reduce serious medication errors,20 21 enhance delivery of preventive care services,22 and improve guidelines adherence,23 and likely results in lasting improvements in clinical practice.24 However, clinical research on CDS tools faces various methodological problems25–28 and is challenging to implement in the field of health informatics.29 Guidelines for reporting studies in health informatics have been published,26 but there is no universal consensus.
Numerous RCTs examining (disease specific) CDS tools aimed at improving patient treatment have been performed. It is unclear whether the reports of these trials adhered to the CONSORT statement. Although several studies have evaluated the quality of RCTs in medical journals,3 7 8 to date none have been directed at medical informatics literature published in dedicated journals. The objective of this paper is to perform a systematic review of RCTs to assess the quality of CDS research focusing on disease specific interventions. We aimed to score the identified RCTs according to the CONSORT6 checklist and Jadad score.3 Finally, we discuss the implications of these results in the context of evidence-based medicine.
Materials and methods
The review followed the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)30 and was divided into two work phases: (a) identification of RCTs assessing disease specific CDS and (b) data extraction and assessment of RCT quality.
The Study Group of Research Quality in Medical Informatics and Decision Support (SQUID) is a multidisciplinary study group. Members have expertise in hospital medicine (KL, ROL, KMA), RCTs in medicine (surgery) (KL),31 RCTs in telemedicine (RW),32 trials of medical informatics (JGB, KMA),33–37 and epidemiological research (GB).38–41 The group's objective is to assess and improve the quality of clinical informatics research with special focus on randomized controlled trials aimed at enhancing physician performance.
We defined CDS as ‘any electronic or non-electronic system designed to aid directly in clinical decision making, in which characteristics of individual patients are used to generate patient specific assessments or recommendations that are subsequently presented to clinicians for consideration.’42 We defined disease specific CDS as ‘a clinical decision support aimed at a specific disease, describing symptoms, diagnosis, treatment, and follow-up.’
This systematic review is based on a PubMed, EMBASE, and Cochrane Controlled Trials Register search using EndNote X3 (EndNote, San Francisco, California, USA) for relevant publications published through November 2010. We piloted search strategies and modified them to ensure they identified known eligible articles. We combined keywords and/or subject headings to identify CDS (clinical decision support system, computer-assisted decision making, computer-assisted diagnosis, hospital information systems) in the area of RCTs (ie, randomized controlled trial). We searched publications accessible from the web pages of the International Journal of Medical Informatics, Journal of the American Medical Informatics Association, and BMC Medical Informatics and Decision Making. We systematically searched the reference lists of included studies. Reviews addressing CDS were investigated and papers fulfilling the inclusion criteria were included.17 42–44 The searches were individually tailored for each database or journal. Experienced clinicians reviewed all search hits and decided whether a CDS was aimed at a specific disease and fulfilled inclusion criteria. The titles, index terms, and abstracts of the identified references were studied and each paper was rated as ‘potentially relevant’ or ‘not relevant.’ Disagreements regarding inclusion were resolved by discussion. Only trials performed in the last 10 years were included.
Inclusion criteria were:
Randomized controlled trial
CDS describing specific diseases and treatment guidelines
CDS aimed at physicians.
Exclusion criteria were:
Papers published before the year 2000
Not published in English
Proceedings, symposium, and protocol papers.
Assessing RCT quality
Scoring according to CONSORT
A checklist of 22 items from the revised 2001 CONSORT guidelines was analyzed.4–6 The score for each item ranged from 0 to 2 (0=no description, 1=inadequate description, 2=adequate description). The maximum score a paper could obtain was 44 points.
Each article was then assessed for every item on the checklist and scored independently by two observers (KMA and GB). The scores for the 22 items were added together and a percentage score for each trial was calculated.
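The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' scoring tool; the function name `consort_score` and the example item scores are hypothetical.

```python
from typing import Sequence, Tuple

N_ITEMS = 22          # checklist items in the revised 2001 CONSORT statement
MAX_PER_ITEM = 2      # 0 = no description, 1 = inadequate, 2 = adequate
MAX_SCORE = N_ITEMS * MAX_PER_ITEM  # 44 points


def consort_score(item_scores: Sequence[int]) -> Tuple[int, float]:
    """Return (total score, percentage of the 44-point maximum) for one trial."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, or 2")
    total = sum(item_scores)
    return total, 100.0 * total / MAX_SCORE


# Hypothetical trial (not from the review): 12 items adequate,
# 6 inadequate, 4 not described.
example = [2] * 12 + [1] * 6 + [0] * 4
total, pct = consort_score(example)
print(total, round(pct, 1))
```

In practice each of the two observers would produce such a list per trial, and disagreements on individual items would be resolved before the totals are summed.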
Scoring according to Jadad
The Jadad scale is a 5-point scale for evaluating the quality of randomized trials in which three points or more indicates superior quality.3 The Jadad scale is commonly used to evaluate RCT quality.7 8 The scale contains two questions each for randomization and masking, and one question evaluating reporting of withdrawals and dropouts.
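As a rough illustration, the five questions can be encoded as follows. The review does not spell out how inappropriate methods are handled; this sketch assumes the commonly described form of the scale, in which an inappropriate randomization or blinding method deducts one point. The function and parameter names are hypothetical.

```python
from typing import Optional


def jadad_score(randomized: bool,
                randomization_appropriate: Optional[bool],
                double_blind: bool,
                blinding_appropriate: Optional[bool],
                withdrawals_described: bool) -> int:
    """Jadad score (0-5). Pass None when the method is not described at all."""
    score = 0
    if randomized:
        score += 1
        if randomization_appropriate is True:
            score += 1      # method described and appropriate
        elif randomization_appropriate is False:
            score -= 1      # method described but inappropriate (assumed rule)
    if double_blind:
        score += 1
        if blinding_appropriate is True:
            score += 1
        elif blinding_appropriate is False:
            score -= 1
    if withdrawals_described:
        score += 1
    return max(score, 0)


def is_superior_quality(score: int) -> bool:
    """Threshold used in the review: three points or more."""
    return score >= 3


# Hypothetical unblinded trial with an appropriate randomization method
# and a description of dropouts.
s = jadad_score(True, True, False, None, True)
print(s, is_superior_quality(s))
```

Under this encoding an unblinded trial can reach at most three points, which is worth keeping in mind for CDS trials where blinding of clinicians is often not feasible.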
Scoring according to the sequential phases of a complex intervention
An RCT evaluating a CDS tool is defined as a complex intervention, that is, an intervention consisting of various interconnecting parts.29 77–79 Campbell et al77 suggested five sequential phases for developing RCTs for complex interventions: theory, modeling, exploratory trial, definitive randomized controlled trial, and long-term implementation. Included trials were scored according to these sequential phases, that is, one point was given for each phase.
Scoring according to CDS features critical for success
Kawamoto et al identified certain CDS factors associated with clinical improvement.42 These factors are: automatic provision of CDS, CDS at the time and location of decision making, provision of a recommendation rather than just an assessment, computer based assessment, and automatic provision of decision support as part of clinician workflow. The identified CDS tools were scored according to these factors, giving one point for each feature.
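Both the phase score and the feature score are simple "one point per item present" tallies. The sketch below shows one way to compute them; the short labels and the `checklist_score` helper are hypothetical and only paraphrase the items listed in the text.

```python
from typing import Iterable, Tuple

# Labels paraphrase the phases and features named in the text.
PHASES: Tuple[str, ...] = (
    "theory",
    "modeling",
    "exploratory trial",
    "definitive RCT",
    "long-term implementation",
)

CDS_FEATURES: Tuple[str, ...] = (
    "automatic provision",
    "at time and location of decision making",
    "recommendation rather than assessment",
    "computer based",
    "part of clinician workflow",
)


def checklist_score(present: Iterable[str], checklist: Tuple[str, ...]) -> int:
    """Give one point for each checklist item reported as present."""
    present_set = set(present)
    unknown = present_set - set(checklist)
    if unknown:
        raise ValueError(f"items not on the checklist: {sorted(unknown)}")
    return len(present_set)


# Hypothetical trial reporting only a theory phase and the definitive RCT.
phase_points = checklist_score({"theory", "definitive RCT"}, PHASES)
# Hypothetical CDS tool with three of the five features.
feature_points = checklist_score(
    {"automatic provision", "computer based", "part of clinician workflow"},
    CDS_FEATURES,
)
print(phase_points, feature_points)
```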
All appraised papers were discussed by the two reviewers and, if necessary, by a third independent reviewer to verify the appraisal process and resolve disagreement; when consensus could not be reached, the third reviewer assessed the items and provided the tiebreaker score.
Trial characteristics and CONSORT adherence were analyzed and interpreted with the trial as unit of analysis. Descriptive statistics were analyzed using percentages, standard deviation, confidence intervals, 2×2 contingency tables, χ2 test, and Fisher's exact test when appropriate. We used proportions for categorical variables and mean for continuous variables. For reasons of comparison, trials were divided into groups according to whether or not their outcome was positive. A positive outcome was defined as either a primary or secondary outcome with p<0.05. All tests were two-sided and a probability (p) value of <0.05 was considered statistically significant. Microsoft Excel and SPSS PASW Statistics v 18.0 were used for the statistical analyses.
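For readers unfamiliar with the tests named above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 contingency table, written with the Python standard library only. It is not the authors' analysis (which used SPSS), and the counts in the example are made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Made-up counts, NOT taken from the review:
# rows = trial has feature / lacks feature, columns = positive / negative outcome.
p_value = fisher_exact_two_sided(10, 4, 9, 9)
print(f"p = {p_value:.3f}", "significant" if p_value < 0.05 else "not significant")
```

With only 32 trials, several cells in such tables are likely to be small, which is the usual reason to prefer Fisher's exact test over the χ2 test.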
Of 3784 potentially relevant articles screened, 32 papers met all our inclusion criteria (table 1).
Fourteen (43%) of the trials were performed in the US, seven (21%) in the Netherlands, and four (12%) in the UK. Four of the trials were published in medical informatics journals, and the rest in medical journals. The trials included 181 702 patients and 7315 physicians. The majority (22 trials) were performed in primary care, including 897 general practitioner (GP) offices. Of the 11 trials performed at hospital level, two were performed in an outpatient department, three in internal medicine departments, one in a surgical department, one in an intensive care unit, two in emergency departments, one in a trauma unit, and one in various different departments. Asthma (n=4), diabetes (n=4), and hyperlipidemia (n=3) were the most common diseases addressed (table 1).
General trial features
Twenty-six trials (81%) did not provide an RCT registration number (eg, http://Clinicaltrials.gov and others), while only seven trials (21%) offered web access to the full trial protocol. One trial did not state funding sources (table 2). In nine trials (28%), more than half of the authors were medical doctors; in 10 trials, information on the background and education of the author(s) was not provided. Twenty-two (68%) trials chose a cluster-randomized design, which was the most common design among trials in primary care (21 of 22). Of the nine trials performed in a hospital setting, four had a cluster-randomized design and in these cases the department was chosen as the clustering unit. Two trials provided information on changes to the trial protocol, and one trial addressed CONSORT guidelines.
Less than half of the CDS tools were implemented in an electronic medical record, and 14 (43%) of the CDS tools provided automatic alerts (table 2). Twenty-four (75%) of the developed CDS tools provided decision support at the time and location of the decision need. Eighteen (56%) of the CDS tools did not disrupt the natural workflow of the physician. None of these CDS features had a significant influence upon the primary endpoint or overall conclusions.
Addressing sequential phases of a complex intervention
None of the trials defined the intervention as complex or discussed the definition of a complex intervention.77 78 80 Four trials defined all phases of a complex intervention and these phases were described in detail (table 2).
Trials reporting on long-term CDS implementation
Six trials reported on the long-term implementation of the CDS tool used in the RCT (table 1).
Four of these trials addressed all phases of a complex intervention and had a statistically significantly higher CONSORT score than trials not reporting long-term implementation (OR 1.64, p=0.04). Three of these trials were performed at a hospital level, with the largest trial including 87 000 patients.
Inter-rater reliability and CONSORT score
The intraclass correlation coefficient used to establish inter-rater reliability was 0.69 for the 22-item CONSORT scale. The mean CONSORT score was 30.75 (95% CI 27.0 to 34.5), median score 32, range 21–38.
CONSORT: title, abstract, and background
Five trials did not identify a randomized design in their title. All trials had a structured abstract and gave a solid background and rationale for the trial (table 3).
CONSORT: materials and methods
One trial addressed the CONSORT guidelines in their Material and Methods section. Twenty-seven trials (84%) clearly defined their participants, eligibility, and ethics approval. Fourteen trials (43%) did not clearly define the study objective or hypothesis. Twenty-three trials (72%) had an adequate definition of outcome measures. Fourteen studies (37%) did not perform or had an inadequate sample size calculation (table 3).
Most trials described mechanisms to generate random allocation (59%) and the method of implementing the random sequence (47%). In contrast, only five trials (15%) gave adequate information regarding blinding (whether or not blinding was necessary and if necessary, how it was performed) (table 3).
Most trials (87%) provided a detailed description of statistical methods (table 3). Five trials had no figure showing participant flow and four trials did not include a table showing demographics. Nine trials did not address exclusions during the trial, and 10 trials did not define the date of trial initiation and termination. Only two trials performed an interim analysis, and only one trial addressed the ‘harms or unintended effects’ of the intervention.
The interpretation of results was justified in 28 trials (87%). Four trials did not discuss limitations and six trials did not address generalizability or provide recommendations for the future (table 3).
According to the Jadad score, thirteen trials (40%) were classified as superior quality trials (≥3 points). Nineteen (59%) described the study as randomized, and the sequence of randomization was explained and was appropriate. Twenty-seven (85%) did not describe blinding. Ten (32%) did not describe dropouts (table 4).
Summary of findings
This is the first review assessing the quality of RCTs of disease specific CDS as a primary intervention. We have analyzed their outcome, CONSORT adherence and Jadad score. Methodologically, research quality varies and adherence to CONSORT guidelines is low for certain checklist items. Thirteen trials (40%) were classified as superior quality trials according to their Jadad score (≥3 points). According to our analysis, there is considerable room for improving methodology in areas such as the description of specific research objectives, randomization methods, sample size calculations, reporting of adverse events, and a general focus on CONSORT. Similarly, the Jadad score was low on several checklist items. Surprisingly few studies defined their CDS intervention as a complex intervention; only four studies described all phases of a complex intervention including long-term implementation.
Research challenges of complex interventions
A complex intervention was defined by Campbell et al77 81 as an intervention that is ‘built up from a number of components, which may act both independently and interdependently.’ Similarly, Campbell defined an intervention with a decision support system as a complex intervention.77 In 2000, the Medical Research Council in the UK proposed a framework for the development and evaluation of RCTs for complex interventions (theory, modeling, exploratory trial, definitive RCT, long-term implementation),77 which was further improved in 2007.81 The methodological challenges of complex interventions have been thoroughly discussed in the field of medical informatics,25 29 as well as in the area of health service research.79 82–85 There have been arguments against over-standardization of complex interventions. Complex and large health organizations are characterized by flux, contextual variation, and adaptive learning rather than stability, and a standardized approach will not fit such organizations.86 However, our review shows that most trials do not address the term ‘complex intervention’ and as many as 23 trials (71%) did not perform an exploratory trial before the definitive RCT. This problem is well discussed by Friedman, who introduces the ‘tower of achievements.’87 According to Friedman, integration across research phases is of utmost importance to success in the field.
Quality of RCTs in medical informatics versus clinical trials
Our survey shows generally low CONSORT adherence, and only 13 trials were defined as superior quality trials according to their Jadad score. However, the quality of RCTs has varied in medical research as well. In a 2006 review8 assessing 69 RCTs of surgery, only 37% of trials were classified as of superior quality. CONSORT scores were generally low but significantly higher in trials with higher author numbers, multi-centre trials, and trials with a declared funding source.8 It has been concluded that there is a need to improve awareness of the CONSORT statement among authors, reviewers, and editors.8 Similar concerns were recently reported in several medical journals, which concluded that there was low adherence to key methodological items.88–90 These conclusions from the medical literature are in accordance with our review findings.
Strength and limitations of our study
This study has several important strengths. First, our literature search was thorough and we screened more than 3700 articles. Second, this is the first review to evaluate the general trial quality and CONSORT adherence of RCTs evaluating CDS tools as a clinical intervention. Research on CDS tools is methodologically challenging.28 Thus, a focus on research methods in medical informatics is important, and adherence to CONSORT has never been assessed. Third, we are currently recruiting patients into an RCT addressing the use of disease specific CDS tools37 and thus have experienced the inherent methodological challenges. In addition to technological problems, these trials also face the challenges of a complex intervention. These research questions have been addressed in this review.
One limitation of our study might be that only RCTs assessing CDS systems aimed at physicians were included. However, when planning this review, the research group wanted to identify CDS trials intended to improve patient treatment, as these trials should ideally adhere to the research conventions of the wider medical community. In this context the research group felt it natural to exclude CDS not addressing physicians.
Another limitation might be the reporting of the various phases in a complex intervention. Our review shows that only six trials (18%) report on long-term implementation. However, all studies were RCTs and thus were in the stage prior to implementation. It may be that implementation did occur after the RCT was published but was not part of the publication. It might also be that some providers implemented their long-term intervention, but as the RCT did not support this, they were reluctant to report on it. Similarly, it is possible that theoretical and preliminary work might have been carried out but was not fully described in an RCT paper.
Finally, it is unclear whether or not ‘complex intervention’ is a term widely accepted in medical informatics circles. We identified the term ‘complex intervention’ in one JAMIA article from 2008, with the other mentions of this concept all being in BMJ. Since JAMIA readership is largely within the US, it is unclear whether it is mandatory for CDS and their evaluation to be declared as complex interventions and thus follow the required phases.
Challenges of RCTs in medical informatics
Recently, Liu28 discussed the pros and cons of RCTs in medical informatics. We agree with their view that RCTs are not the only method for evaluation. Medical informatics interventions are usually performed in a complex organizational environment. In this context, there is a need for different research methods, and often a mixture of qualitative and quantitative methods, depending on the research subject. However, when an RCT is deemed the proper design, standards of reporting must be followed. In addition, RCTs in medical informatics face several methodological challenges, some of which have been clarified in this review.
Choice of outcome measures
In principle, outcomes can be patient orientated, process orientated, or system orientated. The choice of outcome measures should be clearly related to the research question. Our review shows a large mixture of primary outcomes, which makes meta-analyses of effects impossible. Thus, a clear conclusion regarding the effects of CDS (in the form of a meta-analysis) cannot be reached.
Sample size calculations
The planning of an RCT should begin with sample size calculation. This assessment is closely related to the choice of primary outcome, as different primary outcomes can result in different sample size estimates. The sample estimate is crucial to determine the resources and time needed to conduct a properly designed RCT with enough power to reject or accept the null hypothesis. Kiehan et al7 address concerns about the poor standards of reporting sample size calculations. They conclude that many of these trials are flawed from the start due to inadequate power to assess any real difference between interventions. In this review, approximately 50% of the trials had an inadequate estimate of sample size, a surprisingly low number.
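To make the link between primary outcome and sample size concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions with individual randomization. It is not a method described in the review; the function name `n_per_arm` and the example proportions are assumptions chosen for illustration, and a cluster-randomized trial would need a further inflation for within-cluster correlation.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_intervention: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm to detect p_control vs p_intervention.

    Two-sided test, equal allocation, normal approximation.
    """
    if p_control == p_intervention:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect = p_control - p_intervention
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical outcomes: guideline adherence rising from 50% to 60%,
# versus a rarer event falling from 10% to 5%.
print(n_per_arm(0.50, 0.60))
print(n_per_arm(0.10, 0.05))
```

The two calls give different answers for the same trial population, which is the point made above: the sample size cannot be fixed before the primary outcome and the expected effect are chosen.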
Should randomization be performed at an individual or an organizational level? In this review, 68% preferred a clustered design, clustered at the level of hospitals, departments, or GP offices. There are obvious advantages to a cluster design in complex health organizations, as problems of blinding and random sequence implementation will be avoided. In addition, cluster randomization is usually less demanding of resources, as randomization can be performed before the actual trial period with fewer personnel involved.
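Allocating whole clusters before the trial starts can be sketched as below. This is an illustrative sketch under simple assumptions (two arms, equal allocation, no stratification); the cluster identifiers and the function `randomize_clusters` are hypothetical and not taken from any of the reviewed trials.

```python
import random
from typing import Dict, Sequence


def randomize_clusters(cluster_ids: Sequence[str], seed: int) -> Dict[str, str]:
    """Assign each cluster (eg, a GP office or department) to one arm.

    A fixed seed makes the allocation reproducible, so the list can be
    generated and documented before recruitment begins.
    """
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        cluster: ("intervention" if index < half else "control")
        for index, cluster in enumerate(shuffled)
    }


# Hypothetical example: ten GP offices randomized in a single step.
offices = [f"office-{i:02d}" for i in range(1, 11)]
allocation = randomize_clusters(offices, seed=2011)
for office in offices:
    print(office, allocation[office])
```

Every clinician and patient in an office then follows that office's arm, which is why the whole sequence can be produced once, ahead of time, by a single person.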
The research methodology in the identified trials is of low quality, suggesting a need for increased focus on the methods of conducting and reporting RCTs. Study designs that adhere to CONSORT are not always appropriate in medical informatics research.26 However, RCTs evaluating CDS tools in a clinical setting should adjust to the accepted consensus. Thus, CONSORT guidelines for conducting RCTs should be addressed and subsequently implemented in the trial. CONSORT guidelines for non-pharmacological treatment6 provide a solid basis for reporting RCTs evaluating CDS systems, but an adjustment for medical informatics is needed. The societies for medical informatics should aim for a consensus statement to improve the quality of reporting RCTs, trials of informatics applications, and CDS.
Funding The Norwegian Health Authorities Research Fund supported this survey (grant number 40614).
Competing interests None.
Ethics approval No patients were involved in the survey and thus approval by the hospital institutional review board was not required.
Provenance and Peer review Not commissioned; externally peer reviewed.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.