Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters
- 1Health Information Research Unit, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
- 2Department of Clinical Epidemiology and Biostatistics, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
- 3Department of Medicine, McMaster University, Faculty of Health Sciences, Hamilton, Ontario, Canada
- Correspondence to Dr Cynthia Lokker, Health Information Research Unit, Department of Clinical Epidemiology and Biostatistics, McMaster University, CRL 125, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada;
- Received 8 March 2011
- Accepted 24 May 2011
- Published Online First 15 June 2011
Objective Clinical Queries filters were developed to improve the retrieval of high-quality studies in searches on clinical matters. The study objective was to determine the yield of relevant citations and physician satisfaction while searching for diagnostic and treatment studies using the Clinical Queries page of PubMed compared with searching PubMed without these filters.
Materials and methods Forty practicing physicians, presented with standardized treatment and diagnosis questions and one question of their choosing, entered search terms which were processed in a random, blinded fashion through PubMed alone and PubMed Clinical Queries. Participants rated search retrievals for applicability to the question at hand and satisfaction.
Results For treatment, the primary outcome of retrieval of relevant articles was not significantly different between the groups, but a higher proportion of articles from the Clinical Queries searches met methodologic criteria (p=0.049), and more articles were published in core internal medicine journals (p=0.056). For diagnosis, the filtered results returned more relevant articles (p=0.031) and fewer irrelevant articles (overall retrieval less, p=0.023); participants needed to screen fewer articles before arriving at the first relevant citation (p<0.05). Relevance was also influenced by content terms used by participants in searching. Participants varied greatly in their search performance.
Discussion Clinical Queries filtered searches returned more high-quality studies, though the retrieval of relevant articles was only statistically different between the groups for diagnosis questions.
Conclusion Retrieving clinically important research studies from Medline is a challenging task for physicians. Methodological search filters can improve search retrieval.
- Health information science
- knowledge translation
- information storage and retrieval
- PubMed, search engine
- databases as topic
- medical informatics
- evidence-based medicine
- information retrieval
- informatics education
- library science
Background and significance
A systematic review reports that primary-care physicians generate an average of between 0.07 and 1.85 questions per consultation in the course of daily practice.1 On average, only 30%1 2 to 55%3 of these patient-care questions are pursued further. When they are, colleagues and textbooks are primarily used as information sources,1 but electronic resources are also employed.4–7
In addressing their clinical questions, physicians need to be able to define the question clearly; search resources effectively; identify and appraise retrieved information; and then apply the evidence appropriately.8 9 The most salient hurdle to answering clinical questions is limited time.2 3 8 10 Other reported barriers include poor searching skills,8 11 limited resource availability and accessibility,12 and inadequate critical appraisal skills.13 Additional challenges include perceptions that the information does not exist,3 8 that resources do not address the specific issue or do not adequately synthesize the information into a useful statement,3 and that searches retrieve too much irrelevant material.1
Several studies have assessed the impact of clinical information retrieval systems on answering clinical questions6 10 11 14 15 and changes in patient care.5 14 These studies show that relevant answers to 46% of clinicians' questions were found when searched by medical librarians, mostly through the use of Medline.14 Medical and nurse practitioner students improved their ability to obtain relevant answers to simulated clinical questions from 45% to 77% following searches of Medline.11 Physicians with access to a virtual library including Medline, textbooks, and clinical guidelines improved their ability to answer clinical questions correctly from 29% (95% CI 25.4% to 32.6%) before system use to 50% (95% CI 46.0% to 54.0%) after system use.6 Pluye et al16 reported in a literature review of clinical information retrieval studies that observational studies indicate about 30% of searches may have a positive impact on physicians. Hoogendam et al15 found that searches in UpToDate resulted in more full or partial answers than those in PubMed (83% vs 63%, p<0.001).
Clearly clinical information retrieval systems have a role to play, but outcomes are not consistently positive. In a study where physicians were encouraged to use their own best resources to answer simulated questions, McKibbon and Fridsma (2006)17 found that 11% of answers went from correct prior to searching to incorrect following searching. This was balanced by 13% of answers that were incorrect prior to searching that were answered correctly after searching (overall correct rate increased from 39.1% to 41.3%). The rate of correct answers becoming incorrect after searches was similar to that found by Hersh and colleagues in two separate studies when Medline was used as a searching tool: 4.5%11 and 10.5%.11 18 Koonce et al19 found that 35% of the time, clinical information retrieval from evidence-based resources did not provide answers, indicating that primary literature is still an important resource.
PubMed is one of the most accessible primary research sources, allowing free searches and access to abstracts for material indexed in Medline. One set of tools available for physicians searching PubMed is the Clinical Queries search filters (http://www.ncbi.nlm.nih.gov/pubmed/clinical) for therapy, diagnosis, etiology, prognosis, and clinical prediction guides. To date, the search filters available through the Clinical Queries interface of PubMed have not been formally tested with the intended group of clinician users. This exploratory study set out to determine the yield of relevant citations and physician satisfaction while searching using Clinical Queries in PubMed compared with searching PubMed without these filters.
The research questions were:
When practicing general internists conduct searches for each of three questions (therapy, diagnosis, and their own question), what is the yield of relevant and methodologically sound citations, comparing the yield from the main PubMed search screen with the appropriate specific search filter on Clinical Queries?
Are clinicians more satisfied with the studies retrieved from Medline when searching via PubMed Clinical Queries than when using the main search in PubMed without the clinical filters?
What are the sensitivity and specificity of the methodologic components (if used) of the clinician's own search terms? How do the operating characteristics of these searches compare with those stored in the Clinical Queries interface of PubMed?
What are the effects of limiting searches to a core journal subset for internal medicine, compared with the full PubMed journal database, on the yield of clinically relevant citations and clinician satisfaction?
Materials and methods
Physician recruitment and searches
Practicing general internists registered with the primary discipline of ‘Internal Medicine’ in the McMaster Online Rating of Evidence system (http://plus.mcmaster.ca/more) were recruited between March 2008 and March 2009. One hundred and sixty McMaster Online Rating of Evidence raters were invited to participate; 62 accepted the invitation (38.8%), 40 of whom completed the study (64.5%). Participants were offered a $100 Canadian honorarium for their participation. All participants consented to taking part in the study, which was approved by the McMaster University Research Ethics Board.
Standardized patient care questions, nine concerning treatments and four concerning diagnosis (box 1), were devised based on highly rated primary treatment and diagnosis articles in the field of internal medicine in bmjupdates+ (now EvidenceUpdates: http://plus.mcmaster.ca/EvidenceUpdates/). Each participating physician was randomly assigned by computer to one treatment and one diagnosis question. They were then asked to devise a third treatment or diagnosis clinical question of their own.
Box 1 Standardized treatment and diagnosis questions presented to participants [frequency of being searched]
Are antiseptics effective for reducing catheter-associated bloodstream infections in medical ICU patients?
Can an ACE-inhibitor improve physical function in elderly people with functional impairment?
What is the current best treatment for agitation in Alzheimer's disease?
Is low-dose, self-administered anticoagulation safe for patients with mechanical heart valve prostheses?
Does self-monitoring of blood glucose improve glycemic control in patients with type 2 diabetes not on insulin?
Do probiotic Lactobacillus preparations prevent antibiotic-associated diarrhea?
Does N-terminal pro-B-type natriuretic testing improve the management of patients with congestive heart failure?
Does ultrasound screening for abdominal aortic aneurysm reduce mortality for elderly men?
Which of the Atkins, Zone, Ornish, and LEARN diets leads to greater weight loss over a year among premenopausal overweight women?
Which is the preferred diagnostic procedure for patients with suspected acute stroke: MRI or CT?
How sensitive is the microscopic-observation drug-susceptibility assay for the diagnosis of pulmonary TB?
How accurate is multislice CT for the evaluation of coronary artery disease?
What is the most sensitive non-invasive test for diagnosing acute pulmonary embolism?
After completing a brief survey of searching habits (online appendix table A1), participants conducted their searches as follows. Physicians were presented with each question and were asked to indicate which information sources they would usually prefer to use to answer the question. They then entered their search terms, which were submitted via a secure online interface that blinded them to the information source being searched. Participants were instructed to search ‘as they would if they were using a database like PubMed’ but were unaware of where their searches were submitted.
For the Clinical Queries search, participants' terms that dealt with methods (eg, randomized control trial), if any, were replaced by the most ‘specific’ Clinical Queries search filter for treatment20 or diagnosis,21 depending on the question being addressed. That is, for treatment questions, the methods terms were replaced with (randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])).20 For diagnosis questions, the methods terms were replaced with (specificity[Title/Abstract]).21 The physician was then presented with a maximum of the first 20 citations retrieved with the two searches, PubMed or Clinical Queries, in random order. If no citations were retrieved by the search terms, the physician was asked to revise their search terms; if once again no citations were retrieved, the physician was presented with a new randomly selected question.
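The filter substitution described above can be sketched in a few lines. This is a minimal illustration, not the study's actual software: the function name `build_filtered_query` is hypothetical, and joining the content terms to the filter with AND is an assumption about how the two parts were combined. Only the two filter strings are taken from the text.

```python
# Filter strings quoted from the text; everything else is illustrative.
TREATMENT_FILTER = (
    "(randomized controlled trial[Publication Type] OR "
    "(randomized[Title/Abstract] AND controlled[Title/Abstract] "
    "AND trial[Title/Abstract]))"
)
DIAGNOSIS_FILTER = "(specificity[Title/Abstract])"

FILTERS = {"treatment": TREATMENT_FILTER, "diagnosis": DIAGNOSIS_FILTER}


def build_filtered_query(content_terms: str, question_type: str) -> str:
    """Hypothetical helper: combine content terms with the 'specific' filter.

    Assumes any methods terms have already been removed from content_terms
    and that the filter is joined with AND (an assumption, not stated above).
    """
    content = content_terms.strip()
    if not content:
        raise ValueError("content_terms must not be empty")
    return f"({content}) AND {FILTERS[question_type]}"


# Example with made-up content terms for one of the diagnosis questions.
query = build_filtered_query("acute stroke MRI CT", "diagnosis")
print(query)
```

The unfiltered PubMed arm would simply submit the participant's terms unchanged, so the two arms differ only in the methods component.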
The yield of relevant citations was determined as the number of articles selected by the physician as relevant to answering the clinical question from the first 20 retrieved (if <20 were retrieved, all citations were shown) by each search. Retrievals were limited to the 20 most recent articles as would appear on the first page of a PubMed search to limit the time taken to complete the study and to mimic the time a clinician might spend on a search. Participants reviewed the article titles and had access to the abstracts in a new window by selecting the hyperlinked titles. Full-text retrieval was not available to participants during the study.
Participants were then asked to indicate their satisfaction with the retrieval from each search by responding to the following statement: ‘Overall, I am satisfied with the results of my search,’ using the following 7-point scale: strongly agree, moderately agree, mildly agree, neither agree or disagree, mildly disagree, moderately disagree, strongly disagree.
For a specified 80% power to detect a difference of at least two relevant citations between the retrievals via PubMed Clinical Queries versus PubMed main screen at an α level of 0.05 with a SD of 2.0, and an approximately normal distribution for the number of citations, 40 participants were required.
The primary outcome was the number of articles retrieved by the search judged to be relevant by the participants (search retrievals were capped at 20). Secondary measures included satisfaction, the proportion of relevant articles selected, the proportion selected that met methodological criteria, the placement order of the first relevant article, and performance characteristics of methodological terms submitted by participants.
For each variable individually, a participant score was calculated as the difference between the Clinical Queries value and the PubMed value (CQ–PM). The variables were: number of relevant articles selected, satisfaction, the proportion of relevant articles selected by participants, and the proportion of these that passed methods criteria, the number of articles retrieved and the placement order of the first relevant article. Using paired differences in this way eliminates the effects of characteristics of participants; these differences were then used as dependent variables in the analyses.
Because the standardized questions were presented to more than one participant, treatment and diagnosis search outcomes were first analyzed with an analysis of variance (ANOVA) using ‘question’ as a nominal independent variable to determine if an effect occurred that was related to the ‘question’ factor. The results indicated that the question significantly influenced the number of articles selected as relevant for treatment and diagnosis searches and for the number of diagnosis articles retrieved (online appendix table A2). For these two variables, the standardized and participant questions were analyzed separately:
the standardized questions were analyzed with the ANOVA including question as an independent variable;
the participant questions were analyzed with a paired t test on the raw participant scores, as these questions were unique.
For all other outcomes, the results from the standardized questions were pooled with those posed by the participants. Treatment and diagnosis questions were analyzed separately using linear regression with question origin (standardized vs participant) as an independent variable. A significant t value for the intercept indicated whether there was any difference between the Clinical Queries and PubMed scores. The number of articles retrieved per search varied from 1 to 20, leading to differences in the error variations of the observations. To take this effect into account, the number of relevant articles, the proportion of relevant articles, and the proportion meeting methods criteria were weighted by (the number of articles retrieved for Clinical Queries+the number of articles retrieved for PubMed)/2.
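As a rough illustration of the paired score and the weighting just described, the sketch below computes the Clinical Queries minus PubMed difference for each search pair and a weighted mean of those differences. It is a simplification: the study fitted a regression with question origin as a covariate, whereas this sketch shows only the intercept-only case. The class and function names and the data are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SearchPair:
    """One participant question, searched through both interfaces."""
    cq_relevant: int   # relevant articles selected, Clinical Queries
    cq_retrieved: int  # articles shown (1 to 20), Clinical Queries
    pm_relevant: int   # relevant articles selected, PubMed
    pm_retrieved: int  # articles shown (1 to 20), PubMed


def paired_difference(p: SearchPair) -> int:
    # Participant score: Clinical Queries value minus PubMed value (CQ-PM).
    return p.cq_relevant - p.pm_relevant


def weight(p: SearchPair) -> float:
    # Mean number of articles retrieved across the two interfaces.
    return (p.cq_retrieved + p.pm_retrieved) / 2


def weighted_mean_difference(pairs: list[SearchPair]) -> float:
    # Intercept of a weighted regression with no covariates (a simplification).
    total_weight = sum(weight(p) for p in pairs)
    return sum(weight(p) * paired_difference(p) for p in pairs) / total_weight


# Made-up data, for illustration only.
pairs = [SearchPair(4, 20, 3, 20), SearchPair(1, 5, 2, 15)]
print(weighted_mean_difference(pairs))
```

Searches that returned more articles contribute more to the estimate, which is the stated purpose of the weighting.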
The order in the retrieved list of the first relevant article and the difference between the orders for Clinical Queries and PubMed were highly skewed. The difference variable was recoded as a binary variable with categories (1) for values below zero (PubMed order was higher than Clinical Queries) and (2) values equal to or above zero (PubMed order the same or lower than the Clinical Queries order). A logistic regression was performed, first assessing the effect of repeated standardized questions, then pooled for standard and participant questions.
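The recoding rule above is simple enough to state as code. This is a minimal sketch with a hypothetical function name; the difference is taken as Clinical Queries order minus PubMed order, following the CQ–PM convention used for the other variables.

```python
def recode_order_difference(cq_order: int, pm_order: int) -> int:
    """Recode the order difference (CQ minus PM) into the two categories.

    Returns 1 when the difference is below zero (the first relevant article
    appeared earlier in the Clinical Queries list than in the PubMed list),
    and 2 when the difference is zero or above.
    """
    difference = cq_order - pm_order
    return 1 if difference < 0 else 2


# Illustrative values: first relevant article at position 2 (CQ) vs 5 (PubMed).
print(recode_order_difference(2, 5))
```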
Methodological rigor of articles
Two research associates assessed whether the articles selected by the participants as relevant met methodological criteria. The assessments were independent, and any disagreements were resolved through consensus. The criteria used for treatment and diagnosis articles are presented in box 2.
Box 2 Methodological criteria used to assess articles selected as relevant
Is the TREATMENT study methodologically sound?
Random allocation of participants to comparison groups
Outcome assessment of at least 80% of those entering the investigation accounted for in one major analysis at any given follow-up assessment
Analysis consistent with study design
Is the DIAGNOSIS study methodologically sound?
Inclusion of a spectrum of participants, some (but not all) of whom have the disorder or derangement of interest
Objective diagnostic (‘gold’) standard OR current clinical standard for diagnosis
Each participant must receive both the new test and some form of the diagnostic standard
Interpretation of diagnostic standard without knowledge of test result
Interpretation of test without knowledge of diagnostic standard result
Analysis consistent with study design
The number of articles retrieved that originated from the top 30 internal medicine journals was also analyzed (online appendix table A3).22 This list is based on a survey of the contents of 170 core clinical journals for the publishing year 2000 which assessed which journals published the highest number of methodologically sound and clinically relevant studies.22 The list includes internal medicine titles that contributed at least one abstracted article to the ACP Journal Club in 2000. Retrieval from this subset of strong clinical journals between PubMed and Clinical Queries was compared.
Search operating characteristics
The sensitivity and specificity of the methodologic components of the clinician-derived searches were tested and compared to the Clinical Queries search filter in PubMed. The PubMed translation of each search containing methods terms was recorded, and the performance characteristics of the various terms were tested using the Clinical Hedges Database. The Clinical Hedges Database was constructed by six research assistants in the Health Information Research Unit who hand-searched 161 journal titles that were indexed in Medline in the publishing year 2000. The research assistants categorized all original and review studies found in these journals into eight purpose categories (treatment/quality improvement, diagnosis, prognosis, etiology, clinical prediction guide, economics, cost, and qualitative) and then applied methodologic criteria to determine if the studies in the categories of treatment/quality improvement, diagnosis, prognosis, etiology, clinical prediction guide, and economics were methodologically sound. All purpose category definitions and corresponding methodologic rigor criteria were outlined in a previous paper.23 Research staff were thoroughly calibrated before reviewing the literature, and the inter-rater agreement for application of all criteria exceeded 0.80 (κ statistic) beyond chance.23
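For readers less familiar with search operating characteristics, the sketch below shows the standard definitions of sensitivity, specificity, and precision for a search term tested against a hand-classified database. The function name and the counts are hypothetical and are not taken from the Clinical Hedges Database.

```python
def operating_characteristics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard definitions for a search tested against a gold standard.

    tp: sound, on-target articles retrieved by the search
    fp: other articles retrieved by the search
    fn: sound, on-target articles the search missed
    tn: other articles the search correctly left out
    """
    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 for an empty denominator (an arbitrary choice here).
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of target articles found
        "specificity": ratio(tn, tn + fp),  # share of off-target articles excluded
        "precision": ratio(tp, tp + fp),    # share of retrieved articles on target
    }


# Invented counts, for illustration only.
result = operating_characteristics(tp=90, fp=910, fn=10, tn=48990)
print(result)
```

Because target articles are rare in a large database, a search can combine high sensitivity and specificity with low precision, as in this invented example.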
Results
Forty participants completed the study. Participants reported that they searched on average 26.75 times per month (95% CI 6.5 to 50.0). The majority worked at a center that has a fellowship training program (68%). Boolean operators (using AND, OR, and NOT to connect search terms) were the most reported (93%) and used (70%) search option (online appendix table A4). Limits (eg, limiting the scope of searching by language, publication type, date, author, age of participants, type of article), controlled vocabulary (eg, searching using Medical Subject Headings [MeSH] terms in PubMed/Medline) and wild cards (eg, * or $) were reported as used by 90%, 68%, and 38% of the participants respectively, but were used in this study by 30%, 10%, and 8% (online appendix table A4).
When asked to indicate which resources they would normally use to address the questions, UpToDate (57.5%), PubMed (52.5%), and Medline (25%) were most often reported, followed by Cochrane (22.5%), colleagues (20%), Google (17.5%), Harrison's (12.5%), other textbooks (12.5%), websites (12.5%), and guidelines (12.5%). Forty-five other resources were also indicated by one to three participants (online appendix table A5).
For the 120 searches performed, a total of 3762 articles were retrieved, 2720 of which were unique. Searches for the same questions often had some overlap in articles retrieved, but due to the diverse approaches to searching, variation was also seen in retrieved articles.
Retrievals, relevant articles, methodological rigor, and satisfaction
The standardized questions were presented randomly to participants; box 1 outlines their frequency. For treatment questions, no significant differences in satisfaction, number of relevant articles, proportion of relevant articles, or rank of the first relevant article were found (table 1). While most of the adjusted results from the regressions are consistent with the corresponding unadjusted means, there are a few anomalous cases where the two sets of results are in opposite directions. For example, with the data for the number of relevant articles for standardized questions, the y-intercept of the regression is negative, suggesting that the PubMed retrieval method produced more relevant articles than the Clinical Queries method; in contrast, a comparison of the unadjusted means suggests the opposite conclusion. Noting the relatively high standard errors for both the means and for the regression intercept, these discrepancies are probably due to high sampling variation, and therefore not meaningful.
The number of articles needed to read (NNR), determined as the number retrieved divided by the number selected as relevant, for treatment questions was 6.0 (95% CI 4.43 to 7.66) for Clinical Queries and 7.1 (95% CI 5.37 to 8.79) for PubMed (difference 1.03, 95% CI −1.29 to 3.36, NS). Of the articles that were selected as relevant by the participant, proportionally more of the Clinical Queries articles met methods criteria than the PubMed articles (0.439 vs 0.266, y-intercept 0.180, 95% CI 0.001 to 0.359, p=0.049).
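The NNR definition given above translates directly into a short calculation. The sketch below uses a hypothetical function name and made-up counts; returning `None` when no article was judged relevant is an assumption, since the text does not say how such searches were handled.

```python
from typing import Optional


def number_needed_to_read(retrieved: int, relevant: int) -> Optional[float]:
    """NNR = number of articles retrieved / number selected as relevant.

    Returns None when no article was selected as relevant, because the
    ratio is undefined in that case (handling assumed for this sketch).
    """
    if relevant == 0:
        return None
    return retrieved / relevant


# Made-up example: 20 articles shown, 4 judged relevant.
print(number_needed_to_read(20, 4))
```

Under this definition the NNR is the reciprocal of the proportion of retrieved articles judged relevant, so a lower NNR means less screening per useful article.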
For diagnosis questions via Clinical Queries versus PubMed, there were significantly more relevant articles retrieved for the standardized questions, a higher proportion of relevant articles over all questions, a lower number of retrieved articles for participant questions, and earlier presentation of the first relevant article in the retrieved list overall (table 2). The NNR was 5.2 (95% CI 3.70 to 6.66) for Clinical Queries and 5.6 (95% CI 4.10 to 7.06) for PubMed (difference 0.399, 95% CI −1.68 to 2.48, NS). No statistical difference in the methodological quality of articles selected as relevant by the participants was found. Again, anomalous y-intercepts observed in table 2 are probably due to high sampling variation.
Clinical Queries and PubMed returned a similar number of articles published in the 30 Internal Medicine journals subset (tables 1, 2). For treatment questions, the difference approached significance, but the variation in Clinical Queries retrieval was great.
Participants included methods terms in 19 of 62 searches for treatment questions (table 3). When the PubMed translations of these terms were tested in the Clinical Hedges Database, their sensitivity ranged from 0% to 98.9%, specificity from 60% to 100%, and precision from 2% to 55%. Methods terms appeared in 40 of 58 diagnostic searches (table 4); their sensitivity ranged from 0% to 96.6%, specificity from 54% to 100%, and precision from 0% to 16%.
Discussion
Many clinical information retrieval services have been developed to facilitate searching for the current best evidence for clinical decisions. DiCenso, Bayley, and Haynes24 have described a ‘6S’ hierarchy of evidence-based information services. This follows the evolution of information processing from: (1) original studies (the lowest level); (2) synopses of original studies (evidence-based abstraction journals); (3) syntheses (reviews); (4) synopses of syntheses (eg, DARE, healthevidence.ca); (5) summaries (evidence-based online textbooks); and (6) systems (the highest level: computerized clinical decision support). Being familiar with available resources at the various levels of the hierarchy can expedite searching. Evidence-based resources near the top of the 6S hierarchy have already filtered the primary literature and appraised the research, creating quality evidence in their resources from a large quantity of information.24
Articles in Medline, the lowest level of evidence, are still important in clinical care. Tools such as the Clinical Queries search filters assist by limiting search retrievals to articles of higher quality with the intent that they will reduce the number of articles that a clinician needs to screen out and increase the number more likely to provide them with an answer based on sound research. This research shows that these filters can assist clinicians in their quest for high-quality clinically relevant articles.
For treatment questions, the primary outcome of retrieval of relevant articles was not different for Clinical Queries compared with PubMed. The filtered searches did result in higher numbers of articles from the core internal medicine journal subset and higher-quality articles that were selected as relevant and that passed scientific criteria; these articles provide the clinicians with better answers to their questions. In addition, because of the popularity of the core journals, clinicians will likely have a higher probability of obtaining the article in full text form, since many institutions have subscriptions to these. Few differences between the retrieval of relevant articles were found from the main PubMed page and from those filtered through the Clinical Queries page, and no differences in search satisfaction, although the non-significant differences favored Clinical Queries searches.
For the diagnosis searches, the filtered results returned more relevant articles, with the first relevant one presented higher in the retrieved list, which would allow searchers to get answers more quickly and reduce the time spent screening results. Clinical Queries also returned significantly fewer articles, which is often preferable because physicians can determine more quickly whether they need to continue searching or have found an answer to their question.
The aim of the Clinical Queries filters is to optimize search results by increasing the sensitivity, specificity, and precision of searches with the goal of returning more articles that are on target and fewer that are off target. The treatment filter uses the terms (randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])), which increase the yield of higher-quality trials. The current study supports the return of a greater proportion of higher-quality studies with the filter than without it. The perceived clinical relevance of the retrieved articles to the searches, however, was not affected by the filter.
Similarly, for the diagnostic studies, the Clinical Queries filter limits results with the term (specificity[Title/Abstract]). The filters resulted in more relevant articles but did not impact participant satisfaction. This is presumably because participants are focused on content relevance rather than methodologic quality (which they were not directed to judge).
Future research includes testing the robustness of the Clinical Queries filters as applied to current databases since they were derived in 2000. The performance of PubMed with and without Clinical Queries on assisting clinicians in arriving at ‘correct’ answers to clinical questions will also be assessed.
Searching databases for research relevant to a physician's question is not an easy task. A number of strategies can help improve search retrieval such as using Boolean operators, controlled vocabulary, and the Participants, Intervention, Control, Outcome format of question analysis. In this study, the approaches to searching by the participants varied greatly. Some very sophisticated users included MeSH terms and truncation; others directly copied the question into the search box. The methodological terms used by participants had varying levels of impact on the operating characteristics of their searches (tables 3, 4).
The precision of searches, the proportion of retrieved articles that are on target, is one of the most important measures for busy clinicians; they want an answer, and they want it quickly and easily. The filtered searches showed some improvements in giving fewer articles, and more on target, but the content of the returned searches still relied heavily on the content terms submitted.
Strengths and limitations
One of the strengths of this study is that the participants were blinded to where their search terms were being sent. They were unaware that the differences between PubMed and the Clinical Queries filter were being tested. Further, the patterns in the results were similar for provided and participants' own search questions, indicating that the findings can be extrapolated beyond the study sample/participants.
The use of standardized questions gives some control over the search terms being used. Generating good standardized questions is quite challenging; because the questions were based on systematic review topics, care was taken that each question did not replicate words in the review title, to reduce the probability of the systematic review being the first article retrieved and skewing the participants' search by that result.
Limitations of this study included the constraints put on physician searching and the lack of access to full-text articles for relevance assessment. Searching is generally an iterative process; once the results of a search are presented, searchers usually refine their search to increase the applicability of the articles retrieved. Participants were able to adapt their search only once, and only if their initial search returned no articles in one of the interfaces. Some expressed frustration at this limit. Participants did not have access to the full text of the articles, but they did have the option to access the abstract. Relevance assessments were therefore not based on the whole study, but rather only on the title and optionally the abstract. Only one clinician requested the full text of an article.
The study used more ‘specific’ Clinical Queries search strategies, which minimize the retrieval of ‘off target’ articles. Performance would likely be different if the ‘sensitive’ search filters had been used; these filters maximize the proportion of high-quality studies retrieved (at the expense of somewhat lower specificity).
Conclusion
In this study the use of PubMed Clinical Queries to filter search results improved some search retrieval measures. This was more marked in the quest for diagnostic studies, although trends toward improvement were seen with treatment questions.
Funding CIHR. Grant number 177748.
Ethics approval Ethics approval was provided by the McMaster FHS Research Ethics Board.
Provenance and peer review Not commissioned; externally peer reviewed.