JAMIA 2009;16:25-31 doi:10.1197/jamia.M2996
  • The Practice of Informatics
  • Viewpoint Paper

Towards Automatic Recognition of Scientifically Rigorous Clinical Research Evidence

  1. Halil Kilicoglu (a, b)
  2. Dina Demner-Fushman (b)
  3. Thomas C Rindflesch (b)
  4. Nancy L Wilczynski (c)
  5. R Brian Haynes (c)

  a. Department of Computer Science and Software Engineering, Concordia University, Montréal, QC, Canada
  b. National Library of Medicine, National Institutes of Health, Bethesda, MD
  c. Health Information Research Unit, McMaster University, Hamilton, ON, Canada

  • Correspondence: Halil Kilicoglu, MS, Concordia University, Department of Computer Science and Software Engineering, 1515 Ste Catherine West, Montréal, QC, H3G 1M8, Canada; e-mail: <h_kilico{at}cse.concordia.ca>
  • Received 8 September 2008
  • Accepted 30 September 2008

Abstract

The growing number of topically relevant biomedical publications readily available due to advances in document retrieval methods poses a challenge to clinicians practicing evidence-based medicine. It is increasingly time consuming to acquire and critically appraise the available evidence. This problem could be addressed in part if methods were available to automatically recognize rigorous studies immediately applicable in a specific clinical situation. We approach the problem of recognizing studies containing usable clinical advice among retrieved topically relevant articles as a binary classification problem. The gold standard used in the development of PubMed clinical query filters forms the basis of our approach. We identify scientifically rigorous studies using supervised machine learning techniques (Naïve Bayes, support vector machine (SVM), and boosting) trained on high-level semantic features. We combine these methods using an ensemble learning method (stacking). The performance of the learning methods is evaluated using precision, recall, and F1 score, in addition to area under the receiver operating characteristic (ROC) curve (AUC). Using a training set of 10,000 manually annotated MEDLINE citations and a test set of an additional 2,000 citations, we achieve 73.7% precision and 61.5% recall in identifying rigorous, clinically relevant studies with stacking over five feature-classifier combinations. In recognizing rigorous studies with a treatment focus, we achieve 82.5% precision and 84.3% recall using stacking over a word + metadata feature vector. Our results demonstrate that a high-quality gold standard and advanced classification methods can help clinicians acquire the best evidence from the medical literature.
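The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names (`precision_recall_f1`, `f1_from_pr`, `roc_auc`) and the toy labels are assumptions introduced here for illustration. The sketch treats "rigorous" as the positive class (label 1), computes precision, recall, and F1 from binary predictions, and computes AUC as the fraction of positive/negative pairs that a classifier score ranks correctly.

```python
from typing import Sequence, Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from_pr(precision, recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of examples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the study).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
    print(precision_recall_f1(y_true, y_pred))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    # F1 implied by the precision/recall pairs reported in the abstract.
    print(round(f1_from_pr(0.737, 0.615), 3), round(f1_from_pr(0.825, 0.843), 3))
```

The last line derives F1 from the precision and recall figures quoted above; those F1 values are arithmetic consequences of the reported numbers, not figures quoted from the abstract.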

Footnotes

  • Supported in part by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.

The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.