Automated Acquisition of Disease–Drug Knowledge from Biomedical and Clinical Documents: An Initial Study
- a Clinical Informatics Research & Development, Partners HealthCare System, Wellesley, MA
- b Division of General Medicine, Brigham & Women’s Hospital, Boston, MA
- c Harvard Medical School, Boston, MA
- d Department of Biomedical Informatics, Columbia University, New York, NY
- e Department of Biostatistics, Columbia University, New York, NY
- Correspondence: Elizabeth S. Chen, PhD, Clinical Informatics Research & Development, Partners HealthCare System, 93 Worcester Street, PO Box 81902, Wellesley, MA 02481; e-mail: eschen@partners.org
- Received 6 February 2007
- Accepted 5 September 2007
Abstract
Objective Explore the automated acquisition of knowledge in biomedical and clinical documents using text mining and statistical techniques to identify disease-drug associations.
Design Biomedical literature and clinical narratives from the patient record were mined to gather knowledge about disease-drug associations. Two NLP systems, BioMedLEE and MedLEE, were applied to Medline articles and discharge summaries, respectively. Disease and drug entities were identified using the NLP systems in addition to MeSH annotations for the Medline articles. Focusing on eight diseases, co-occurrence statistics were applied to compute and evaluate the strength of association between each disease and relevant drugs.
Results Ranked lists of disease-drug pairs were generated, and cutoffs were calculated to identify the stronger associations among these pairs for further analysis. Differences and similarities with regard to disease-drug knowledge were observed between the text sources (i.e., biomedical literature and patient record) and between the annotations (i.e., MeSH and NLP-extracted UMLS concepts).
Conclusion This paper presents a method for acquiring disease-specific knowledge and a feasibility study of the method. The method is based on applying a combination of NLP and statistical techniques to both biomedical and clinical documents. From the patient record, the approach extracted knowledge about the drugs clinicians are using for patients with specific diseases; from the literature, it acquired knowledge of drugs frequently involved in controlled trials for those same diseases. In comparing the disease-drug associations, we found the results to be appropriate: the two text sources contained consistent as well as complementary knowledge, and manual review of the top five disease-drug associations by a medical expert supported their correctness across the diseases.
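The co-occurrence step described under Design can be illustrated with a minimal sketch. The abstract does not name the statistic used, so this example assumes a Pearson chi-square over a 2x2 document-level contingency table, which is one common choice for scoring co-occurrence. The function names (`chi_square_2x2`, `rank_drugs`), the input layout, and the placeholder labels (`disease_A`, `drug_x`, and so on) are hypothetical and are not taken from the study.

```python
from collections import Counter


def chi_square_2x2(n11, row_total, col_total, n):
    """Pearson chi-square for a 2x2 table (assumed statistic, not the paper's stated one).

    n11       -- documents mentioning both the disease and the drug
    row_total -- documents mentioning the disease
    col_total -- documents mentioning the drug
    n         -- total number of documents
    """
    n12 = row_total - n11                    # disease without drug
    n21 = col_total - n11                    # drug without disease
    n22 = n - row_total - col_total + n11    # neither
    denom = row_total * (n - row_total) * col_total * (n - col_total)
    if denom == 0:
        # A margin is empty or full, so the statistic is undefined; score it as 0.
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


def rank_drugs(documents, disease):
    """Rank drugs co-occurring with `disease`, strongest association first.

    `documents` is a list of (diseases, drugs) pairs, each a set of
    entity labels already extracted from one document.
    """
    n = len(documents)
    disease_count = 0
    drug_counts = Counter()   # documents mentioning each drug
    pair_counts = Counter()   # documents mentioning the drug and the disease
    for diseases, drugs in documents:
        has_disease = disease in diseases
        if has_disease:
            disease_count += 1
        for drug in set(drugs):
            drug_counts[drug] += 1
            if has_disease:
                pair_counts[drug] += 1
    scores = {
        drug: chi_square_2x2(pair_counts[drug], disease_count, drug_counts[drug], n)
        for drug in pair_counts
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input with placeholder labels; these are not real clinical data.
toy_documents = [
    ({"disease_A"}, {"drug_x"}),
    ({"disease_A"}, {"drug_x", "drug_y"}),
    ({"disease_B"}, {"drug_y"}),
    ({"disease_B"}, {"drug_y"}),
]
ranked = rank_drugs(toy_documents, "disease_A")
```

A cutoff of the kind mentioned under Results would then be applied to the ranked scores. Note that the chi-square value alone does not distinguish positive from negative association, so a real pipeline would also compare observed and expected co-occurrence counts.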
Footnotes
- This work is supported in part by grants LM007659, LM008635, and LM006910 from the National Library of Medicine. Dr. Markatou is supported by NSF DMS-0504957.









