J Am Med Inform Assoc doi:10.1136/amiajnl-2012-001563
  • Research and applications

A literature search tool for intelligent extraction of disease-associated genes

Open Access
Dennis P Wall (1, 2)
  1. Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  2. Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
Correspondence to Dr Dennis P Wall, Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; dpwall{at}
  • Received 11 December 2012
  • Revised 15 July 2013
  • Accepted 8 August 2013
  • Published Online First 2 September 2013


Objective To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods.

Methods We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article.
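
The abstract does not list the actual rules or keywords, so the following is only a minimal sketch of what sentence-level keyword matching of this kind could look like. The keyword lists, the function names (`split_sentences`, `extract_findings`), and the gene symbols in the usage note are all hypothetical and chosen for illustration; they are not the authors' implementation.

```python
import re

# Hypothetical keyword lists; the article's real rules are not given in the abstract.
SIGNIFICANCE_TERMS = ("significant association", "significantly associated", "associated with")
NEGATION_TERMS = ("no significant", "not significantly", "no association", "not associated")
STUDY_TYPE_TERMS = {
    "genome-wide association": "GWAS",
    "linkage": "linkage",
    "expression": "expression",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_findings(abstract, gene_symbols, disorder_terms):
    """Return disorders mentioned, genes with a positive finding, and study types.

    gene_symbols and disorder_terms are caller-supplied vocabularies (assumed inputs).
    """
    gene_vocabulary = set(gene_symbols)
    genes, disorders = set(), set()

    for sentence in split_sentences(abstract):
        lowered = sentence.lower()

        # Record every disorder term that appears in the sentence.
        for term in disorder_terms:
            if term.lower() in lowered:
                disorders.add(term)

        # Skip sentences that report a negative result.
        if any(neg in lowered for neg in NEGATION_TERMS):
            continue

        # Keep gene symbols only from sentences that contain a significance keyword.
        if any(sig in lowered for sig in SIGNIFICANCE_TERMS):
            tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
            genes.update(tokens & gene_vocabulary)

    # Study type is guessed from keywords anywhere in the abstract.
    lowered_abstract = abstract.lower()
    study_types = {label for kw, label in STUDY_TYPE_TERMS.items() if kw in lowered_abstract}

    return {"disorders": disorders, "genes": genes, "study_types": study_types}
```

As a usage illustration with made-up text, calling `extract_findings` on an abstract that says a placeholder gene `GENE1` was "significantly associated with autism" and that "no significant association was found for GENE2" would keep `GENE1` and discard `GENE2`, because the second sentence matches a negation keyword.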

Results We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder–gene link are more extensive and accurate than those in other general-purpose gene-to-disorder association databases.
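
One simple way to quantify how much of a curated database a candidate gene set covers is the fraction of curated genes that also appear among the candidates. The sketch below assumes this definition; the function name `coverage` and the placeholder gene labels are hypothetical, and the abstract does not state which metric the authors actually used.

```python
def coverage(candidate_genes, curated_genes):
    """Fraction of curated genes that also appear in the candidate set."""
    curated = set(curated_genes)
    if not curated:
        # No curated genes to cover; define coverage as 0.0 to avoid dividing by zero.
        return 0.0
    return len(curated & set(candidate_genes)) / len(curated)


# Placeholder labels, not real data from the article.
candidates = {"GENE_A", "GENE_B", "GENE_C"}
curated = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
print(coverage(candidates, curated))
```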

Conclusions We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard, manually updated gene–disorder databases and against automated databases of similar functionality, we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:
