Essie: A Concept-based Search Engine for Structured Biomedical Text
- Affiliations of the authors: Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, and Thoughtful Solutions, Inc., McLean, VA
- Correspondence and reprints: Nicholas C. Ide, Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894; e-mail: ide@nlm.nih.gov
- Received 31 July 2006
- Accepted 26 January 2007
Abstract
This article describes the algorithms implemented in Essie, the search engine that currently serves several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Its design is motivated by the observation that query terms are often conceptually related to terms in a document without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. These results show that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a useful approach for information retrieval in the biomedical domain.
Footnotes
- This article is written by an employee of the US Government and is in the public domain. This article may be republished and distributed without penalty.
- The views expressed in this paper do not necessarily represent those of any U.S. government agency, but rather reflect the opinions of the authors.








