Evaluating Relevance Ranking Strategies for MEDLINE Retrieval
- National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, MD 20894
- Correspondence: Zhiyong Lu, NCBI/NLM/NIH, 8600 Rockville Pike, Bethesda, MD 20852; e-mail: <luzh@ncbi.nlm.nih.gov>
- Received 18 July 2008
- Accepted 28 September 2008
Abstract
This paper evaluates the retrieval effectiveness of relevance ranking strategies on a collection of 55 queries and about 160,000 MEDLINE® citations used in the 2006 and 2007 Text Retrieval Conference (TREC) Genomics Tracks. The authors study two relevance ranking strategies, term frequency–inverse document frequency (TF-IDF) weighting and sentence-level co-occurrence, and examine their ability to rank retrieved MEDLINE documents given user queries. Reverse chronological order, PubMed's default display option, serves as the baseline for comparison. Retrieval effectiveness is assessed using both mean average precision and mean rank precision. Experimental results show that both strategies outperform the baseline and that, of the two, TF-IDF weighting is the more effective at retrieving relevant documents.
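To make the two central notions in the abstract concrete, the following is a minimal Python sketch of TF-IDF ranking and of mean average precision. It is illustrative only: this section does not give the exact weighting formula the authors used, so the sketch assumes a common textbook variant (raw term frequency times log(N/df)) and whitespace tokenization. The function names `tfidf_rank`, `average_precision`, and `mean_average_precision`, as well as the toy documents, are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank document ids by the summed TF-IDF weight of the query terms.

    Assumption: weight = raw term frequency * log(N / document frequency),
    a textbook variant and not necessarily the formula used in the paper.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if term in tf
        )
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def average_precision(ranking, relevant):
    """Average the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, judgments):
    """Mean of per-query average precision; both dicts are keyed by query id."""
    return sum(
        average_precision(rankings[q], judgments[q]) for q in rankings
    ) / len(rankings)


# Toy collection (hypothetical, not TREC data).
docs = {
    "d1": "brca1 mutation breast cancer",
    "d2": "protein folding kinetics",
    "d3": "brca1 brca1 gene expression",
}
print(tfidf_rank("brca1 gene", docs))  # → ['d3', 'd1', 'd2']
```

A baseline such as reverse chronological order can be scored with the same `average_precision` function by passing a ranking sorted by publication date instead of by TF-IDF score, which keeps the comparison between strategies on equal footing.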
Footnotes
- Supported by the Intramural Research Program of NIH, National Library of Medicine. The authors are grateful to the TREC organizers for their efforts in producing and making the text collection and relevance judgments publicly available.
- a Translations shown here were obtained in March 2008. Changes to PubMed after March 2008 may result in different translations.
- † A detailed description of our statistical test is given as supplementary material, along with all of the pair-wise comparison results, publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Relevance-ranking/supplementary.pdf.









