A Day in the Life of PubMed: Analysis of a Typical Day’s Query Log
- (a) University of Texas School of Health Information Sciences at Houston, Houston, TX
- (b) Department of Pediatrics, Division of Pediatric Critical Care, University of Texas School of Medicine at Houston, Houston, TX
- (c) Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR
- (d) Department of Internal Medicine, Division of General Internal Medicine, University of Texas School of Medicine at Houston, Houston, TX
- Correspondence and reprints: Dr. Elmer V. Bernstam, University of Texas School of Health Information Sciences at Houston, 7000 Fannin St., Suite 600, Houston, TX 77030 (e-mail: Elmer.V.Bernstam@uth.tmc.edu)
- Received 1 July 2006
- Accepted 6 December 2006
Abstract
Objective: To characterize PubMed usage over a typical day and compare it with previous studies of user behavior on Web search engines.
Design: We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day.
Measurements: We measured the number of queries, number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, and MeSH categories. We also used semantic measurements to group queries into sessions, and we studied the addition and removal of terms between consecutive queries to gauge search strategies.
Results: The sizes of the result sets from a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that one large group of queries was tightly focused and another was broad. As with Web search engine sessions, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms than Web search engine queries.
Conclusion: PubMed’s usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.
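The lexical measurements listed under Measurements (queries per user, terms per query, Boolean operator use, common terms, and terms added or removed between consecutive queries) can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the study's actual processing: it assumes a hypothetical log of (user_id, query) pairs in time order, splits terms on whitespace, and counts AND, OR, and NOT as Boolean operators only when written in uppercase. All function names are hypothetical. It does not reproduce the semantic session grouping, the result set sizes, or the MeSH categorization.

```python
from collections import Counter
from statistics import mean

# Hypothetical helpers for illustration only; not the study's code.
# Assumption: AND, OR, NOT count as Boolean operators only in uppercase.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def terms(query: str) -> list[str]:
    """Whitespace-split a query and drop Boolean operators (a simplification:
    quoted phrases and field tags such as [mh] are not handled)."""
    return [t for t in query.split() if t not in BOOLEAN_OPERATORS]


def uses_boolean(query: str) -> bool:
    """True if the query contains at least one uppercase Boolean operator."""
    return any(t in BOOLEAN_OPERATORS for t in query.split())


def term_changes(prev: str, curr: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) terms between two consecutive queries,
    compared case-insensitively."""
    before = {t.lower() for t in terms(prev)}
    after = {t.lower() for t in terms(curr)}
    return after - before, before - after


def summarize(log: list[tuple[str, str]]) -> dict:
    """Compute simple statistics from a list of (user_id, query) pairs."""
    if not log:
        raise ValueError("log is empty")
    queries_by_user: dict[str, list[str]] = {}
    for user, query in log:
        queries_by_user.setdefault(user, []).append(query)
    queries = [q for _, q in log]
    return {
        "queries": len(queries),
        "distinct_users": len(queries_by_user),
        "queries_per_user": len(queries) / len(queries_by_user),
        "terms_per_query": mean(len(terms(q)) for q in queries),
        "boolean_fraction": sum(uses_boolean(q) for q in queries) / len(queries),
        "common_terms": Counter(
            t.lower() for q in queries for t in terms(q)
        ).most_common(10),
    }


if __name__ == "__main__":
    # Made-up example log, not data from the study.
    log = [
        ("u1", "asthma children"),
        ("u1", "asthma children AND steroids"),
        ("u2", "p53"),
    ]
    print(summarize(log))
    print(term_changes(log[0][1], log[1][1]))
```

Comparing each user's consecutive queries with `term_changes` is one simple way to see whether a user narrowed a search (terms added) or broadened it (terms removed); a real analysis would first need a session definition, which this sketch deliberately leaves out.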
Footnotes
- Supported in part by a training fellowship from the W. M. Keck Foundation to the Gulf Coast Consortia through the Keck Center for Computational and Structural Biology, by NLM grant 5K22LM008306, and by NCRR grant 1UL1RR024148.