JAMIA 2009;16:806-815 doi:10.1197/jamia.M3037
  • Original Investigation
  • Research Paper

Evaluation of a Method to Identify and Categorize Section Headers in Clinical Documents

  1. Joshua C Denny,
  2. Anderson Spickard III,
  3. Kevin B Johnson,
  4. Neeraja B Peterson,
  5. Josh F Peterson,
  6. Randolph A Miller
  1. Affiliations of the authors: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN; Division of General Internal Medicine and Public Health, Department of Medicine (JCD, AS, NBP, JFP), Vanderbilt University School of Medicine, Nashville, TN; Department of Pediatrics (KBJ), Vanderbilt University School of Medicine, Nashville, TN; Tennessee Valley Geriatric Research Education Clinical Center (GRECC), Tennessee Valley Healthcare System, Veterans Administration, Nashville, TN (JFP)
  1. Correspondence: Joshua C. Denny, MD, MS, Eskind Biomedical Library, Room 442, 2209 Garland Ave, Nashville, TN 37232; e-mail: <josh.denny{at}vanderbilt.edu>
  • Received 17 October 2008
  • Accepted 3 August 2009

Abstract

Objective Clinical notes, typically written in natural language, often contain substructure that divides them into sections, such as “History of Present Illness” or “Family Medical History.” The authors designed and evaluated an algorithm (“SecTag”) to identify both labeled and unlabeled (implied) note section headers in “history and physical examination” documents (“H&P notes”).

Design The SecTag algorithm uses a combination of natural language processing techniques, word variant recognition with spelling correction, terminology-based rules, and naive Bayesian scoring methods to identify note section headers. Eleven physicians evaluated SecTag's performance on 319 randomly chosen H&P notes.
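To make the idea of naive Bayesian scoring concrete, the toy sketch below scores a passage of unlabeled text against candidate section types using word frequencies. It is an illustration only and is not the SecTag implementation: the training snippets, the section labels, and the function names `train`, `score`, and `best_section` are all hypothetical, and add-one smoothing is an assumed choice.

```python
import math
from collections import Counter

# Hypothetical toy training data: (section label, text seen under that label).
TRAINING = [
    ("family_history", "mother had diabetes father had heart disease"),
    ("family_history", "sister with breast cancer"),
    ("medications", "aspirin 81 mg daily lisinopril 10 mg daily"),
    ("medications", "metformin 500 mg twice daily"),
]


def train(examples):
    """Count words per section label and collect the shared vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for label, text in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def score(text, model):
    """Return the log of prior times word likelihoods for each label."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        log_prob = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            log_prob += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = log_prob
    return scores


def best_section(text, model):
    """Pick the label with the highest naive Bayes score."""
    scores = score(text, model)
    return max(scores, key=scores.get)


model = train(TRAINING)
print(best_section("lisinopril 20 mg daily", model))
print(best_section("father had stroke", model))
```

In this sketch, a passage with no header is assigned to whichever section type makes its words most probable. According to the abstract, SecTag combines scoring of this general kind with terminology-based rules and spelling correction, none of which are modeled here.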

Measurements The primary outcomes were the algorithm's recall and precision in identifying all document sections and a predefined list of twenty-nine major sections. The secondary outcome was the algorithm's ability to recognize the correct start and end boundaries of identified sections.

Results The SecTag algorithm identified 16,036 total sections and 7,858 major sections. Physician evaluators classified 15,329 of the identified sections as true positives and identified 160 sections omitted by SecTag. The recall and precision of the SecTag algorithm were 99.0% and 95.6% for all sections, 98.6% and 96.2% for major sections, and 96.6% and 86.8% for unlabeled sections. The algorithm determined the correct starting and ending text boundaries for 94.8% of labeled sections and 85.9% of unlabeled sections.
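The all-section figures can be checked from the counts reported above. The short sketch below assumes that every SecTag-identified section not classified as a true positive counts as a false positive, and that the 160 omitted sections are the only false negatives. The function name `precision_recall` is illustrative.

```python
def precision_recall(true_positives, predicted_total, missed):
    """Compute precision and recall from raw counts."""
    # Assumption: identified sections that are not true positives are false positives.
    false_positives = predicted_total - true_positives
    precision = true_positives / (true_positives + false_positives)
    # Assumption: sections omitted by the algorithm are the false negatives.
    recall = true_positives / (true_positives + missed)
    return precision, recall


# Counts taken from the Results paragraph (all sections).
precision, recall = precision_recall(15_329, 16_036, 160)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints: precision = 95.6%, recall = 99.0%
```

Under these assumptions the counts reproduce the reported 95.6% precision and 99.0% recall. The abstract does not give the counts needed to repeat the check for major or unlabeled sections.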

Conclusions The SecTag algorithm accurately identified both labeled and unlabeled sections in history and physical documents. This type of algorithm may assist in natural language processing applications, such as clinical decision support systems or competency assessment for medical trainees.

Footnotes

    The Journal of the American Medical Informatics Association is published for the American Medical Informatics Association by BMJ Publishing Group Ltd.