J Am Med Inform Assoc 16:580-584 doi:10.1197/jamia.M3087
  • Focus on i2b2 Obesity NLP Challenge
  • Research Paper

Semantic Classification of Diseases in Discharge Summaries Using a Context-aware Rule-based Classifier

  1. Illés Solt (a),
  2. Domonkos Tikk (b),
  3. Viktor Gál (c),
  4. Zsolt T Kardkovács (a)
  a. Department of Media Informatics and Telematics, Budapest University of Technology and Economics, Budapest, Hungary
  b. Institute of Computer Science, Humboldt University in Berlin, Berlin, Germany
  c. Department of Computer Science, Australian National University, Acton, Australia
  1. Correspondence: Illés Solt, Department of Media Informatics and Telematics, Budapest University of Technology and Economics, 1117 Budapest, Magyar tudósok krt. 2, Hungary; email: <illes.solt{at}>.
  • Received 2 December 2008
  • Accepted 7 April 2009


Objective Automated, disease-specific classification of textual clinical discharge summaries is of great importance in human life science, as it helps physicians conduct medical studies by providing statistically relevant data for analysis. This can be further facilitated if, when discharge summaries are labeled, semantic labels are also extracted from the text, such as whether a given disease is present, absent, or questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves the semantic classification task.

Design The authors introduce a context-aware rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in successive steps. First, some misleading parts are removed from the text; next, the text is partitioned into positive, negative, and uncertain context segments; finally, a sequence of binary classifiers is applied to assign the appropriate semantic labels.
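The three steps named in the Design paragraph can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the trigger words, the choice of family-history sentences as the "misleading parts", the sentence-level segmentation, the order of the binary decisions, and the names `split_contexts` and `classify`.

```python
import re

# Hypothetical trigger lists; the paper's actual rules are not given in the abstract.
MISLEADING = re.compile(r"family history[^.]*\.", re.IGNORECASE)
UNCERTAIN = re.compile(r"\b(possible|probable|rule out|questionable)\b")
NEGATIVE = re.compile(r"\b(no|denies|without|negative for)\b")


def split_contexts(text):
    """Assign each sentence to exactly one context segment."""
    contexts = {"positive": [], "negative": [], "uncertain": []}
    for sentence in re.split(r"(?<=[.;])\s+", text.lower()):
        if UNCERTAIN.search(sentence):
            contexts["uncertain"].append(sentence)
        elif NEGATIVE.search(sentence):
            contexts["negative"].append(sentence)
        else:
            contexts["positive"].append(sentence)
    return contexts


def classify(text, disease_terms):
    """Return 'present', 'absent', 'questionable', or 'unmentioned'."""
    # Step 1: remove parts that would mislead the rules (assumed: family history).
    cleaned = MISLEADING.sub(" ", text)
    # Step 2: partition the remaining text into context segments.
    contexts = split_contexts(cleaned)

    def mentioned_in(kind):
        return any(term in s for s in contexts[kind] for term in disease_terms)

    # Step 3: a sequence of binary decisions (the order here is an assumption).
    if mentioned_in("positive"):
        return "present"
    if mentioned_in("negative"):
        return "absent"
    if mentioned_in("uncertain"):
        return "questionable"
    return "unmentioned"
```

For example, `classify("Patient denies asthma.", ["asthma"])` falls through the first decision and is labeled by the second, whereas a summary that mentions the disease only in a family-history sentence ends up as unmentioned because that sentence is removed in step 1.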

Measurement For evaluation, the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures, F1-macro and F1-micro.
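As a reminder of how the two measures differ, the sketch below computes them using the standard textbook definitions: F1-macro averages the per-label F1 scores, while F1-micro pools the counts over all labels first. The function names and the toy labels are hypothetical, and the challenge's exact averaging procedure is not described in the abstract, so this is an illustration and not the official scorer.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro_micro(gold, pred):
    """Return (F1-macro, F1-micro) for two equal-length label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    labels = set(gold) | set(pred)
    # Macro: every label contributes equally, however rare it is.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


gold = ["present", "present", "absent", "questionable"]
pred = ["present", "absent", "absent", "absent"]
macro, micro = f1_macro_micro(gold, pred)
```

In this toy example the single "questionable" document is missed, which pulls F1-macro below F1-micro; a sparse label that is never predicted correctly contributes a per-label F1 of zero to the macro average.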

Results On the two subtasks of the Obesity Challenge (textual and intuitive classification), the system achieved F1-macro = 0.80 on the textual subtask and F1-macro = 0.67 on the intuitive subtask, placing second on the textual and first on the intuitive subtask of the challenge.

Conclusions The authors show that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques when the training data are limited and some semantic labels are very sparse.


  • Domonkos Tikk was supported by the Alexander von Humboldt Foundation.
