J Am Med Inform Assoc 2009;16:601-605 doi:10.1197/jamia.M3097
  • Focus on i2b2 Obesity NLP Challenge
  • Research Paper

Semi-automated Construction of Decision Rules to Predict Morbidities from Clinical Texts

  1. Richárd Farkas (a),
  2. György Szarvas, PhD (a, b),
  3. István Hegedűs (c),
  4. Attila Almási (c),
  5. Veronika Vincze (c),
  6. Róbert Ormándi (c),
  7. Róbert Busa-Fekete, PhD (c, d)

  a. Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary
  b. Technische Universität Darmstadt, Department of Computer Science, Ubiquitous Knowledge Processing Lab, Darmstadt, Germany
  c. University of Szeged, Department of Informatics, Szeged, Hungary
  d. LAL, University of Paris-Sud, CNRS, Orsay, France

Correspondence: Richárd Farkas, Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1, Hungary; e-mail: <rfarkas{at}inf.u-szeged.hu>.
  • Received 8 December 2008
  • Accepted 7 April 2009

Abstract

Objective In this study the authors describe the system submitted by the University of Szeged team to the second i2b2 Challenge in Natural Language Processing for Clinical Data. The challenge focused on developing automatic systems that analyze clinical discharge summaries and address the question: “Who's obese and what co-morbidities do they (definitely/most likely) have?” Target diseases included obesity and its 15 most frequent comorbidities, while the target labels corresponded to two separate expert judgments: one based on textual evidence and one on intuition.

Design The authors applied statistical methods to preselect the most common and confident terms, and evaluated outlier documents by hand to discover infrequent spelling variants. The authors expected a system with semi-automatically gathered dictionaries to perform well at moderate development cost (they examined only a small proportion of the records manually).
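The abstract does not say which statistics were used for preselection. As a minimal sketch of one plausible reading, the Python below ranks terms by how many documents contain them ("common") and by the fraction of those documents that carry the target label ("confident"). The function name `preselect_terms`, the thresholds `min_count` and `min_precision`, and the toy data are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def preselect_terms(docs, labels, target="Y", min_count=2, min_precision=0.9):
    """Return (term, doc_count, precision) for frequent, label-consistent terms.

    docs   -- list of token lists, one per document (hypothetical input format)
    labels -- parallel list of document labels
    """
    df_all, df_pos = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if label == target:
                df_pos[term] += 1

    selected = []
    for term, n in df_all.items():
        precision = df_pos[term] / n  # share of containing docs with the target label
        if n >= min_count and precision >= min_precision:
            selected.append((term, n, precision))
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(selected, key=lambda t: (-t[1], t[0]))


# Toy example (invented data, not from the challenge corpus).
docs = [
    ["patient", "obese"],
    ["obese", "diabetes"],
    ["patient", "healthy"],
    ["morbid", "obesity", "obese"],
]
labels = ["Y", "Y", "N", "Y"]
candidates = preselect_terms(docs, labels)
```

In a semi-automatic workflow of the kind described, a candidate list like this would then be reviewed by hand, and documents the resulting dictionary misclassifies would be inspected for rare spelling variants.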

Measurements Following the standard evaluation method of the second Workshop on Challenges in Natural Language Processing for Clinical Data, the authors used both macro- and microaveraged Fβ=1 measures for evaluation.
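To make the two averages concrete, here is a small Python sketch of micro- and macroaveraged F1 for a single-label, multi-class task. It uses the textbook definitions; the challenge's official scorer may differ in details such as how scores are combined across diseases, and the label names in the example are purely illustrative.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F-measure with beta = 1, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for parallel lists of gold and predicted labels."""
    classes = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Toy example: a rare class ("Q") that the system never predicts.
gold = ["Y"] * 8 + ["N"] * 10 + ["Q"] * 2
pred = ["Y"] * 8 + ["N"] * 10 + ["N"] * 2
micro, macro = micro_macro_f1(gold, pred)
```

On this toy data the microaverage is 0.90 while the macroaverage is about 0.64, because the macroaverage gives the missed rare class the same weight as the frequent ones. A gap of this kind between micro- and macroaveraged scores is typical when some classes have very few examples.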

Results The authors' submission achieved a microaverage Fβ=1 score of 97.29% for classification based on textual evidence (macroaverage Fβ=1 = 76.22%) and 96.42% for intuitive judgments (macroaverage Fβ=1 = 67.27%).

Conclusions The results demonstrate the feasibility of the authors' approach and show that even very simple systems with shallow linguistic analysis can achieve remarkable accuracy when classifying clinical records on a limited set of concepts.

Footnotes

  • An earlier version of this paper was presented at the NLP Challenge Workshop, sponsored by the i2b2 National Center for Biomedical Computing in November 2008. The workshop papers were not published.
