J Am Med Inform Assoc 2003;10:330-338 doi:10.1197/jamia.M1157
  • Original Investigation
  • Research Paper

The Role of Domain Knowledge in Automating Medical Text Report Classification

  1. Adam B Wilcox,
  2. George Hripcsak
  • Affiliations of the authors: Department of Medical Informatics, University of Utah, Salt Lake City, Utah (ABW); Medical Informatics, Intermountain Health Care, Salt Lake City, Utah (ABW); Department of Medical Informatics, Columbia University, New York, New York (GH), USA
  • Correspondence and reprints: Adam B. Wilcox, PhD, Medical Informatics, Intermountain Health Care, 4646 West Lake Park Blvd., Salt Lake City, UT 84120; e-mail: <lpawilco{at}ihc.com>
  • Received 14 May 2002
  • Accepted 3 March 2003

Abstract

Objective: To analyze the effect of expert knowledge on the inductive learning process in creating classifiers for medical text reports.

Design: The authors converted medical text reports to a structured form through natural language processing. They then inductively created classifiers for medical text reports using varying degrees and types of expert knowledge and different inductive learning algorithms. The authors measured the performance of the different classifiers as well as the costs of inducing classifiers and acquiring expert knowledge.

Measurements: The measurements used were classifier performance, training-set size efficiency, and classifier creation cost.

Results: Expert knowledge was shown to be the most significant factor affecting inductive learning performance, outweighing differences in learning algorithms. The use of expert knowledge can affect comparisons between learning algorithms. This expert knowledge may be obtained and represented separately as knowledge about the clinical task or about the data representation used. The benefit of expert knowledge is greater than that of inductive learning itself, and it costs less to obtain.

Conclusion: For medical text report classification, expert knowledge acquisition is more significant to performance and more cost-effective than knowledge discovery. Building classifiers should therefore focus more on acquiring knowledge from experts than on trying to learn this knowledge inductively.

Footnotes

  • This work was supported by National Library of Medicine Grants R01 LM06910 “Discovering and Applying Knowledge in Clinical Databases,” R01 LM06274 “Unlocking Data from Medical Records with Text Processing,” and Pfizer, Inc. grant “Using Information Systems to Advance Clinical Research and Clinical Care.”
