J Am Med Inform Assoc 2008;15:14-24 doi:10.1197/jamia.M2408
  • Focus On Medical Record Identification of Smoking Status
  • Viewpoint Paper

Identifying Patient Smoking Status from Medical Discharge Records

  1. Özlem Uzuner (a, b)
  2. Ira Goldstein (a)
  3. Yuan Luo (a)
  4. Isaac Kohane (c)

  a. University at Albany, State University of New York, Albany, NY
  b. Massachusetts Institute of Technology, Boston, MA
  c. Children’s Hospital and Harvard Medical School, Boston, MA
  • Correspondence: Özlem Uzuner, PhD, University at Albany, SUNY, Draper 114A, 135 Western Avenue, Albany, NY 12222; e-mail: ouzuner{at}albany.edu
  • Received 21 February 2007
  • Accepted 30 June 2007

Introduction

Clinical narrative records contain much useful information. However, most clinical narratives are in the form of fragmented English free text, showing the characteristics of a clinical sublanguage. This makes their linguistic processing, search, and retrieval challenging [1]. Traditional natural language processing (NLP) tools are not designed for the fragmented free text found in narrative clinical records; therefore, they do not perform well on this type of data [2]. Limited access to clinical records has been a barrier to the widespread development of medical language processing (MLP) technologies. In the absence of a standardized, publicly available ground truth that encourages the development of MLP systems and allows their head-to-head comparison, successful MLP efforts have been limited, e.g., MedLEE [3] and Symtxt [4]. A few MLP systems have been developed [5], and such efforts have successfully shown the usefulness of MLP in clinical settings [6-8].

To improve the availability of clinical records and to contribute to the advancement of the state of the art in MLP, within the i2b2 (Informatics for Integrating Biology to the Bedside) project, the authors de-identified and released a set of clinical records from Partners HealthCare. These records provided the basis for the development of ground truth for two challenge questions:

  1. Automatic de-identification of clinical data, i.e., the de-identification challenge.

  2. Automatic evaluation of the smoking status of patients based on medical records, i.e., the smoking challenge.

Representative teams from the MLP community participated in the two challenges and met at a workshop organized by the authors to discuss the results of the challenges. The workshop was co-sponsored by the American Medical Informatics Association and met in conjunction with its Fall Symposium in November 2006. This article provides an overview of the smoking challenge and the findings of the workshop. An overview of the de-identification challenge can …
