J Am Med Inform Assoc 20:828-835 doi:10.1136/amiajnl-2013-001635
  • Research and applications

A hybrid system for temporal information extraction from clinical text

  1. Hua Xu (1, 3)
  1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  2. Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  3. Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA
  4. Department of Medicine, Vanderbilt University, School of Medicine, Nashville, Tennessee, USA
Correspondence to Dr Hua Xu, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St, Suite 600, Houston, TX 77030, USA; hua.xu{at}
  • Received 9 January 2013
  • Revised 11 March 2013
  • Accepted 18 March 2013
  • Published Online First 9 April 2013


Objective To develop a comprehensive temporal information extraction system that can identify events, temporal expressions, and their temporal relations in clinical text. This project was part of the 2012 i2b2 clinical natural language processing (NLP) challenge on temporal information extraction.

Materials and methods The 2012 i2b2 NLP challenge organizers manually annotated 310 clinic notes according to a defined annotation guideline: a training set of 190 notes and a test set of 120 notes. All participating systems were developed on the training set and evaluated on the test set. Our system consists of three modules: event extraction, temporal expression extraction, and temporal relation (also called Temporal Link, or ‘TLink’) extraction. The TLink extraction module contains three individual classifiers for TLinks: (1) between events and section times, (2) within a sentence, and (3) across different sentences. The performance of our system was evaluated using scripts provided by the i2b2 organizers. Primary measures were micro-averaged Precision, Recall, and F-measure.
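The three-way split of the TLink module can be pictured as a routing step that sends each candidate pair to one of the three classifiers. The sketch below is only an illustration under stated assumptions: the `Mention` type, the kind labels, and the `route` function are hypothetical names, not the authors' implementation, and how the real system generates candidate pairs is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical container for an extracted item (not from the paper)."""
    kind: str                 # assumed labels: "EVENT", "TIMEX", or "SECTIME"
    sentence: Optional[int]   # sentence index; None for a section time


def route(a: Mention, b: Mention) -> str:
    """Pick which of the three TLink classifiers should see the pair (a, b)."""
    kinds = {a.kind, b.kind}
    if "SECTIME" in kinds:
        # Classifier 1 covers links between events and section times only.
        return "event_sectime" if "EVENT" in kinds else "unsupported"
    if a.sentence == b.sentence:
        # Classifier 2: both mentions lie in the same sentence.
        return "within_sentence"
    # Classifier 3: the mentions lie in different sentences.
    return "cross_sentence"
```

Keeping the routing separate from the classifiers means each one can use its own features and training pairs, which is one plausible reason for splitting the task this way.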

Results Our system ranked among the top performers in the challenge. It achieved F-measures of 0.8659 for temporal expression extraction (ranked fourth), 0.6278 for the end-to-end TLink track (ranked first), and 0.6932 for the TLink-only track (ranked first). We subsequently investigated different strategies for TLink extraction and marginally improved the F-measure for the TLink-only track to 0.6943.
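For readers unfamiliar with micro-averaging, the following is a minimal sketch of how micro-averaged Precision, Recall, and F-measure are generally computed: counts are pooled over all documents before the ratios are taken. It is not the i2b2 evaluation script, whose matching rules are not described here; the function name `micro_prf` and the per-document count format are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def micro_prf(counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure.

    `counts` holds one (true_positives, n_predicted, n_gold) triple per
    document; this input format is an assumption for the sketch.
    """
    tp = pred = gold = 0
    for t, p, g in counts:
        # Pool the raw counts across documents before dividing.
        tp += t
        pred += p
        gold += g
    precision = tp / pred if pred else 0.0
    recall = tp / gold if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

Because the counts are pooled, long notes with many annotations weigh more heavily than short ones, unlike a macro-average that would average per-document scores.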
