The Evaluation of a Temporal Reasoning System in Processing Clinical Discharge Summaries
- (a) Department of Biomedical Informatics, Columbia University, New York, NY
- (b) Clinical Informatics Research and Development, Partners HealthCare, Boston, MA
- (c) Department of Computer and Information Science, Brooklyn College, Brooklyn, NY
- Correspondence: Li Zhou, PhD, BMed, Clinical Informatics Research and Development, Partners HealthCare, 93 Worcester Street, 2nd Floor, Wellesley, MA 02481; e-mail: <lzhou2{at}partners.org>
- Received 3 April 2007
- Accepted 20 September 2007
Abstract
Context TimeText is a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text.
Objective To measure the accuracy of TimeText in processing clinical discharge summaries.
Design Six physicians with biomedical informatics training served as domain experts. Twenty discharge summaries were randomly selected for the evaluation. For each of the first 14 reports, 5 to 8 clinically important medical events were chosen. The temporal reasoning system generated temporal relations between the endpoints (start or finish) of pairs of medical events. Two experts (subjects) manually generated temporal relations for the same medical events. The system-generated and expert-generated results were assessed by the four other experts (raters). All 20 discharge summaries were used to assess the system’s accuracy in answering time-oriented clinical questions. For each report, 5 to 10 clinically plausible temporal questions about events were generated. Two experts generated answers to the questions to serve as the gold standard. We wrote queries to retrieve answers from the system’s output.
Measurements Correctness of generated temporal relations, recall of clinically important relations, and accuracy in answering temporal questions.
Results The raters determined that 97% of the 295 temporal relations generated by the subjects were correct and that 96.5% of the 995 temporal relations generated by the system were correct. The system captured 79% of the 307 temporal relations determined to be clinically important by the subjects and raters. The system answered 84% of the temporal questions correctly.
Conclusion The system encoded the majority of information identified by experts, and was able to answer simple temporal questions.
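The measurements named in the abstract (correctness of generated relations and recall of clinically important relations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EndpointRelation` class, the relation labels, the function names, and the toy events are hypothetical and are not TimeText's actual representation or the study's data. In particular, recall is computed here by exact matching, whereas a temporal reasoning system could also capture a relation by inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRelation:
    """Hypothetical encoding of a relation between two event endpoints."""
    event_a: str
    endpoint_a: str  # "start" or "finish"
    relation: str    # illustrative labels: "before", "after", "equal"
    event_b: str
    endpoint_b: str  # "start" or "finish"


def proportion_correct(judgments):
    """Fraction of generated relations that raters judged correct."""
    return sum(judgments) / len(judgments)


def recall(system_relations, important_relations):
    """Fraction of clinically important relations found in the system output.

    Uses exact set matching, which is a simplifying assumption.
    """
    captured = important_relations & system_relations
    return len(captured) / len(important_relations)


# Toy data, invented for this example only.
system = {
    EndpointRelation("chest pain", "start", "before", "admission", "start"),
    EndpointRelation("admission", "start", "before", "discharge", "start"),
    EndpointRelation("aspirin", "start", "after", "chest pain", "start"),
}
important = {
    EndpointRelation("chest pain", "start", "before", "admission", "start"),
    EndpointRelation("aspirin", "start", "after", "chest pain", "start"),
    EndpointRelation("fever", "finish", "before", "discharge", "start"),
    EndpointRelation("admission", "start", "before", "discharge", "start"),
}
rater_judgments = [True, True, False]  # one rater verdict per system relation

correctness = proportion_correct(rater_judgments)
system_recall = recall(system, important)
print(f"correctness={correctness:.3f}, recall={system_recall:.2f}")
```

Correctness is measured over what was generated, while recall is measured over what experts considered clinically important, so the two figures use different denominators (995 and 307 in the results above).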
Footnotes
- This work was funded by the National Library of Medicine (NLM) grant “Discovering and applying knowledge in clinical databases” (R01 LM006910).








