J Am Med Inform Assoc 19:e110-e118 doi:10.1136/amiajnl-2011-000562
  • Research and applications
  • FOCUS on clinical research informatics

Automated identification of extreme-risk events in clinical incident reports

Editor's Choice
  • Enrico Coiera
  • Centre for Health Informatics, University of New South Wales, Sydney, Australia
  • Correspondence to Dr Mei-Sing Ong, Centre for Health Informatics, University of New South Wales, Sydney 2052, Australia
  • Contributors MSO developed the classifiers, analyzed the data, and wrote the manuscript. FM and EC analyzed the data and reviewed the manuscript.

  • Received 22 August 2011
  • Accepted 16 December 2011
  • Published Online First 11 January 2012


Objectives To explore the feasibility of using statistical text classification to automatically detect extreme-risk events in clinical incident reports.

Methods Statistical text classifiers based on Naïve Bayes and Support Vector Machine (SVM) algorithms were trained and tested on clinical incident reports to automatically detect extreme-risk events, defined as incidents that satisfy the criteria of Severity Assessment Code (SAC) level 1. The reports were drawn from those submitted to the Advanced Incident Management System by public hospitals in one Australian region. The classifiers were evaluated on two datasets: (1) a set of reports with diverse incident types (n=120); (2) a set of reports associated with patient misidentification (n=166). Results were assessed using accuracy, precision, recall, F-measure, and area under the curve (AUC) of receiver operating characteristic curves.
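
As an illustration of the evaluation measures named above, the following minimal Python sketch computes accuracy, precision, recall, and F-measure from binary labels, and estimates AUC as the probability that a randomly chosen positive report is scored above a randomly chosen negative one. The function names (`binary_metrics`, `roc_auc`) and the toy labels and scores are assumptions made for illustration only; they are not the study's data, and the abstract does not state which software the authors used.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for labels in {0, 1}.

    Here 1 stands for a SAC level 1 report and 0 for any other report
    (an illustrative encoding, not one taken from the paper).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Each positive/negative pair counts 1 if the positive is scored higher,
    0.5 on a tie, and 0 otherwise; the mean over all pairs is the AUC.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not from the study).
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

toy_metrics = binary_metrics(toy_true, toy_pred)
toy_auc = roc_auc(toy_true, toy_scores)
```

The pairwise AUC is quadratic in the number of reports, which is acceptable for datasets of the size described here (n=120 and n=166); a rank-based formulation would be preferable for much larger collections.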

Results The classifiers performed well on both datasets. In the multi-type dataset, SVM with a linear kernel performed best, identifying 85.8% of SAC level 1 incidents (precision=0.88, recall=0.83, F-measure=0.86, AUC=0.92). In the patient misidentification dataset, 96.4% of SAC level 1 incidents were detected when SVM with linear, polynomial or radial-basis function kernel was used (precision=0.99, recall=0.94, F-measure=0.96, AUC=0.98). Naïve Bayes showed reasonable performance, detecting 80.8% of SAC level 1 incidents in the multi-type dataset and 89.8% of SAC level 1 patient misidentification incidents. Overall, higher prediction accuracy was attained on the specialized dataset, compared with the multi-type dataset.

Conclusion Text classification techniques can be applied effectively to automate the detection of extreme-risk events in clinical incident reports.


  • Funding This research is supported by NHMRC Program Grant 568612 and the Australian Research Council (ARC) LP0775532. The funding sources played no role in the design, conduct, or reporting of the study.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
