J Am Med Inform Assoc 18:i116-i124 doi:10.1136/amiajnl-2011-000321
  • Research and applications
  • Focus on clinical and translational research

EliXR: an approach to eligibility criteria extraction and representation

  1. Stephen B Johnson1
  1. 1Department of Biomedical Informatics, Columbia University, New York, New York, USA
  2. 2Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA
  1. Correspondence to Chunhua Weng, Department of Biomedical Informatics, Columbia University, 622 W 168 Street, VC-5, New York, NY 10032, USA; cw2384{at}
  • Received 18 April 2011
  • Accepted 22 June 2011
  • Published Online First 31 July 2011


Objective To develop a semantic representation for clinical research eligibility criteria to automate semistructured information extraction from eligibility criteria text.

Materials and Methods An analysis pipeline called eligibility criteria extraction and representation (EliXR) was developed that integrates syntactic parsing and tree pattern mining to discover common semantic patterns in 1000 eligibility criteria randomly selected from The semantic patterns were aggregated and enriched with Unified Medical Language System (UMLS) semantic knowledge to form a semantic representation for clinical research eligibility criteria.

Results The authors arrived at 175 semantic patterns, which form 12 semantic role labels connected by their frequent semantic relations in a semantic network.

Evaluation Three raters independently annotated all the sentence segments (N=396) for 79 test eligibility criteria using the 12 top-level semantic role labels. Eighty-six per cent (339) of the sentence segments were unanimously labelled correctly and 13.8% (55) were correctly labelled by two raters. The Fleiss' κ was 0.88, indicating nearly perfect interrater agreement.
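
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' κ can be computed for a fixed number of raters per item. It is not the authors' code: the function name, the label names "A" and "B", and the toy ratings are all hypothetical and are not the study's data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings: list of lists; each inner list holds the labels assigned to one item.
    categories: the full set of labels that raters could choose from.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Observed agreement per item: share of rater pairs that agree on that item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall proportions of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        # Every rating fell into one category; kappa is undefined, treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: three segments, three raters, two placeholder labels.
toy = [["A", "A", "A"], ["B", "B", "B"], ["A", "A", "B"]]
print(round(fleiss_kappa(toy, ["A", "B"]), 2))  # prints 0.55
```

In the toy data two segments are labelled unanimously and one is split two to one, which yields a κ of 0.55. Values near 1 indicate that raters agree far more often than chance alone would predict.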

Conclusion This study presents a semi-automated data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text and demonstrates the feasibility of using the resulting semantic role labels to generate semistructured eligibility criteria with nearly perfect interrater reliability.


  • Funding This research was supported by the National Library of Medicine grants R01LM009886, R01LM010815, AHRQ grant R01 HS019853 and CTSA award UL1 RR024156.

  • Provenance and peer review Not commissioned; externally peer reviewed.
