Using Regular Expressions to Abstract Blood Pressure and Treatment Intensification Information from the Text of Physician Notes
- Alexander Turchin
- Nikheel S Kolatkar
- Richard W Grant
- Eric C Makhni
- Merri L Pendergrass
- Jonathan S Einbinder
- Affiliations of the authors: Division of Endocrinology, Brigham and Women's Hospital, Boston, MA (AT, NSK, MLP); Division of General Medicine, Massachusetts General Hospital, Boston, MA (RWG); Harvard Medical School, Boston, MA (AT, NSK, RWG, ECM, MLP, JSE); Clinical Informatics Research and Development, Partners HealthCare System, Boston, MA (AT, JSE); Division of General Medicine, Brigham and Women's Hospital, Boston, MA (JSE)
- Correspondence and reprints: Alexander Turchin, MD, MS, Clinical Informatics Research and Development, Partners HealthCare System, 93 Worcester Street, Suite 201, Wellesley, MA 02481; e-mail: <aturchin{at}partners.org>
- Received 7 February 2006
- Accepted 8 August 2006
Abstract
This case study examined the utility of regular expressions for identifying clinical data relevant to the epidemiology of hypertension treatment. We designed a software tool that used regular expressions to identify and extract instances of documented blood pressure values and anti-hypertensive treatment intensification from the text of physician notes. We determined the sensitivity, specificity, and precision with which the tool identified blood pressure values and anti-hypertensive treatment intensification, using as a gold standard the manual abstraction of 600 notes by two independent reviewers. The software processed 370 Mb of text per hour and identified elevated blood pressure documented in free-text physician notes with sensitivity and specificity of 98% and precision of 93.2%. Anti-hypertensive treatment intensification was identified with sensitivity of 83.8%, specificity of 95.0%, and precision of 85.9%. Regular expressions can be an effective method for focused information extraction tasks related to high-priority disease areas such as hypertension.
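To make the approach described in the abstract concrete, the following is a minimal sketch of regular-expression extraction of blood pressure values from note text. It is not the authors' tool: the pattern `BP_PATTERN`, the helper names `extract_blood_pressures` and `is_elevated`, the plausibility ranges, and the 140/90 mm Hg default thresholds are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern (not the expression used in the study): a "BP" or
# "blood pressure" label, up to 15 non-digit characters, then two numbers
# separated by a slash (systolic/diastolic).
BP_PATTERN = re.compile(
    r"\b(?:bp|blood\s+pressure)\b\D{0,15}?(\d{2,3})\s*/\s*(\d{2,3})\b",
    re.IGNORECASE,
)


def extract_blood_pressures(note):
    """Return a list of (systolic, diastolic) pairs found in the note text."""
    readings = []
    for match in BP_PATTERN.finditer(note):
        systolic = int(match.group(1))
        diastolic = int(match.group(2))
        # Assumed plausibility filter to discard obviously invalid pairs.
        if 60 <= systolic <= 300 and 30 <= diastolic <= 200 and systolic > diastolic:
            readings.append((systolic, diastolic))
    return readings


def is_elevated(reading, systolic_threshold=140, diastolic_threshold=90):
    """Flag a reading as elevated; the default thresholds are assumptions."""
    systolic, diastolic = reading
    return systolic >= systolic_threshold or diastolic >= diastolic_threshold


if __name__ == "__main__":
    note = "Pt seen on 10/12. BP: 150/95 today; at last visit blood pressure was 128/76."
    for reading in extract_blood_pressures(note):
        print(reading, "elevated" if is_elevated(reading) else "not elevated")
```

Requiring a label before the numbers is one simple way to avoid matching dates and other slash-separated values. A production tool would need many more variants (for example, readings written without a label, or ranges), which this sketch does not attempt.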
Footnotes
- This research was supported in part by the Partners HealthCare IS Research Council (AT, JSE), Diabetes Trust Foundation (AT), NHLBI training grant T32HL007609 (NSK), and NIDDK Career Development Award K23 DK067452 (RWG).








