Evaluation of record linkage between a large healthcare provider and the Utah Population Database
- 1 VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
- 2 Division of Clinical Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- 3 Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- 4 Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
- 5 Oncology Clinical Program, Intermountain Healthcare, Salt Lake City, Utah, USA
- 6 Division of Genetic Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah, USA
- 7 Department of Oncological Sciences, University of Utah, Salt Lake City, Utah, USA
- Correspondence to Professor Geraldine P Mineau, Huntsman Cancer Institute, 2000 Circle of Hope, University of Utah, Salt Lake City, Utah 84112, USA
- Received 26 April 2011
- Accepted 11 July 2011
- Published Online First 16 September 2011
Objective Electronically linked datasets have become an important part of clinical research. Information from multiple sources can be used to identify comorbid conditions and patient outcomes, measure use of healthcare services, and enrich demographic and clinical variables of interest. Innovative approaches for creating research infrastructure beyond a traditional data system are necessary.
Materials and methods Records from a large healthcare system's enterprise data warehouse (EDW) were linked to a statewide population database, and a master subject index was created. The authors evaluate the linkage, along with the impact of missing information in EDW records and the coverage of the population database. The makeup of the EDW and population database provides a subset of cancer records that exist in both resources, which allows a cancer-specific evaluation of the linkage.
Results About 3.4 million records (60.8%) in the EDW were linked to the population database, with a minimum accuracy of 96.3%. An estimated 24.8% of target records were absent from the population database; this estimate made it possible to assess how the amount and type of information missing from a record affected the linkage. By contrast, 99% of the records from the oncology data mart linked; these records had fewer missing fields, and record completeness correlated positively with the number of patient visits.
Discussion and conclusion A general-purpose research infrastructure was created that allows disease-specific cohorts to be identified. An index between institutions is useful because it allows each institution to maintain control and confidentiality of its own information.
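The idea of a master subject index can be illustrated with a minimal sketch. It is not the authors' linkage method: it assumes a simple exact match on normalized name and birth date, and the `Record` fields, `link_key`, `build_master_index`, and `linkage_rate` are hypothetical names chosen for illustration. The sketch shows two points from the abstract: records with missing fields cannot be linked, and the index stores only pairs of identifiers, so each institution keeps its own data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal record; real EDW records carry many more fields."""
    local_id: str
    first_name: str
    last_name: str
    birth_date: Optional[str]  # ISO date string, or None when missing


def link_key(rec: Record) -> Optional[Tuple[str, str, str]]:
    """Return a normalized match key, or None if a required field is missing."""
    if not rec.first_name or not rec.last_name or not rec.birth_date:
        return None
    return (
        rec.first_name.strip().lower(),
        rec.last_name.strip().lower(),
        rec.birth_date,
    )


def build_master_index(
    edw_records: List[Record], population_records: List[Record]
) -> Dict[str, str]:
    """Map EDW identifiers to population-database identifiers.

    Assumption: exact match on the normalized key. Only identifier pairs
    are stored; no clinical or demographic data is copied into the index.
    """
    by_key: Dict[Tuple[str, str, str], str] = {}
    for rec in population_records:
        key = link_key(rec)
        if key is not None:
            by_key[key] = rec.local_id

    index: Dict[str, str] = {}
    for rec in edw_records:
        key = link_key(rec)
        if key is not None and key in by_key:
            index[rec.local_id] = by_key[key]
    return index


def linkage_rate(index: Dict[str, str], edw_records: List[Record]) -> float:
    """Fraction of EDW records that received a link."""
    return len(index) / len(edw_records) if edw_records else 0.0
```

In this sketch, a record with a missing birth date never produces a key and so stays unlinked, which is one simple way missing information can lower the linkage rate. A production linkage would more plausibly score partial agreement across many fields rather than require an exact match.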
- Master subject index
- record linking
- cancer cohort
- population database
Funding Funds for this work were provided by training grant No LM007124-11 from the National Library of Medicine and the Robert Wood Johnson Foundation. This project was sponsored by the Huntsman Intermountain Cancer Control Program. Partial support for all datasets within the Utah Population Database (UPDB) was provided by the University of Utah Huntsman Cancer Institute and the Huntsman Cancer Institute Cancer Center Support grant, P30 CA42014, from the National Cancer Institute. Support for the Utah Cancer Registry is provided by Contract No HHSN 261201000026C from the National Cancer Institute, with additional support from the Utah Department of Health and the University of Utah. Support for this project was also provided by the Division of Genetic Epidemiology in the Department of Biomedical Informatics, University of Utah. This work was supported using resources and facilities at the VA Salt Lake City Health Care System, with funding support from the VA Informatics and Computing Infrastructure (VINCI), VA HSR HIR 08-204, and the Consortium for Healthcare Informatics Research (CHIR), VA HSR HIR 08-374.
Competing interests None.
Ethics approval This study was approved by the University of Utah and Intermountain Healthcare.
Provenance and peer review Not commissioned; externally peer reviewed.