Application of statistical machine translation to public health information: a feasibility study
- 1 Department of Electrical Engineering, University of Washington, Seattle, Washington, USA
- 2 Northwest Center for Public Health Practice, University of Washington, Seattle, Washington, USA
- 3 Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, Washington, USA
- Correspondence to Professor Katrin Kirchhoff, Department of Electrical Engineering, University of Washington, Box 352500, Seattle, WA 98195, USA
- Received 11 February 2011
- Accepted 24 March 2011
- Published Online First 15 April 2011
Objective Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public health information into other languages. However, many health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to individuals with Limited English Proficiency. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials.
Design The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations.
Results Machine translation plus postediting took 15–53 min per document, compared with the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between the machine-assisted and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors.
Conclusion The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations.
- Public health informatics
- consumer health information
- natural language processing
- vulnerable populations
Funding This work was supported by the Northwest Preparedness and Response Research Center (PERRC) grant number P01TP000297, CDC Center of Excellence in Public Health Informatics grant P01 HK 000027, the National Library of Medicine Medical Informatics Training Grant T15 LM007442-07, National Library of Medicine Grant 1R01LM010811-01 and a grant from the University of Washington's Provost's Office to KK.
Competing interests None.
Ethics approval The University of Washington Institutional Review Board approved this study.
Provenance and peer review Not commissioned; externally peer reviewed.