Identifying survival associated morphological features of triple negative breast cancer using multiple datasets
- 1Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- 2Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA
- 3Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, USA
- 4Department of Pathology, The Ohio State University, Columbus, Ohio, USA
- 5Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
- 6Division of Medical Oncology and the Breast Program James Cancer Hospital, The Ohio State University, Columbus, Ohio, USA
- 7Biomedical Informatics Shared Resource, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, USA
- Correspondence to Dr Kun Huang, Department of Biomedical Informatics, The Ohio State University, Room 218, 420 W 12th, Columbus, OH 43210, USA
- Received 2 December 2012
- Revised 7 March 2013
- Accepted 18 March 2013
- Published Online First 12 April 2013
Background and objective Biomarkers for subtyping triple negative breast cancer (TNBC) are needed given the absence of responsive therapy and relatively poor prediction of survival. Morphology of cancer tissues is widely used in clinical practice for stratifying cancer patients, while genomic data are highly effective for classifying cancer patients into subgroups. Integrating morphological and genomic data is therefore a promising approach to discovering new biomarkers for cancer outcome prediction. Here we propose a workflow for analyzing histopathological images and integrating them with genomic data to discover biomarkers for TNBC.
Materials and methods We developed an image analysis workflow for extracting a large collection of morphological features and applied it to histological images from The Cancer Genome Atlas (TCGA) TNBC samples during the discovery phase (n=44). We identified strong correlations between salient morphological features and gene expression profiles from the same patients. We then evaluated how well the same morphological features predicted survival in a local TNBC cohort (n=143). We further tested the prognostic power of the correlated gene clusters using two other public gene expression datasets.
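The correlation step in the discovery phase can be illustrated with a minimal sketch. It assumes each gene cluster is summarized per patient by its mean expression and that association is measured with Spearman rank correlation; the abstract does not name the summary statistic or the correlation measure, so both are assumptions, and the feature name and all values below are hypothetical toy data.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cluster_summary(expression_rows):
    """Mean expression per patient (rows = genes, columns = patients)."""
    return [mean(col) for col in zip(*expression_rows)]

# Hypothetical toy data for six patients (not taken from the study).
nuclear_area = [41.0, 55.5, 38.2, 60.1, 47.3, 52.8]
gene_cluster = [
    [2.1, 3.0, 1.8, 3.4, 2.5, 2.9],
    [1.9, 2.8, 1.7, 3.1, 2.6, 2.7],
]
rho = spearman(nuclear_area, cluster_summary(gene_cluster))
print(f"Spearman rho = {rho:.3f}")
```

In a real analysis the same computation would be repeated for every feature and cluster pair, with a correction for multiple testing before any pair is called significant.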
Results and conclusion Using TCGA data, we identified 48 pairs of significantly correlated morphological features and gene clusters; four morphological features were able to separate the local cohort into groups with significantly different survival outcomes. Gene clusters correlated with these four morphological features also proved effective in predicting patient survival in multiple public gene expression datasets. These results suggest the efficacy of our workflow and demonstrate that integrative analysis holds promise for discovering biomarkers of complex diseases.
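One common way to check whether a morphological feature separates a cohort by survival is to split patients at the feature's median and compare the two groups with a log-rank test. The sketch below assumes that approach; the abstract does not state the cutoff or the test used, and the feature values, follow-up times, and event indicators are hypothetical toy data.

```python
import math
from statistics import median

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 degree of freedom."""
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [s for s in samples if s[0] >= t]
        n = len(at_risk)                               # patients still at risk
        n_a = sum(1 for s in at_risk if s[2] == 0)     # of those, in group A
        d = sum(1 for s in samples if s[0] == t and s[1])  # events at time t
        d_a = sum(1 for s in samples if s[0] == t and s[1] and s[2] == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

def median_split(feature_values):
    """True for patients whose feature value exceeds the cohort median."""
    cut = median(feature_values)
    return [v > cut for v in feature_values]

# Hypothetical toy cohort of eight patients (not taken from the study).
feature = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7, 0.3, 0.6]
months = [60, 12, 55, 20, 70, 15, 48, 25]
died = [0, 1, 0, 1, 0, 1, 1, 1]  # 1 = death observed, 0 = censored

high = median_split(feature)
chi_square, p_value = logrank_test(
    [t for t, h in zip(months, high) if h],
    [e for e, h in zip(died, high) if h],
    [t for t, h in zip(months, high) if not h],
    [e for e, h in zip(died, high) if not h],
)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A median split is only one possible cutoff; choosing the threshold on one cohort and testing it on another, as the multi-dataset design here does, avoids tuning the split to the data being evaluated.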