The Center for Computational Biology: resources, achievements, and challenges
- Arthur W Toga,
- Ivo D Dinov,
- Paul M Thompson,
- Roger P Woods,
- John D Van Horn,
- David W Shattuck,
- D Stott Parker
- Center for Computational Biology, Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, Los Angeles, California, USA
- Correspondence to Dr Arthur W Toga, Center for Computational Biology, Laboratory of Neuro Imaging, David Geffen School of Medicine at UCLA, 635 S Charles Young Drive, Suite 225, Los Angeles, CA 90095-7334, USA
Contributors All authors participated in the efforts to establish the CCB infrastructure, develop effective software tools, and support these computational resources, and contributed to the writing and proofreading of this manuscript.
- Received 3 August 2011
- Accepted 2 September 2011
- Published Online First 10 November 2011
The Center for Computational Biology (CCB) is a multidisciplinary program where biomedical scientists, engineers, and clinicians work jointly to combine modern mathematical and computational techniques in order to perform phenotypic and genotypic studies of biological structure, function, and physiology in health and disease. CCB has developed a computational framework built around the Manifold Atlas, an integrated biomedical computing environment that enables statistical inference on biological manifolds. These manifolds model biological structures, features, shapes, and flows, and support sophisticated morphometric and statistical analyses. The Manifold Atlas includes tools, workflows, and services for multimodal population-based modeling and analysis of biological manifolds. The broad spectrum of biomedical topics explored by CCB investigators includes the study of normal and pathological brain development, maturation and aging, discovery of associations between neuroimaging and genetic biomarkers, and the modeling, analysis, and visualization of biological shape, form, and size. CCB supports a wide range of short-term and long-term collaborations with outside investigators, which drive the center's computational developments and focus the validation and dissemination of CCB resources to new areas and scientific domains.
- National centers for biomedical computing
- center for computational biology
- computational neuroscience
- computational infrastructure
- collaborative and sustainable biomedical research
- data sharing
- data mining
The Center for Computational Biology (CCB), one of the National Centers for Biomedical Computing (NCBCs), is focused on developing and applying tools for ‘Computational Atlases’, a framework that goes beyond traditional paper or digital atlases by providing computational methods to map bioimaging and related data from multiple subjects into common coordinate systems for group comparisons. The concept of an atlas is naturally adaptable across different kinds of populations, and atlases can reflect multiple modalities of information, including wide ranges of scale and time. Atlases can incorporate complex mathematical models of biological features, statistical methods for analysis and inference on populations, and an increasing spectrum of scientific disciplines. CCB integrates all of these information perspectives using cutting-edge mathematical models, optimized algorithms, and advanced computational infrastructure.
The Computational Atlas must incorporate accurate registration,1 shape extraction, modeling and analysis,2 voxel3 and tensor based morphometry,4 and spatial-temporal statistics5 in order to understand the sometimes subtle, distributed, and dynamic changes associated with normal and pathological biological processes of the brain.
A powerful example of the CCB computational atlasing efforts is the development of a ‘Manifold Atlas’, an integrated biomedical computing environment that combines a workflow framework with new facilities for statistical inference on biological manifolds. These manifolds are basic mathematical models of biological structure, including shapes and flows, which support morphometric and statistical analyses suitable for individual and population comparisons. The manifold atlas enables holistic statistical analyses of shape information, provides an environment for studying associations between biological structure and function in multimodal population studies, and makes it easier to integrate multidisciplinary methods to address complex translational challenges.
A ‘manifold’ is a mathematical space that, on a sufficiently small scale, resembles Euclidean space. For example, the surface of a brain structure such as the hippocampus is a two-dimensional manifold, while its volume is a three-dimensional (3D) manifold. In neuroimaging, manifolds are used to describe brain structures,6–9 opening the door to mathematical and computational brain mapping methods for analyzing connectivity, development, and function. For example, using manifolds permits development of new registration methods, such as the Diffeomorphic Neuroanatomical Registration Framework described below. Brain features have been modeled as Riemannian manifolds—manifolds that include a metric, thereby providing geometric structure and permitting definition of geodesics (shortest paths) and curvature. CCB uses Riemannian manifolds to represent parametric surfaces10 and to define flows between them (eg, Ricci flows and Riemannian fluid flows),11 12 as well as for shape analysis (eg, spectral embedding) and analysis of high-dimensional diffusion imaging datasets.13
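To make the role of the metric concrete, the following minimal sketch uses the unit sphere as a stand-in for a closed surface; it is an illustration only, not a CCB tool. The function names are hypothetical. On the sphere the geodesic between two points is a great-circle arc, so its length can be compared directly with the straight-line (Euclidean) distance through the ambient 3D space.

```python
import math


def geodesic_distance_on_sphere(p, q, radius=1.0):
    """Length of the great-circle arc between p and q on a sphere.

    p and q are 3D coordinate tuples; they are normalized here, so only
    their directions matter. (Hypothetical helper for illustration.)
    """
    norm_p = math.sqrt(sum(c * c for c in p))
    norm_q = math.sqrt(sum(c * c for c in q))
    cos_angle = sum(a * b for a, b in zip(p, q)) / (norm_p * norm_q)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


def euclidean_distance(p, q):
    """Straight-line distance through the ambient space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


north_pole = (0.0, 0.0, 1.0)
equator_point = (1.0, 0.0, 0.0)
print(geodesic_distance_on_sphere(north_pole, equator_point))  # a quarter of a great circle
print(euclidean_distance(north_pole, equator_point))           # the chord, which is shorter
```

The geodesic is never shorter than the chord, because it is constrained to stay on the surface; this is the sense in which distances measured on a brain surface differ from distances measured through the volume.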
Shape manifolds are also used in biomedical analysis of brain structures. The most basic shapes are ‘curves’ (such as tractographic and sulcal–gyral curves), ‘surfaces’ (such as the outer boundary of the cerebral cortex), and ‘volumes’ (such as subcortical structures and cortical regions). Shape manifolds can be augmented into higher-dimensional manifolds with biological data such as tissue density and gene expression, as well as ‘flows’—which can characterize the evolution of shapes over time—and therefore represent important biomedical patterns such as neurodevelopment, brain activation, and disease progression.
Atlases play fundamental roles in computational biology, both as unified mathematical models and as intuitive computational environments. By its nature, the CCB manifold atlas has a visual representation, which is vital for many types of biological information, and it includes an array of related maps, each of which associates features with points in some coordinate space. Any parameterized set of data may be viewed as a map. An example is a brain map, which associates brain features with 3D or four-dimensional (4D) coordinates. Biological sequence maps are also examples, mapping molecular information to one-dimensional linear coordinates. Combining these maps makes it possible to answer queries that cut across scales and modalities. The CCB focus is on computational biology of the brain; the brain's complexity is so great that a common computational framework is vital.
The new CCB ‘biomorphometry tools’ combine methods from differential geometry, Bayesian theory, and statistics on manifolds (figure 1). The resulting biological inferences permit complex analysis of multimodal information about biological structure. We have also developed methods such as ‘manifold learning’.14–16
Statistical inference on biological manifolds can be used for undertaking a variety of tasks: (1) mathematical definition and representation of biological structures; (2) defining an abstract manifold consisting of such representations, incorporating both differential geometric and manifold-learning methods wherever suitable; (3) defining metrics that measure distances on and between manifolds; (4) constructing biological atlases on manifolds using (1)–(3) above; and finally (5) performing population-based statistical analysis of biological parameters represented on manifolds. The aims of the CCB yield a set of end-to-end analytical workflows (see Pipeline below) that permit fusion of features extracted from structural and diffusion images, followed by analyses that answer families of important biological questions introduced by various driving biological projects (DBPs). The CCB Atlasing Toolkit, a suite of workflow modules for atlasing with biological manifolds, includes Pipeline protocols integrating data services, parallel computation resources, analytical packages, workflow processing, and best practices (such as the protocols embodied in workflows in the Pipeline Library and the CCB Workbench).
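Tasks (3) to (5) can be illustrated with a small sketch that computes a Fréchet (Karcher) mean, the manifold analogue of an average, for points on the unit sphere. This is a generic textbook construction chosen for illustration, not the CCB Atlasing Toolkit; the function names are hypothetical, and the points are assumed to be unit vectors lying within a common hemisphere so that the mean is well defined.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(u):
    return math.sqrt(_dot(u, u))


def _normalize(u):
    n = _norm(u)
    return [a / n for a in u]


def log_map(base, point):
    """Tangent vector at `base` pointing toward `point`; its length is the geodesic distance."""
    c = max(-1.0, min(1.0, _dot(base, point)))
    theta = math.acos(c)
    ortho = [p - c * b for p, b in zip(point, base)]  # part of `point` orthogonal to `base`
    n = _norm(ortho)
    if n < 1e-12:  # identical (or antipodal) points: no usable direction
        return [0.0] * len(base)
    return [theta * o / n for o in ortho]


def exp_map(base, tangent):
    """Point reached by following the geodesic from `base` along `tangent`."""
    t = _norm(tangent)
    if t < 1e-12:
        return list(base)
    return [math.cos(t) * b + math.sin(t) * v / t for b, v in zip(base, tangent)]


def frechet_mean(points, iterations=50, tol=1e-10):
    """Iteratively move the estimate along the average tangent direction."""
    mean = _normalize(points[0])
    for _ in range(iterations):
        logs = [log_map(mean, p) for p in points]
        average = [sum(col) / len(points) for col in zip(*logs)]
        if _norm(average) < tol:
            break
        mean = _normalize(exp_map(mean, average))
    return mean


a = 0.3  # angular offset from the north pole, in radians
population = [
    [math.sin(a), 0.0, math.cos(a)],
    [-math.sin(a), 0.0, math.cos(a)],
    [0.0, math.sin(a), math.cos(a)],
    [0.0, -math.sin(a), math.cos(a)],
]
print(frechet_mean(population))  # by symmetry, close to the north pole
```

The mean plays the part of a population template, and the tangent vectors returned by `log_map` live in an ordinary vector space, so conventional statistics can be applied to them.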
Driving Biological Projects
The CCB promotes and nurtures collaborations with outside groups using two complementary mechanisms to initiate, manage, and advance collaborative projects—long-term DBPs and short-term pilot collaborative projects. In the period 2004–2011, the CCB maintained dozens of DBPs and pilot collaborative projects and supported hundreds of service recipients, outside investigators, and infrastructure users.
Each CCB DBP addresses heterogeneous aspects of computational biology. Their cumulative breadth and diversity support the Center's effort to develop the computational manifold atlas. CCB DBPs included Mapping Language Development Longitudinally, Mapping Brain Changes in Alzheimer's and Those at Risk, Mechanisms Underlying the Clinical Progression of MS and EAE, and Genetic Influences on Brain Structure in Schizophrenia. CCB DBPs also included Identifying Age Related Atrophy Using Level-set Registration of Embedded Maps, Developmental Origin of Phenotypic Variation in Drosophila melanogaster, Mapping Brain Changes in HIV/AIDS, and Vervet Genetics and Brain Morphology. In addition, several stand-alone CCB/NCBC collaborative projects were funded by the NIH, including the Cognitive Atlas Project (http://www.cognitiveatlas.org), the Cardiac Atlas Project (http://www.cardiacatlas.org), and the Diffeomorphic Neuroanatomical Registration Framework (http://www.picsl.upenn.edu/). Details of the goals, achievements and findings of these CCB collaborative projects are available online (http://ccb.loni.ucla.edu/research/neurobiology/).
Collectively, the CCB DBPs have led to over 220 published peer-reviewed articles, generated six complementary computational atlases, designed dozens of end-to-end computational analysis protocols, and provided thousands of datasets to the scientific community. Examples of significant CCB DBP findings include the following.
We made the first time-lapse films of Alzheimer's pathology spreading in the living brain. Our time-lapse maps show the spread of a new compound (FDDNP-PET) that labels amyloid plaques and neurofibrillary tangles in the living brain.17 This mapping technique has been hailed as a breakthrough in the Alzheimer's disease community, as has the earlier development of the first time-maps of structural brain change in Alzheimer's disease.18 This type of dynamic 4D map can show where treatments slow a disease19 and reveal the disease trajectory as it spreads in the living brain.
We are now validating it in a separately funded large-scale Alzheimer's disease project (ADNI).25
We created the first 3D anatomical brain atlas indexing tests of genetic association of two schizophrenia disease-related DISC1 and TRAX haplotypes with regional cortical gray matter density.26
We investigated genotype–phenotype relationships in schizophrenia (figure 2B) and discovered22 23 associations between cortical gray matter density, the schizophrenia risk gene DISC1, and alterations in brain structure associated with deletions at the risk locus 22q11.2 (figure 2C).
The accomplishments of the Center over the past 7 years can be assessed with many quantitative and qualitative metrics, including the number of publications, the quality of software tools, the impact of supported collaborative research projects, and the caliber of the trainees. Also relevant are the applications of the techniques and models to new domains and problems. Below we include some specific products that resulted from the CCB research and development efforts.
Since 2004, CCB investigators have published 812 manuscripts, including peer-reviewed journal articles, books, book chapters, and conference proceedings and abstracts. A CCB member was first author on 237 of these; the remainder were authored by investigators outside CCB, either in collaboration with CCB or, in 37 cases, without any direct collaboration at all. CCB investigators designed and implemented 75 image processing, shape analysis, tensor modeling, informatics, and visualization software tools and web services, which were distributed over 10 000 times. They supported 112 active collaborations and serviced hundreds of researchers, mentored 478 trainees, and conducted dozens of training courses and educational events. The CCB also distributed large amounts of imaging, phenotypic, and genetics data, designed 90 different data analysis Pipeline protocols, and provided a 1200-core computational grid infrastructure to over 600 users (http://CCB.loni.ucla.edu). Seven collaborative R01 grants grew out of CCB research projects and matured to the point of becoming stand-alone research endeavors.
The CCB maintains one of the largest neuroimaging archives in the world, with more than 65 different projects, that comprise multiple species, more than 70 000 image volumes, dozens of imaging modalities, and diverse arrays of data on normal and pathological states from thousands of subjects. In addition, meta-data, derived imaging data, and genetics data are available for many subjects and projects (http://ccb.loni.ucla.edu/resources/ccb-data/).
Mathematical modeling and computational algorithms
CCB has developed a unifying approach for non-linear registration, matching general geometric patterns including landmark points, curves, surfaces, and sub-volumes using implicit level set methods. A distance function-based, non-linear landmark curve-matching algorithm27 28 with an inverse-consistent elastic energy was introduced to compute deformation fields carrying source landmarks in the form of curves and/or points to homologous landmarks in a target image. This algorithm facilitates non-linear, inverse-consistent, intensity-based registration methods suitable for 3D image volumes29 30 (figure 3). In addition, we pioneered a method for intrinsic-feature-based shape correspondences31 and an automated detection algorithm for analysis of sulcal, gyral, and sub-cortical patterns.32 33 We also designed and implemented two new level-set based techniques—a multilayer and multilevel level set—for volumetric segmentation of brain imaging data34 35 and a new algorithm for automatic whole brain segmentation, which was trained and validated on manually segmented data.36–38
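The notion of inverse consistency can be illustrated with a deliberately simple one-dimensional sketch. It is not the landmark curve-matching algorithm described above; the deformations and function names are hypothetical, chosen only to show what an inverse-consistency error measures: how far the composition of a forward and a backward mapping departs from the identity.

```python
def inverse_consistency_error(forward, backward, samples):
    """Mean absolute deviation of backward(forward(x)) from x over the samples."""
    return sum(abs(backward(forward(x)) - x) for x in samples) / len(samples)


def forward_map(x):
    """Hypothetical smooth deformation of the interval [0, 1] onto itself."""
    return x ** 2


def exact_backward_map(y):
    """True inverse of forward_map on [0, 1]."""
    return y ** 0.5


def identity_map(y):
    """A backward map that ignores the forward deformation entirely."""
    return y


samples = [i / 10 for i in range(11)]
consistent = inverse_consistency_error(forward_map, exact_backward_map, samples)
inconsistent = inverse_consistency_error(forward_map, identity_map, samples)
print(consistent)    # essentially zero: the two maps undo each other
print(inconsistent)  # clearly positive: the pair is not inverse-consistent
```

A registration method that penalizes this kind of error during optimization yields source-to-target and target-to-source mappings that agree with each other, which is the property the inverse-consistent energy is meant to enforce.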
CCB has actively participated in many NCBC-wide initiatives and computational infrastructure developments. CCB led the design and development of the NCBC Biositemaps (http://www.Biositemaps.org) and the iTools Resourceome,39 provided an open-access computational infrastructure for general biomedical computing, participated in many NCBC dissemination and training events, and shared data, tools, and resources via the NCBC framework. Together with the other NCBCs, CCB has organized a number of training events (http://ccb.loni.ucla.edu/training), provided student fellowships, and disseminated valuable digital educational resources, video archives, and research tutorials (http://www.loni.ucla.edu/SVG/).
The CCB Pipeline is a Java-based platform-agnostic graphical workflow environment for design, distributed client-server execution, and validation and community distribution of computational protocols.40 41 The Pipeline environment enables the sharing and replication of results at multiple institutions and promotes collaborative open science. Figure 4 shows an example of an image registration meta-algorithm implemented completely within the Pipeline environment using heterogeneous types of data, software tools, and services. In addition to computational algorithms, the Pipeline environment also provides access to standardized datasets.
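Conceptually, a graphical workflow of this kind is a directed acyclic graph in which each module runs only after the modules that feed it. The following sketch shows that idea with the Python standard library (Python 3.9 or later); it does not use or reproduce the Pipeline software or its file format, and the module names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is a processing module; its value is the set of modules
# whose outputs it needs. All names are hypothetical.
workflow = {
    "skull_strip": {"load_image"},
    "register_to_atlas": {"skull_strip"},
    "segment_tissue": {"skull_strip"},
    "compute_statistics": {"register_to_atlas", "segment_tissue"},
}


def execution_order(dependencies):
    """Return one valid order in which every module follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(workflow)
print(order)
```

Modules with no dependency between them, such as the registration and segmentation steps here, may appear in either order; those are the steps a distributed execution environment can run in parallel.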
Training and dissemination
The CCB educational and training efforts have involved a wide range of activities including mentoring and supervision of hundreds of undergraduate, graduate, and postgraduate trainees, scientific presentations at national and international conferences, K-12 instructional events and organization of research workshops. Through the CCB, a new UCLA Bioinformatics Inter-Departmental Graduate Student Program has been designed and established. The CCB uses many complementary routes to disseminate resources and knowledge to the general community. Examples include (1) websites and pages (eg, http://www.CCB.ucla.edu, http://www.NITRC.org), (2) peer-reviewed scientific publications (http://www.loni.ucla.edu/Research/Publications/), and (3) educational events (http://ccb.loni.ucla.edu/training/). CCB software resources have been downloaded in large numbers and web-services utilization has increased about 25% semiannually since 2004. CCB discoveries and results have also been broadcast on 15 national and international news and media channels.
Ongoing challenges and future developments
The NCBC program is, by all accounts, a major success. Each center, CCB among them, plans and operates with an expectation of 10 years of funding. Creating a successful center requires this level and duration of support to fully realize its goals as stipulated by the program. Furthermore, these are cooperative agreements, and as such the activities and directions of the centers are strongly influenced and in some instances specifically guided by NIH program staff. The challenge then is how to continue this kind of program within a model that requires traditional peer review, with all its shortcomings. Evaluations delivered by committees with incomplete knowledge of the topical areas for each center are a formula for a random outcome. Grouping all NCBC applications into one or two review panels cannot possibly do justice to the diversity of science represented in this program.
The challenge faced by CCB is its very existence. Future developments will depend on the availability of funding.
Funding This work was initially funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 RR021813.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.