Direct2Experts: a pilot national network to demonstrate interoperability among research-networking platforms
- Griffin M Weber1,
- William Barnett2,
- Mike Conlon3,
- David Eichmann4,
- Warren Kibbe5,
- Holly Falk-Krzesinski6,
- Michael Halaas7,
- Layne Johnson8,
- Eric Meeks9,
- Donald Mitchell10,
- Titus Schleyer11,
- Sarah Stallings12,
- Michael Warden13,
- Maninder Kahlon9,
- Members of the Direct2Experts Collaboration
- 1Information Technology, Harvard Medical School, Boston, Massachusetts, USA
- 2Center for Applied Cybersecurity Research, Indiana University, Bloomington, Indiana, USA
- 3Clinical and Translational Science Institute, University of Florida, Gainesville, Florida, USA
- 4School of Library and Information Science, University of Iowa, Iowa City, Iowa, USA
- 5Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA
- 6Northwestern University Clinical and Translational Sciences Institute, Northwestern University, Chicago, Illinois, USA
- 7IRT Systems Development and Data Management, Stanford University School of Medicine, Menlo Park, California, USA
- 8Health Sciences Libraries, University of Minnesota, Minneapolis, Minnesota, USA
- 9Clinical and Translational Science Institute, University of California, San Francisco, California, USA
- 10Stanford University School of Medicine, Stanford, California, USA
- 11Center for Dental Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- 12Colorado Clinical and Translational Sciences Institute, University of Colorado, Denver, Aurora, Colorado, USA
- 13Elsevier, Ann Arbor, Michigan, USA
- Correspondence to Dr Griffin M Weber, Harvard Medical School, Information Technology, 107 Avenue Louis Pasteur, Boston, MA 02115, USA;
- Received 22 February 2011
- Accepted 20 September 2011
- Published Online First 28 October 2011
Research-networking tools use data-mining and social networking to enable expertise discovery, matchmaking and collaboration, which are important facets of team science and translational research. Several commercial and academic platforms have been built, and many institutions have deployed these products to help their investigators find local collaborators. Recent studies, though, have shown the growing importance of multiuniversity teams in science. Unfortunately, the lack of a standard data-exchange model and universities' reluctance to share information about their faculty have presented barriers to forming an institutionally supported national network. This case report describes an initiative, which, in only 6 months, achieved interoperability among seven major research-networking products at 28 universities by taking an approach that focused on addressing institutional concerns and encouraging their participation. With this necessary groundwork in place, the second phase of this effort can begin, which will expand the network's functionality and focus on the end users.
- Search engine
- information storage and retrieval
- access to information
- multi-institutional systems
- research personnel
- artificial intelligence
- biomedical informatics
- machine learning
- social network analysis
- clinical data repositories
- data warehousing
- information extraction
- research networking
- clinical data
- translational informatics
Although biomedical researchers increasingly leverage comprehensive publication, award, genomic, proteomic, pathway, and other datasets to advance domain research, comparable profile data about the researchers themselves, which are key to forming the partnerships foundational to translational and team science, are not readily available. A national research network of these profiles can: (1) connect institution-level resources, university enterprise systems, national research networks, publicly available research data, and restricted data harvested into the expertise profiles; (2) enable more rapid discovery and recommendation of researchers, expertise, and resources using network-analysis techniques; (3) support the development of new collaborative science teams to address new or existing research challenges; (4) provide tools that support research, such as CV generation; and (5) facilitate evaluation of research, scholarly activity, and resources, especially over time.1–3 This is particularly important because productive cross-disciplinary research collaboration is an essential feature of a robust translational research enterprise, collaborative research is receiving proportionally increasing amounts of federal funding and leading to more highly cited publications,4 5 and investigators are becoming more comfortable with online collaboration. In addition, a national expertise network has many other potential audiences and use cases, such as students and junior faculty looking for mentors, investigators seeking jobs, researchers sharing resources and data sets, conference organizers searching for speakers, and committees (like data-safety-monitoring boards) needing extrainstitutional representation.
In August 2010, the Clinical and Translational Science Award (CTSA) Research Networking Group launched an initiative to develop a pilot network, which we call Direct2Experts, to enable users to search for biomedical researchers across multiple institutions in a way that is more effective and efficient than existing methods, such as Google or Facebook. Because the pilot was shaped by our belief that individual institutions can provide ‘cleaner,’ more complete and authoritative data about their own researchers than what is publicly available, we decided to focus our pilot network on generating buy-in from institutions so that they would be both willing and eager to share their information and encourage their researchers to adopt the tool. Although several of the platforms, including the National Institutes of Health-funded VIVO,6 Harvard's Profiles Research Networking Software,7 and Elsevier's SciVal Experts,8 had existing networks connecting instances of their own products, we sought to solve a complementary challenge, which is combining these networks to reach the broadest set of institutions.
Although it seems counterintuitive, we determined early in the initiative that for the pilot phase, we could not place emphasis on the usability of the network or on measuring the impact of national research networking. This is because we were aware that institutions were highly concerned about privacy issues and competitive intelligence that could result from sharing data, and that in order to design a network that institutions would support, we needed to start with an architecture that would minimize those fears. Participating institutions understood that this would greatly limit the functionality of the pilot, but everyone agreed that once the hurdle of achieving interoperability was overcome, and a pilot network was built with a critical mass of products and institutions, the second phase of Direct2Experts would then focus on the end users.
Against a backdrop of providing individual institutions with as much control as possible over the data they shared and the user experience, four features defined the project and pilot network: (1) inclusion at an early stage of the leads of the major products, including the primary commercial player; (2) a federated query architecture that did not require exporting data to a central repository; (3) very simple technical requirements for connecting to the network; and (4) an agreement between institutions on a non-controversial user interface.
The pilot was designed (both strategically and technically) by a group that included the key developers of several academically produced products, the informatics leads of a number of CTSAs, representatives from non-CTSA institutions, and the major commercial provider in this domain. The composition of this group resulted in rapid progress in pilot definition and highlighted the fact that the network was not limited to CTSA institutions or particular products. An agreement was crafted emphasizing that institutions need only share the aggregate count of the number of people who match a search phrase and a URL back to that institution's research-networking website. This straightforward approach made it easy for institutions to commit to sharing data.
To perform a search, an investigator accesses his or her own institution's website and enters a phrase. This search is then broadcast to participating sites, and the user is presented with a list showing the number of matching people at each institution. The user can click any of the institution names to go to that institution's research-networking tool to view the details of the individual people who matched the search phrase. This gives institutions control over both how their own investigators access the network and how their investigators are presented to others. Key aspects of this architecture are that it can operate without a central database, search index, or website; a global ranking algorithm is not needed; institutions can define which populations to load into their databases, what a search ‘match’ is, and how to sort/rank people within their institution; and institutions may remove themselves from the network at any time.
We developed a simple XML-based technical standard for Direct2Experts. Each institution creates a ‘bootstrap’ XML file that indicates its preferred name and a URL, which, when called as a web service with a search phrase appended to the end, returns a second ‘results’ XML file.
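The bootstrap file itself carries only two pieces of information: the institution's preferred name and its web-service URL. The sketch below illustrates this with Python's standard library; the element names and URL are assumptions for illustration, not the published Direct2Experts schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical bootstrap file. The real schema may use different element
# names; this shows only the two pieces of information the text describes:
# a preferred institution name and a search web-service URL to which the
# search phrase is appended.
BOOTSTRAP_XML = """\
<Institution>
  <Name>Example University</Name>
  <SearchURL>https://profiles.example.edu/d2e/search?query=</SearchURL>
</Institution>"""

def read_bootstrap(xml_text):
    """Return (preferred name, search URL) from a bootstrap file."""
    root = ET.fromstring(xml_text)
    return root.findtext("Name"), root.findtext("SearchURL")

name, search_url = read_bootstrap(BOOTSTRAP_XML)
```

Because the file is static and public, an institution can serve it from any ordinary web server; no special software is required to join the network.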
The results XML file contains the aggregate count of the number of matching people; a brief description of the population included in that institution's research-networking system; a ‘preview’ URL that returns a small amount of information about the individuals who matched the query; and a search-result URL, which, when followed, shows the full list of matches.
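As an illustration, a results file returned by a search for ‘informatics’ might look like the sketch below; the element names, count, and URLs are assumptions for illustration, not the published Direct2Experts schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical results file for a search on 'informatics'. The four fields
# mirror those described in the text: aggregate count, population
# description, preview URL, and full search-results URL.
RESULTS_XML = """\
<SearchResults>
  <Count>42</Count>
  <Population>Faculty with a research appointment</Population>
  <PreviewURL>https://profiles.example.edu/d2e/preview?query=informatics</PreviewURL>
  <SearchResultsURL>https://profiles.example.edu/search?query=informatics</SearchResultsURL>
</SearchResults>"""

def read_results(xml_text):
    """Return the four fields of a (hypothetical) results file as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "population": root.findtext("Population"),
        "preview_url": root.findtext("PreviewURL"),
        "results_url": root.findtext("SearchResultsURL"),
    }

results = read_results(RESULTS_XML)
```

Note that only the aggregate count crosses institutional boundaries; the preview and results URLs hand the user back to the institution's own website, preserving local control over presentation.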
The bootstrap file, Direct2Experts web service, and preview and search-results URLs all need to be publicly available because the network is designed to be open and accessible to anyone, including people from institutions that are not part of Direct2Experts. Once an institution is ready to join the federation, it shares the URL of its bootstrap file by emailing it to the CTSA Research Networking Group mailing list.
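Putting the pieces together, a federated client needs only to append the search phrase to each site's web-service URL, fetch the results file, and collect the aggregate counts. A minimal sketch follows; the site URLs and the `Count` element name are assumptions for illustration, and a real client would take the URLs from each institution's bootstrap file:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def parse_count(results_xml):
    """Extract the aggregate match count from a (hypothetical) results file."""
    return int(ET.fromstring(results_xml).findtext("Count"))

def broadcast_search(sites, phrase):
    """Broadcast a search phrase to every participating site.

    `sites` maps an institution's preferred name to its web-service URL
    prefix; per the standard described above, the phrase is simply appended
    to the end of that URL.
    """
    counts = {}
    for name, url_prefix in sites.items():
        try:
            url = url_prefix + urllib.parse.quote(phrase)
            with urllib.request.urlopen(url, timeout=10) as resp:
                counts[name] = parse_count(resp.read())
        except OSError:
            # A site may be down or unreachable; because there is no central
            # database or index, the rest of the network is unaffected.
            counts[name] = None
    return counts
```

This also makes the architecture's key properties concrete: there is no central index to maintain, no global ranking to agree on, and a site leaves the network simply by no longer answering.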
The Direct2Experts pilot network was officially launched in January 2011, with a dozen institutions representing four different products (table 1). However, within a month, this had expanded to a total of seven products, and the number of institutions had more than doubled to 28, bringing the total number of experts searchable through the network to more than 30 000 (table 2). Any institution was (and still is) welcome to join the network; and some, which had historically been reluctant to share information about their investigators, rapidly moved forward on implementing the Direct2Experts web service so that they would not be ‘left out’ from the network.
In creating the network, we discovered much variability in both the types of populations that institutions included in their research-networking systems and the user interfaces that institutions created to connect to Direct2Experts.9–13 We also created an information website about Direct2Experts (http://direct2experts.org), which includes its own search interface for the network (figure 1). Some institutions implemented the Direct2Experts web service but not a search interface, and the Direct2Experts website is an example of a stand-alone search interface without a web service. Institutions experimented with different types of content to return in their ‘preview’ URL, such as the Top 10 matches, a breakdown of the aggregate count by faculty rank, or the first page of their complete search results.
This pilot accomplished its goal of bringing a large number of universities together to share institutionally curated information about their researchers. Through the successful deployment of the pilot, and rapid increase in participation after launch, we validated the strategic approach of reducing barriers to entry (even at the cost of losing sophistication at the technical end). In the process of implementing the pilot, we found that with the right mix of stakeholders at the table, everyone was incentivized to participate. For example, the commercial partner, included early and comprehensively, contributed the largest number of institutions to the network (14) and became a strong advocate for participation.
In making the requirements to join the pilot network as simple as possible, we limited its functionality in several ways. From a usability perspective, we consider the simple aggregate count search interface just a first step. In the future, we plan to develop mechanisms to filter and view search results in a variety of more complex ways. In addition, we placed no restrictions on how individual sites implemented their Direct2Experts web services; and therefore, there is no assurance of consistency in search across institutions. They vary in the populations they cover, ranging from only research faculty to all research staff, clinicians, and trainees; the types of queries supported, with some limited to Medical Subject Headings terms, and others searching both biomedical concepts and person names; and the local privacy policies adopted, with some institutions automatically including all investigators, and others taking an opt-in or opt-out approach. Also, to simplify the technical architecture, we chose to define a new standard for the Direct2Experts web services, rather than leveraging existing federated systems that some of our institutions already use, such as caGrid or SHRINE.14 15
Because Direct2Experts just launched, and the pilot was not designed to optimize the end-user experience, it is too early to determine how frequently people will use the system, how often investigators will choose a national search over one that queries only their local institution, which user interfaces are most effective, and whether cross-institutional research-networking tools such as Direct2Experts will have an impact on research. In the initial weeks after its launch, though, the pilot gained considerable attention, especially across the CTSA consortium, where the CTSA site PIs identified this project as one of major interest.
A key question during the next phase of Direct2Experts will be whether institutional adoption can continue to increase even with more sophisticated user interfaces, which will require sharing information on individual investigators, including potentially controversial data elements such as grant funding and other metrics that can be used to rank and filter lists of people. Will institutions falter in their commitments, or has the network reached a critical size where the benefits of participation are viewed as outweighing the risks? Such questions will drive the future strategy for Direct2Experts.
We recognize that much work lies ahead in the next phase of Direct2Experts, which will address governance, scalability, privacy, user experience, data quality, consistency across sites, and other issues. However, the pilot marks several significant milestones. First, the major research-networking platforms, both academic and commercial, used across the country agreed to a common federated architecture. Second, more than two dozen institutions agreed to share information about their investigators through a public website. Third, momentum was generated that will encourage additional institutions to participate and will facilitate discussions about the future of Direct2Experts.
We thank the members of the Direct2Experts Collaboration; the CTSA Research Networking Group; and the many teams of software developers, based at academic institutions across the country and at the commercial companies Elsevier and Recombinant Data Corp, who are responsible for implementing the federated network. The manuscript was approved by the CTSA Consortium Publications Committee.
Funding This project has been funded in whole or in part with federal funds from the National Center for Research Resources and National Institutes of Health, through the Clinical and Translational Science Awards Program, part of the Roadmap Initiative, Re-Engineering the Clinical Research Enterprise.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.