Biomedical informatics and translational medicine
© Sarkar. 2010
Received: 21 July 2009
Accepted: 26 February 2010
Published: 26 February 2010
Biomedical informatics involves a core set of methodologies that can provide a foundation for crossing the "translational barriers" associated with translational medicine. To this end, the fundamental aspects of biomedical informatics (e.g., bioinformatics, imaging informatics, clinical informatics, and public health informatics) may be essential in helping improve the ability to bring basic research findings to the bedside, evaluate the efficacy of interventions across communities, and enable the assessment of the eventual impact of translational medicine innovations on health policies. Here, a brief description is provided for a selection of key biomedical informatics topics (Decision Support, Natural Language Processing, Standards, Information Retrieval, and Electronic Health Records) and their relevance to translational medicine. Based on contributions and advancements in each of these topic areas, the article proposes that biomedical informatics practitioners ("biomedical informaticians") can be essential members of translational medicine teams.
public health informatics. These support the transfer and integration of knowledge across the major realms of translational medicine, from molecules to populations. A partnership between biomedical informatics and translational medicine promises the betterment of patient care[9, 10] through development of new and better understood interventions used effectively in clinics as well as development of more informed policies and clinical guidelines
Natural Language Processing
Electronic Health Records. For each topic, progress and activities in bio-, imaging, clinical and public health informatics are described. The article then concludes with a consideration of the role of biomedical informaticians in translational medicine teams
translational bioinformatics (which primarily consists of biomedical informatics methodologies aimed at crossing the T1 translational barrier)
clinical research informatics (which predominantly consists of biomedical informatics techniques from the T1 translational barrier across the T2 and T3 barriers). It is important to emphasize that the role of biomedical informatics in the context of translational medicine is not to necessarily create "new" informatics techniques. Instead, it is to apply and advance the rich cadre of biomedical informatics approaches within the context of the fundamental goal of translational medicine: facilitate the application of basic research discoveries towards the betterment of human health or treatment of disease
Clinical informatics has historically been described as a field that meets two related, but distinct needs: patient-centric and knowledge-centric. This notion can be generalized for all of biomedical informatics within the context of translational medicine to suggest that the goals are either to meet the needs of user-centric stakeholders (e.g., biologists, clinicians, epidemiologists, and health services researchers) or knowledge-centric stakeholders (e.g., researchers or practitioners at the bench, bedside, community, and population level). Bioinformatics approaches are needed to identify molecular and cellular regions that can be targeted with specific clinical interventions or studied to provide better insights to the molecular and cellular basis of disease[19–25]. Imaging informatics techniques are needed for the development and analysis of visualization approaches for understanding pathogenesis and identification of putative treatments from the molecular, cellular, tissue or organ level[26–29]. Clinical informatics innovations are needed to improve patient care through the availability and integration of relevant information at the point of care[30–35]. Finally, public health informatics solutions are required to meet population based needs, whether focused on the tracking of emergent infectious diseases[36–39], the development of resources to relate complex clinical topics to the general population[40–44] or the assessment of how the latest clinical interventions are impacting the overall health of a given population[45–47].
At the T1 translational barrier crossing, translational bioinformatics is rapidly evolving with the enhancement and specialization of existing bioinformatics techniques and biological databases to enable identification of specific bench-based insights. Similarly, clinical research informatics emphasizes the use of biomedical informatics approaches to enable the assessment and moving of basic science innovations from the T1 translational barrier and across the T2 and T3 translational barriers (as depicted in Figure 1). These approaches may involve the enhancement and specialization of existing and new clinical and public health informatics techniques within the context of implementation and controlled assessment of novel interventions, development of practice guidelines, and outcomes assessment.
Translational bioinformatics and clinical research informatics are built on foundational knowledge-centric (i.e., "hypothesis-driven") approaches that are designed to meet the myriad research and information needs of basic science, clinical, and public health researchers. The future of biomedical informatics depends on the ability to leverage common frameworks that enable the translation of research hypotheses into practical and proven treatments. Progress has already been seen in the development of knowledge management infrastructures and standards that facilitate general research inquiry in specific domains (e.g., cancer and neuroimaging). It is also imperative for such advancements to be made in the context of meeting user-centric needs, thereby improving patient care. To this end, the ability to manage and enable exploration of information associated with the biomedical research enterprise suggests that human medicine may be considered as the ultimate model organism. Towards this aspiration, biomedical informaticians are uniquely equipped to facilitate the necessary communication and translation of concepts between members of trans-disciplinary translational medicine teams.
knowledge acquisition - the gathering of relevant information from knowledge sources (e.g., research databases, textbooks, or experts)
explanation - describing the possible decisions and the decision making process
The use of computational techniques to aid in decision-making has been well established in the clinical arena for more than forty years. In bioinformatics, a range of systems have been developed to support bench biologist decisions, including sequence similarity, ab initio gene discovery, and gene regulation. There has been discussion of decision support systems that can incorporate genetic information when providing clinical decision support recommendations [66, 67]. Decision support systems have been developed within imaging informatics for enabling better (both in terms of sensitivity and specificity) diagnoses of a range of diseases[68, 69]. Clinical informatics research has given consideration to both positive and negative aspects of computer facilitated decision support [70–78]. Recent attention to bioterrorism planning and syndromic surveillance has also given rise to public health informatics solutions that involve significant decision support[79–81].
Decision support systems in the context of translational medicine will require a new paradigm of trans-disciplinary inferencing approaches to cross each of the translational barriers. Inherent in the design of such decision support systems that span multiple disciplines will be the need for collaboration and cross-communication between key stakeholders at the bench, bedside, community, and population levels. To this end, there may be utility in decision support systems incorporating "Web 2.0" technologies, which enable Web-mediated communication between experts across disciplines. Such technologies have begun to emerge in scenarios where expertise and beneficiaries are inherently distributed, such as rare genetic diseases. Regardless of the approach chosen, the fundamental tasks of knowledge acquisition, representation, inferencing, and explanation will need to be carried out together with members of the translational medicine team. Successfully designed translational medicine decision support systems could become essential tools to bridge researchers and findings across biological, clinical, and public health data.
natural language understanding systems that extract information or knowledge from human language forms (either text or speech), often resulting in encoded and structured forms that can be incorporated into subsequent applications[84, 85]
natural language generation systems that generate human understandable language from machine representations (e.g., from within knowledge bases or systems of logical rules). NLP has a strong relationship to the field of computational linguistics, which derives computational models for phenomena associated with natural language (encapsulated as either sets of handcrafted rules or statistically derived models)
The development and application of NLP approaches has been a significant focus of research across the entire spectrum of biomedical informatics. Biological knowledge extraction has been a major area of focus in NLP systems[88, 89], including the use of NLP methods to facilitate the prediction of molecular pathways. Within imaging informatics, there has been a range of applications that process and generate information associated with clinical images, often to help summarize and organize radiology images[91–94]. In clinical informatics, there have been great advances in the extraction of information from semi-structured or unstructured narratives associated with patient care, as well as the development of applications for generating summaries or reports automatically[96–98]. In the realm of public health, NLP approaches have been demonstrated to facilitate the encoding and summarization of significant information at the population level, such as for describing functional status and outbreak detection.
Peer-reviewed literature, such as that indexed by MEDLINE, has been shown to be a source of previously unknown inferences across domains[101, 102] as well as linkages between the bioinformatics and clinical informatics communities. In addition to MEDLINE, which grows by approximately 1 million citations per year, the increasing adoption of Electronic Health Records will lead to increased volumes of natural language text. To this end, NLP approaches will increasingly be needed to systematically extract and summarize the growing volumes of textual data that will be generated across the entire translational spectrum. There has also been some work in NLP that directly strives to develop linkages across disparate text sources (e.g., bridging e-mail communications to relevant literature). Within the realm of translational medicine, NLP approaches will be increasingly poised to facilitate the development of linkages between unstructured and structured knowledge sources across the realms of biology, medicine, and public health.
The task of transmitting or linking data across multiple biomedical data sources is often difficult because of the multitude of different formats and systems that are available for storing data. Standard methods are thus needed for both representing and exchanging information across disparate data sources to link potentially related data across the spectrum of translational medicine - from laboratory data at the bench to patient charts at the bedside to linkage and availability of clinical data across a community to the development of aggregate statistics of populations. These standards need to accommodate the range of heterogeneous data storage systems that may be required for clinical or research purposes, while enabling the data to be accessible for subsequent linkage and retrieval. Standards are thus an essential element in the representation of data in a form that can be readily exchanged with other systems.
The development of standards to represent and exchange data has been a major area of emphasis in biomedical informatics since the 1980s[108–113]. Much energy has been placed in the development of knowledge representation constructs[109, 114, 115] (e.g., ontologies and controlled vocabularies), as well as establishment of standards for their use and incorporation in biological, clinical[117, 118], and public health contexts. For example, the voluminous data associated with gene expression arrays gave rise to the Minimum Information About a Microarray Experiment (MIAME) standard by the bioinformatics community. Within the imaging informatics community, the Digital Imaging and Communications in Medicine (DICOM) standard defines the international conventions for representing and exchanging data associated with medical images. Within the clinical realm, Health Level 7 (HL7) standards are commonplace for describing messages associated with a wide range of health care activities[122, 123]. Specific clinical terminologies, such as the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), can be used to represent, with appropriate considerations[124, 125], clinical information associated with patient care. Data standards have been developed for systematically organizing and sharing data associated with clinical research[112, 126], including those from HL7 and the Clinical Data Interchange Standards Consortium (CDISC). Within public health, the International Statistical Classification of Diseases and Related Health Problems (ICD) is a standard established by the World Health Organization (WHO) and used in the determination of morbidity and mortality statistics. The rapid emergence of regional health information exchange networks has also necessitated that a range of standards be used to ensure the interoperability of clinical data[128–133].
The Comité Européen de Normalisation in collaboration with the International Organization for Standardization (ISO) is coordinating the common representation and exchange standards across the clinical and public health realms (through ISO/TC 215).
The re-use of data in the development and testing of research hypotheses is a regular area of interest in biomedical informatics[126, 135]. However, disparities between coding schemes pose potential barriers to systematic representation across biomedical resources. Furthermore, the development of new representation structures is becoming increasingly easy, resulting in many possible contextual meanings for a given concept. The Unified Medical Language System (UMLS) has demonstrated how it may be possible to develop conceptual linkages across terminologies that span the entire translational spectrum, from molecules to populations. Additional centralized resources have been developed that facilitate the development and dissemination of knowledge representation structures that may not necessarily be part of the UMLS (e.g., the National Center for Biomedical Ontology and its BioPortal).
Standards that have been developed and implemented by the biomedical informatics community will be an essential component of integrating relevant data across the translational barriers (e.g., to answer questions such as: what is the comparative effectiveness of a particular pharmacogenetic treatment versus conventional pharmaceutical treatments in the general population?). Additionally, standards can facilitate the access and integration of information associated with a particular individual in light of available biological, imaging, clinical, and public health data (including improved access to these data from within medical records), ultimately enabling the development of "personalized medicine" and the testing of its utility. Consequently, translational medicine will depend on biomedical informatics approaches to leverage existing standards (e.g., MIAME, HL7, and DICOM) and resources like the UMLS, in addition to developing new standards for specialized domains (e.g., cancer and neuroimaging).
Information retrieval systems are designed for the organization and retrieval of relevant information from databases. The basic premise is that a query is presented to a system that then attempts to retrieve the most relevant items from within database(s) that satisfy the request. The quality of the results is then measured using statistics such as precision (the number of relevant results retrieved relative to the total number of retrieved results) and recall (the number of relevant results retrieved relative to the total number of relevant items in the database).
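The two measures defined above can be illustrated with a minimal Python sketch. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration; the sketch assumes that both the retrieved results and the relevant items can be represented as collections of unique identifiers.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: identifiers returned by the retrieval system
    relevant:  identifiers of all relevant items in the database
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    # Relevant results retrieved are those present in both sets.
    hits = retrieved & relevant
    # Precision: relevant retrieved / total retrieved.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: relevant retrieved / total relevant in the database.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four items retrieved, three items actually relevant.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = ["doc2", "doc4", "doc5"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the four retrieved items are relevant (precision of 0.5), and two of the three relevant items were retrieved (recall of about 0.67). Returning 0.0 for an empty result set or an empty relevant set is a convention adopted here, not part of the definitions.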
Across the field of biomedical informatics, various efforts have focused on the need to bring together information across a range of data sources to enable information retrieval queries[145, 146]. Perhaps the most popular information retrieval tool is the PubMed interface to the MEDLINE citation database that contains information across much of biomedicine. In addition to MEDLINE, the growth of publicly available resources has been especially remarkable in bioinformatics, which generally focus on the retrieval of relevant biological data (e.g., molecular sequences from GenBank given a nucleotide or protein sequence). Information retrieval systems have also been developed in bioinformatics that are able to retrieve relevant data from across multiple resources simultaneously (e.g., for generating putative annotations for unknown gene sequences). Imaging information retrieval systems have been a rich research area where relevant images are retrieved based on image similarity (e.g., to identify pathological images that might be related to a particular anatomical shape and related clinical context). Within clinical environments, information retrieval systems have been developed that can link users to relevant clinical reference resources based on using the particular clinical context as part of the query (e.g., to identify relevant articles based on a specific abnormal laboratory result)[152, 153]. Information retrieval systems have been developed in public health to identify relevant information for consumers, epidemiologists, and health service researchers given varying types of queries[47, 154, 155]. The procedural tasks involved with information retrieval often involve natural language processing and knowledge representation techniques, such as highlighted previously. 
The integration of natural language processing, knowledge representation, and information retrieval systems has led to the development of "question-answer" systems that have the potential to provide more user-friendly interfaces to information retrieval systems.
The need to identify relevant information from multiple heterogeneous data sources is inherent in translational medicine, especially in light of the exponential growth of data from a range of data sources across the spectrum of translational medicine. Within the context of translational medicine, information retrieval systems could be built on existing and emerging approaches from within the biomedical informatics community, including those that make use of contemporary "Semantic Web" technologies[157–159]. The ability to reliably and efficiently identify relevant information, such as demonstrated by archetypal information retrieval systems developed by the biomedical informatics community (e.g., GenBank and MEDLINE), will be crucial to identify requisite knowledge that will be necessary to cross each of the translational barriers.
Medical charts contain the sum of information associated with an individual's encounters with the health care system. In addition to data recorded by direct care providers (e.g., physicians and nurses), medical charts typically include data from ancillary services such as radiology, laboratory, and pharmacy. With the increasing electronic availability of data across the health care enterprise, paper-based medical charts have evolved to become computerized as Electronic Health Records (EHRs). EHRs can capture a variety of information (e.g., by clinicians at the bedside) and have electronic interfaces to individual services (e.g., administrative, laboratory, radiology, and pharmacy). Many EHRs can enable Computerized Provider Order Entry (CPOE), which allows clinicians to electronically order services and may also enable real-time clinical decision support (e.g., provide an alert about an order that could lead to an adverse event). Clinical documentation can be entered directly into EHR systems, allowing for potentially fewer issues due to transcription delays or difficulty in deciphering handwritten notes. An artifact of EHRs is the development of more robust clinical and research data warehouses, which can be used for subsequent studies[161–163].
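The kind of real-time alert described for CPOE can be sketched as a simple rule check in Python. This is a toy illustration under stated assumptions, not a description of how any particular EHR product works: the `Order` type, the `check_order` function, and the placeholder medication names (`drug_a`, `drug_b`, `drug_c`) are all hypothetical, and the interaction table is invented for the example rather than drawn from clinical knowledge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """A hypothetical medication order entered through CPOE."""
    patient_id: str
    medication: str


# Hypothetical rule table: unordered pairs of medications assumed to interact.
# A real system would draw such rules from a curated knowledge base.
INTERACTING_PAIRS = {frozenset({"drug_a", "drug_b"})}


def check_order(new_order, active_medications):
    """Return alert messages for a new order given the patient's active medications."""
    alerts = []
    for med in active_medications:
        # Flag the order if it forms a known interacting pair with an active medication.
        if frozenset({new_order.medication, med}) in INTERACTING_PAIRS:
            alerts.append(
                f"Alert: {new_order.medication} may interact with {med}"
            )
    return alerts


# Usage: the patient is already taking drug_b and drug_c; drug_a is being ordered.
alerts = check_order(Order("patient-1", "drug_a"), ["drug_b", "drug_c"])
for message in alerts:
    print(message)
```

The sketch separates the rule table from the checking logic, which mirrors the separation between knowledge representation and inferencing in decision support systems.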
From the earliest propositions of electronic health records[164, 165], it has been thought that the potential benefits to support and improve patient care would be immense. From a bioinformatics perspective, the integration of genomic information in EHRs may lead to genotype-to-phenotype correlation analyses[167, 168], and thus increase the importance of bioinformatics integration with laboratory and clinical information systems. The ability to review radiological images or search for possible clinically relevant features within them has shown great promise within the imaging informatics community[170–174]. Recent attention to EHRs has been given by the United States federal government as a core element of the modern reformation of health care. Empirical studies will be needed to demonstrate the actual implications on patient care and effects on the reduction in overall health care costs as a direct result of EHR implementation[176, 177]; however, there remains great interest within the clinical informatics community in the overall benefit to patient care and management needed to keep up with the dizzying pace of modern medicine[176, 178, 179], including the development of integrated clinical decision support systems. Public health informatics initiatives have pioneered surveillance projects for outbreak detection[180, 181] or patient safety[182, 183] that involve EHRs (which are also noted for their potentially high costs of implementation). Recently, energy has also focused on the development of personal health records (PHRs) as a means to extend the realm of clinical care beyond the clinic into patient homes. Through PHRs, consumers can be directly involved with their care management plans, using tools intended to be as easy to use as other electronic services (e.g., ATMs for banking or increasingly popular "Web 2.0" collaboration technologies). As with EHRs, there is still a need to assess the true benefits of PHRs in terms of their actual impact on the improvement of patient care[188, 189].
The potential ubiquity of EHRs underscores the importance of considering the associated privacy and ethical issues (e.g., who has access to which kinds of data and for what purposes can clinical data actually be used for research or exchanged through regional interchanges)[189–193].
The increased availability of electronic health data, which are largely available and organized within EHRs, may have a significant impact on translational medicine. For example, the emergence of "personal health" projects (e.g., Google Health) and consumer services (e.g., 23andMe) has the potential to generate more genotype (i.e., "bench") and phenotype (i.e., "bedside") data that may be analyzed relative to community-based studies. The raw elements that could lead to the next breakthroughs may be made available as part of the data deluge associated with consumer-driven, "grass-roots" efforts. Such initiatives, in addition to the other core biomedical informatics topics discussed here (decision support, natural language processing, and information retrieval techniques), will enable the leveraging of EHR-based health data to catalyze the crossing of the translational barriers.
Translational medicine is a trans-disciplinary endeavor that aims to accelerate the process of bringing innovations into practice through the linking of practitioners and researchers across the spectrum of biomedicine. As evidenced by major funding initiatives (e.g., the United States National Institutes of Health "Roadmap"[194, 195]), there is great hope in the development of a new paradigm of research that catalyzes the process from bench to practice. The trans-disciplinary nature of the translational barrier crossings in translational medicine endeavors will increasingly necessitate biomedical informatics approaches to manage, organize, and integrate heterogeneous data to inform decisions from bench to bedside to community to policy.
The distinctions between multi-disciplinary, inter-disciplinary, and trans-disciplinary goals have been described as the difference between additive, interactive, and holistic approaches[196–198]. Unlike multi-disciplinary or inter-disciplinary endeavors, trans-disciplinary initiatives must be completely convergent towards the development of completely new research paradigms. The greatest challenge faced by translational medicine, therefore, is the difficulty in truly being a trans-disciplinary science that brings together researchers and practitioners that traditionally work within their own "silos" of practice.
The success of translational medicine will depend not only on the addition of biomedical informaticians to translational medicine teams, but also on the acceptance and understanding of what biomedical informatics consists of by other members of the team. To this end, the importance of biomedical informatics training has been underscored as a key area of required competency across the spectrum of translational medicine, from biologists to clinicians to public health professionals. There has been some demonstrable success in the development of experiences that focus on training "agents of change" with necessary core concepts as well as hallmark distributed educational programs that aim to provide formal educational opportunities for biomedical informatics training. The composition of translational medicine teams will also depend on the appropriate intermixing of biomedical informatics expertise to complement the requisite domain expertise. The success of translational medicine endeavors may be greatly enhanced with biomedical informatics approaches; however, establishing the appropriate synergistic relationship between biomedical informaticians and other members of the translational medicine team remains one of the next major challenges to be addressed in pursuit of translational medicine breakthroughs.
Since the field's beginnings, biomedical informatics innovations have been developed to support the needs of various stakeholders including biologists, clinicians/clinical researchers, epidemiologists, and health services researchers. A range of biomedical informatics topics, such as those described in this paper, form a suite of elements that can transform data across the translational medicine spectrum. The inclusion of biomedical informaticians in the translational medicine team may thus help enable a trans-disciplinary paradigm shift towards the development of the next generation of groundbreaking therapies and interventions.
The author thanks members of the Center for Clinical and Translational Science at the University of Vermont, especially Drs. Richard A. Galbraith and Elizabeth S. Chen, for valuable insights and discussion that contributed to the thoughts presented here. The author also thanks the anonymous reviewers, who provided in-depth suggestions for improving the manuscript. The author is supported by grants from the National Library of Medicine (R01 LM009725) and the National Science Foundation (IIS 0241229).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.