Identifying translational science within the triangle of biomedicine
© Weber; licensee BioMed Central Ltd. 2013
Received: 7 April 2013
Accepted: 15 May 2013
Published: 24 May 2013
Skip to main content
© Weber; licensee BioMed Central Ltd. 2013
Received: 7 April 2013
Accepted: 15 May 2013
Published: 24 May 2013
The National Institutes of Health (NIH) Roadmap places special emphasis on “bench-to-bedside” research, or the “translation” of basic science research into practical clinical applications. The Clinical and Translational Science Awards (CTSA) Consortium is one example of the large investments being made to develop a national infrastructure to support translational science, which involves reducing regulatory burdens, launching new educational initiatives, and forming partnerships between academia and industry. However, while numerous definitions have been suggested for translational science, including the qualitative T1-T4 classification, a consensus has not yet been reached. This makes it challenging to tract the impact of these major policy changes.
In this study, we use a bibliometric approach to map PubMed articles onto a graph, called the Triangle of Biomedicine. The corners of the triangle represent research related to animals, cells and molecules, and humans; and, the position of a publication on the graph is based on its topics, as determined by its Medical Subject Headings (MeSH). We define translation as movement of a collection of articles, or the articles that cite those articles, towards the human corner.
The Triangle of Biomedicine provides a quantitative way of determining if an individual scientist, research organization, funding agency, or scientific field is producing results that are relevant to clinical medicine. We validate our technique using examples that have been previously described in the literature and by comparing it to prior methods of measuring translational science.
The Triangle of Biomedicine is a novel way to identify translational science and track changes over time. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. The Triangle of Biomedicine also provides a simple visual way of depicting this impact, which can be far more powerful than numbers alone.
In biomedicine, translational science is research that has gone from “bench” to “bedside”, resulting in applications such as drug discovery that can benefit human health [1–6]. However, this is an imprecise description; and, while numerous definitions have been suggested, including the qualitative T1-T4 classification , a consensus has not yet been reached. Several bibliometric techniques have been developed to quantitatively place publications along the translational spectrum. Narin assigned journals to fields, and then grouped these fields into either “Basic Research” or “Clinical Medicine” [8–10]. Narin also developed another classification called research levels, in which journals are assigned to “Clinical Observation” (Level 1), “Clinical Mix” (Level 2), “Clinical Investigation” (Level 3), or “Basic Research” (Level 4) . He combines Levels 1 and 2 into “Clinical Medicine” and Levels 3 and 4 to “Biomedical Research”. Lewison showed that the research level of individual articles can be determined from keywords within the articles’ titles and addresses, and he defines the average research level of a collection of articles as the mean of the research levels of those articles [11–13].
In this study, we analyze the 20 million publications in the National Library of Medicine’s PubMed database by extending these bibliometric approaches in three ways: (1) We divide basic science into two subcategories, research done on animals or other complex organisms and research done on the cellular or molecular level. We believe it is important to make this distinction due to the rapid increase in “-omics” research and related fields in recent years. (2) We classify articles using their Medical Subject Headings (MeSH), which are assigned based on the content of the articles. Journal fields, title keywords, and addresses only approximate an article’s content. (3) We map the classification scheme onto a graphical diagram, which we call the Triangle of Biomedicine, which makes it possible to visualize patterns and identify trends over time.
Using a simple algorithm based on an article’s MeSH descriptors, we determined whether each article in PubMed contained research related to three broad topic areas—animals and other complex organisms (A), cells and molecules (C), or humans (H). An article can have more than one topic area. Articles about both animals and cells are classified as AC, articles about both animals and humans are AH, articles about cells and humans are CH, and articles about all three are ACH. Articles that have none of these topic areas are unclassified by this method.
To determine an article’s topics, we took advantage of the fact that MeSH is organized as a hierarchical tree, and the three topic areas correspond to particular MeSH nodes and their subtrees. H is mapped to all MeSH codes under the subtrees B01.050.150.900.649.801.400.112.400.400 (Human) and M01 (Person); A is mapped to all codes under the subtree B01 (Eukaryota) except the code for Humans; and C is mapped to the subtrees A11 (Cells), B02 (Archaea), B03 (Bacteria), B04 (Viruses), G02.111.570 (Molecular Structures), and G02.149 (Chemical Processes). These mappings are not perfect. A much more complicated MeSH-based classification technique could have been developed; however, keeping the definition of the three areas simple did not seem to limit our analysis, and it made the results easier to interpret.
Several groups have created “maps of science” to visually depict the structure of literature by showing the relationships among different fields of science [14–20]. In these maps, a reference system is defined, over which data about publications and citations are placed. A reference system can be chosen specifically to highlight certain attributes of the data, such as emerging areas of innovation or interdisciplinary research.
In order to identify translational research, we constructed a trilinear graph , where the three topic areas are placed at the corners of an equilateral triangle, with A on the lower-left, C on the top, and H on the lower-right. The midpoints of the edges correspond to AC, AH, and CH articles, and the center of the triangle corresponds to ACH articles.
An article can be plotted on the Triangle of Biomedicine according to the MeSH descriptors that have been assigned to it. For example, if only human descriptors, and no animal or cell descriptors have been assigned to an article, then it is classified as an H article and placed at the H corner. An article with both animal and cell descriptors, and no human descriptors, is classified as an AC article and placed at the AC point. A collection of articles is represented by the average position of its articles. Although an individual article can only be mapped to one of seven points, a collection of articles can be plotted anywhere in the triangle.
An imaginary line, the Translational Axis, can be drawn from the AC point to the H corner. The position of one or more articles when projected onto this axis is the Translational Index (TI). By distorting the Triangle of Biomedicine by bringing the A and C corners together at the AC point, the entire triangle can be collapsed down along the Translational Axis to the more traditional depiction of translational science being a linear path from basic to clinical research. In other words, the Triangle of Biomedicine does not replace the traditional linear view, but rather provides additional clarity into the path research takes towards translation.
The Triangle of Biomedicine is drawn as an equilateral triangle, whose corners correspond to A, C, and H topic areas. On a Cartesian system, each corner is a distance of 1 from the origin, with the A corner at (x,y) = (-sqrt(3)/2,-0.5), the C corner at (0,1), and the H corner at (sqrt(3)/2,-0.5). The AC, AH, and CH points are midway along the edges of the triangle, and the ACH point is located at the origin at (0,0). The Translational Axis is a line from the AC point, through the origin, to the H corner. The position of a point projected onto the Translational Axis is its Translational Index (TI). For example, the A, AC, or C points have TI = -0.5; the AHC point has TI = 0; the AH and CH points have TI = 0.25; and the H point has TI = 1. A collection of articles with mostly human studies that includes a small amount of basic science research will be close to the H corner, but not directly on it, and it will have a TI slightly less than 1.
Our datasets are 1) a snapshot of 20,032,189 PubMed articles and their associated MeSH descriptors and citations from December 24, 2010 (http://pubmed.org); 2) broad journal headings from the NLM Catalog database (http://www.ncbi.nlm.nih.gov/nlmcatalog); and 3) degrees and publications of 12,729 Harvard Medical School faculty taken from the Harvard Catalyst Profiles website (http://connects.catalyst.harvard.edu/profiles) in December, 2010. Each of these sources is publicly available.
Although we are using all PubMed articles for this study, PubMed derives its citation data (one article citing another) from PubMed Central (PMC), which represents only a subset of PubMed articles. As a result, the citation counts in PubMed are underestimates of the total number of times that articles have actually been cited. We therefore define a “corrected citation count” for an article by dividing each citation by the percentage of publications of the citing article’s type that are in PMC. For example, since 4.9% of H articles and 17.1% of C articles are in PMC, if an article has been cited in PMC by one H article and two C articles, its corrected citation count is 1/0.049 + 2/0.171 = 32.1. The assumption is that for articles of a given type, the ones in PMC cite articles the same way as the ones that are not in PMC.
Other citation databases exist, such as Thomson Reuters’ Web of Science (WoS), Elsevier’s Scopus, and Google Scholar. While there are large overlaps among these databases, there are also significant differences, which means that none of them are complete, and there will be biases regardless of which database is used . We chose PMC because it is the only one that is freely available to download in its entirety, and it is linked to PubMed and MeSH.
Summary of categories
Number of articles
Percent of PubMed
Number of authors
Percent in PMC
Percent cited in PMC
Mean PMC citations
Corrected PMC citations
Harvard corrected citations
Harvard WoS citations
Translational fraction (TF)
Translational distance (TD)
Translational years (TY)
Translational closeness (TC)
The National Library of Medicine (NLM) classifies journals into different disciplines, such as microbiology, pharmacology, or neurology, with the use of Broad Journal Headings. We used Narin’s mappings to group these disciplines into basic research or clinical medicine. Individual articles were given a “basic research” score of 1 if they were in a basic research journal and 0 if they were in a “clinical medicine” journal. For each A-C-H category, a weighted average of its articles’ scores was calculated, with the weights being the inverse of the total number of basic research (4,316,495) and clinical medicine (11,689,341) articles in PubMed. That gives a numeric value for the fraction of articles within a category that are basic research, which is corrected for the fact that PubMed as a whole has a greater number of clinical medicine articles.
For each of his four research levels, Narin selected a prototype journal to conduct his analyses: The Journal of the American Medical Association (JAMA, Level 1), The New England Journal of Medicine (NEJM, Level 2), The Journal of Clinical Investigation (JCI, Level 3), and The Journal of Biological Chemistry (JBC, Level 4). Each is widely considered a leading journal and has over 25,000 articles spanning more than 50 years. For each A-C-H category, we determined the number of articles from each of these four journals and calculated a weighted average of their research levels, with the weights being the inverse of the total number of articles each journal has in PubMed.
Table 1a lists the number of articles that map to each A-C-H category. The largest category is H, representing 43.3% of the articles in PubMed. About 19% of articles do not fit into any category, and therefore cannot be classified by this method. Many of these articles are in areas such as history of medicine and social science, and a third of them simply have no MeSH descriptors assigned to them yet.
If we give articles that Narin would classify as “basic research” a score of 1 and “clinical medicine” a score of 0, then H articles have a basic research score of 0.125, meaning they are mostly in clinical journals, while A and C articles have scores of 0.634 and 0.911, respectively, meaning they are mostly in basic research journals (Table 1a). AH, CH, ACH, and AC articles contain progressively more basic research, in that order.
Using Narin’s research levels, H articles have a score of 1.59, which is between his two Clinical Medicine levels, Clinical Observation (Level 1) and Clinical Mix (Level 2). A and C articles have research levels of 3.15 and 3.78, respectively, which are between the two Biomedical Research levels, Clinical Investigation (Level 3) and Basic Research (Level 4). AH and CH articles fall in the middle, between Levels 2 and 3; and, ACH and AC are both Biomedical Research.
The blue squares connected to each discipline indicate the average position publications that cite articles in that discipline. The angle and length of the connecting lines indicate the average direction and speed of knowledge flow. For example, articles that cite hematology studies include more animal research than the field of hematology itself, and publications that cite pharmacology or epidemiology studies include more human research. Thus, the position of a circle indicates the A-C-H composition of the research in a discipline, while the square suggests the direction of that discipline’s impact.
Articles with a MeSH descriptor “Adipose Tissue, Brown” (brown fat) were mostly focused on animal research until the late 1990s, when a number of related proteins were discovered and it was subsequently found that brown fat also exists in adult humans.
An almost immediate change in the focus of articles with a MeSH descriptor “Influenza A Virus, H1N1 Subtype” (swine flu) occurred with the 2009 pandemic of the virus in humans.
Articles with MeSH descriptors “Cloning, Organism” and “Genes, rRNA” have both moved in the direction of animal research in recent years.
The position of articles with a MeSH descriptor “Benzazepines” has swung in two directions with a surge in clinical trials during the early 1980s and again in the past five years, with a period of mostly animal research in the middle.
Not surprisingly, articles flagged in PubMed as Phase I clinical trials are near the H corner, and Phase II, III, and IV trials are progressively closer.
The publications that cite NIH R01 grant numbers cover a wide range of topics, and therefore are near the ACH point, though there has been movement towards the CH point over time.
PubMed as a whole has changed relatively little in the past 30 years, with a large percentage of its articles consistently in the H category.
Narin showed that articles in one research level primarily cite other articles in the same research level. Less common were citations in adjacent research levels, and only rarely did an article cite another that is two or three levels away . In other words, the flow of knowledge typically did not go from basic research directly to clinical observation. Rather, it slowly passed through each of the research levels along the path towards translation.
The amount of time from a basic science discovery to a clinical intervention, the “translation lag”, can be many years [23, 24]. Contopoulos-Ioannidis found that only 25% of high impact basic science articles that had clear therapeutic or preventative potential actually resulted in a clinical trial after 20 years; and, when translation occurs, it takes a median of 24 years from the initial basic science discovery until the first highly cited human study [25, 26].
Translational Fraction (TF) is the fraction of articles in a category that after some number of citation generations eventually reaches an H article.
Translational Distance (TD) of an article is the minimum number of citation generations needed to reach an H article. The TD of a collection of articles is the mean TD of the individual articles.
Translational Years (TY) of an article is the number of years it took to reach an H article. The TY of a collection of articles is the mean TY of the individual articles.
Translational Closeness (TC) is the average of the inverse of the TD of each article, including articles that have not yet translated (which have an inverse TD of zero). If TC = 1, then all articles are cited by H articles in exactly one generation; if TC = 0, then no articles have translated.
Table 1c lists the TF, TD, TY, and TC of each category. A, C, and AC articles take more citation generations and more time to reach H articles than AH, CH, and ACH articles, which is consistent with the results from the previous section. C articles require the most generations (TD = 3.08), though A articles require the longest time (TY = 10.40 years). Even H articles take 5.69 years, on average, before being cited by another H article.
The colors in Figure 1 indicate the average TD of different disciplines. There is a clear relationship between the TD and the position along the Translational Axis. For example, Nursing consists of mostly H articles (TI = 0.98), and nearly every article, when cited, is cited by an H article (TD = 1.09). Allergy and Immunology is near the ACH point (TI = 0.02) and requires an additional citation generation to reach an H article (TD = 2.29). Botany is furthest from the H point (TI = -0.46) and requires almost four citation generations (TD = 3.88).
Physician-scientists are essential in bridging the gap between basic research and clinical medicine and reducing the time to translation [24, 27, 28]. Zemlo notes that investigators with combined MD-PhD degrees play a particularly pivotal role in translation--they represent just 2.5% of medical school graduates each year, but have a third of the NIH grants going to physician-scientists .
Although the Triangle of Biomedicine is not meant to replace the traditional qualitative definitions of T1-T4 translational research , it provides a quantitative technique to measure translation and to determine how long it takes. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. The Triangle of Biomedicine also provides a simple visual way of depicting this impact, which can be far more powerful than numbers alone.
As with other bibliometric techniques, it is important not to overgeneralize metrics. The position of a broad discipline on the Triangle of Biomedicine simply represents the average of thousands of publications. Predicting the potential impact of a specific research area or an individual article or scientist requires far more information; though, this comes with its own limitations. For example, a multidimensional scoring system has been developed to assess the “translatability” of drug development projects [29, 30]. This may indeed be a superior method, but it requires manual review of the literature and therefore might not be scalable. Fontelo identified 59 words and phrases, which when present in the titles or abstracts of articles, suggest that the article is translational . However, that is an all-or-nothing approach, which does not take into account the full spectrum from basic research to clinical medicine.
This work is limited in several ways. It takes at least a year for most articles to be assigned MeSH descriptors. During that time the articles cannot be classified using the method described in this paper. Also, our classification method is based on a somewhat arbitrary set of MeSH descriptors—different descriptors could have been used to map articles to A-C-H categories. However, the ones we used seemed intuitive and they produced results that were consistent with Narin’s classification schemes. Finally, any metric based on citation analysis is dependent on the particular citation database used, and there are significant differences among the leading databases . In this study, we used citations in PubMed that are derived from PubMed Central because they are freely available in their entirety, and therefore our method can be used without subscriptions to commercial citation databases, such as Scopus and Web of Science, which are cost-prohibitive to most people. However, because these commercial databases have a greater number of citations and index different journals than PubMed, they might show shorter or alternative paths towards translation (i.e., fewer citation generations or less time). Though, as described in our Methods, there is evidence that suggests these differences might be relatively small. Selecting the best citation database for identifying translational research is a topic for future research.
Another area of future research could attempt to identify a subset of H articles that truly reflect changes in health practice and create a separate category P for these articles. This might be possible, for example, by using Khoury’s approach of using PubMed’s “publication type” categorization of each article to select for those that are clinical trials or practice guidelines . This could be visualized in the Triangle of Biomedicine by moving H articles to the center of the triangle and placing P articles in the lower-right corner, thereby highlighting research that has translated beyond H into health practice.
The Triangle of Biomedicine is a novel way to identify translational science and track changes over time. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. As with any metric, its limitations and potential biases should always be kept in mind. As a result, it should be used to supplement rather than replace alternative methods of measuring or defining translational science. What is unique, though, to the Triangle of Biomedicine, is its simple visual way of depicting translation, which can be far more powerful to policy makers than numbers alone.
The author thanks Dr. Katy Börner for information about maps of science and Dr. Garret Fitzgerald for suggestions on how to frame the paper. This work was conducted with support from NSF Science of Science and Innovation Policy (SciSIP) Award #1238469 and Harvard Catalyst | The Harvard Clinical and Translational Science Center (NIH Award #UL1 RR 025758 and financial contributions from Harvard University and its affiliated academic health care centers).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.