Structuring biochemical, genetic and genomic knowledgebase of Mtb
Decades of TB research has resulted in a wealth of information, which can be attributed to unstructured form of information, available as various journal articles in PubMed [21]. A simple search in PubMed with the key word `mycobacterium tuberculosis' results in numerous papers - the unstructured information, the rate of which rises exponentially every year since the last re-annotaiton of its genome was reported in 2002 [26]. The potential of this unstructured information available for Mtb remains untapped, which if utilized could substantially improve the functional annotation of its genome and quality of subsequent genome scale reconstruction of metabolism. The initial sequencing of Mtb genome a decade ago [27], and its subsequent annotation till 2002 [26] have been utilized to construct two comprehensive genome scale reconstructions of its metabolism [24],[28]. Although useful to a large extent these reconstructions primarily rely on the partial annotation (52%) of the Mtb genome available based on its most recent re-annotation statistics [26], thus leaving a gap in our understanding of metabolic physiology in Mtb. We filled this gap by crowdsourcing the genome annotation of Mtb by referring to ~27,000 published manuscripts that resulted in 87% of functional annotations as reported by us earlier [22].
The enhanced functional annotations of Mtb genome were utilized to reconstruct its metabolism at genome scale in a bottom up manner. In order to formalize the best capabilities and to fill the knowledge gap in the metabolism of Mtb we followed standard and well established protocol for its reconstruction (Methods) [25] and the resulting GENRE of Mtb was labeled as iOSDD890 as illustrated in (Figure 1). In contrast to prior comprehensive genome-scale reconstruction of Mtb (iNJ661) [24], the iOSDD890 contains 16% more metabolites, 12% more reactions and 34% more genes reflecting a significantly improved knowledgebase of the metabolism in Mtb (Figure 1A). A comparison of iOSDD890 with other genome scale reconstructions of Mtb GSMN-TB [28], iNJ661 [24] and iNJ661m [29] revealed a significant improvement in the metabolite, gene and reaction coverage of iOSDD890 (see Additional file 2: Figure S1).
Our understanding of the pathways involved in thiamine, valine, leucine and isoleucine biosynthesis was substantially expanded in terms of their gene coverage (Figure 1B). New pathways such as β-oxidation of fatty acids, β-oxidation of odd numbered acyl chain fatty acids, β-oxidation of unsaturated fatty acids were added along with a substantial number of genes (Figure 1C). β-oxidation of fatty acids is important in terms of lipid utilization. While much of the earlier work has been focused on lipid biosynthesis, lipid degradation through β-oxidation has been less explored. Since lipids are an important source of energy in hypoxic conditions, integrating these pathways into the model helps us understand the survival of Mtb within the hypoxic niches of granulomas. The other important pathway added to the model is the biosynthesis of the cofactor F420. The cofactor F420 is deazaflavin with no homology in humans but it is widespread across prokaryotes such as Mycobacteria. It has been suggested that the cofactor F420 plays an important role in shielding Mtb from NO2 stress, as induced by macrophage in aerobic conditions [30]. The detailed biochemical information, comprising the new genes, reactions, pathways and the literature cited included in iOSDD890, together with the supportive literature is provided in (see Additional file 1: Table S1 (A-C)) and the model (iOSDD890) is provided in (Additional file 3: Model-01).
Systems Biology Spindle Map
To comprehend the complex metabolic architecture one must understand its organization in terms of metabolites and genes that encode proteins with respect to the reactions involved in various pathways. Conventionally, metabolic pathways are represented in KEGG and other common formats based on flow chart diagrams [31]. While useful to a great extent, such representations seldom account for inherent structural connectivity among various metabolites, the genomic organization of genes (encoding for enzymes), reactions and pathways. For instance, the metabolic representation that can display the relation between various metabolites involved in various reactions with respect to the chromosome location of genes (encoding for respective enzymes) is absent. The conventional flow chart diagrams therefore limited our understanding of metabolic structure, in the context of its function. This lack of accountability in inherent metabolic topology by conventional representation also makes it formidable to construct efficient computational algorithms that can render complete metabolic information in a human perceivable manner.
To circumvent these challenges we developed a new method of metabolic visualization termed as Systems Biology Spindle Map (SBSM). As illustrated in (Figure 2A), the components of SBSM include metabolites, genes and reactions that are arrayed with their respective pathways parallel to each other. The metabolites are classified into exchange and intracellular metabolites. The exchange metabolites are linked to intracellular metabolites by exchange reactions, representing all possible ways in which a given metabolite can be imported or exported, into or from the system.
All the metabolic genes are placed according to their genomic order (chromosome position) with the first gene (top) being closer to the origin of replication as illustrated in (Figure 2A). The metabolites are then connected to respective genes based on the reactions in which they either act as reactant or product. The metabolites are sorted according to metGene, which represents the metabolite out-degree with respect to genes as illustrated in (Figure 2A). The metabolite with the highest metGene is placed at the centre, and each subsequent metabolite is placed alternatively on either side of the central metabolite, in descending order of metGene as illustrated in (Figure 2A). The reactions however, were first classified into their respective pathways and then arranged according to reaction-gene (rxnGene) connectivity (representing reaction out degree with respect to genes) similar to the placement of metabolites as illustrated in (Figure 2A). The resultant complete metabolic information in Mtb is summarized in (Figure 2B).
In addition, the degree distribution of the tripartite graph linking metabolites, genes and reactions was analyzed. Most of the metabolites were found to be highly connected to either one or two genes as illustrated in (Figure 2C) suggesting less redundancy in their utilization as either reactant or product. In contrast to currency metabolites, metabolites such as acetyl-CoA were found to be highly connected (100 genes) as illustrated in (Figure 2C). Metabolites such as octacosanoyl-CoA, hexacosanoyl-CoA, tetracosanoyl-CoA, docosanoyl-CoA etc. were found to be connected to a specific set of 67 genes, as illustrated in (Figure 2C), predominantly coding for the FAD operon involved in the β-oxidation of fatty acids. About 100 metabolites were found to be unconnected to any gene. The list of all the metabolites and their respective gene connectivity is provided in (Additional file 1: Table S2). The gene-reaction connectivity analysis revealed skewness towards single gene reactions, as illustrated in (Figure 2F), suggesting specificity in their function. A total of 57 genes were found to connect to a specific set of 27 reactions, as illustrated in (Figure 2F), involved in the β-oxidation of fatty acids. The complete list of genes and the reactions to which they are connected is provided in (see Additional file 1: Table S3). A total of 550 reactions were observed to be catalyzed by a single gene, and 320 were unconnected to any gene (Figure 2G). These orphan reactions along with their respective metabolites suggest a knowledge gap that remains to be filled by future reaction annotations.
Context dependent metabolism in Mtb
While the complete SBSM of metabolism in Mtb (Figure 2B) depicts an overall connectivity between various metabolites, genes and reactions, it also indicates that there might exist a functional redundancy in terms of its connectivity. For a given fitness function, such functional redundancy can be computed as a subset of the overall connectivity of SBSM in Mtb in a context dependent manner. To this end, we used an optimization approach (flux balance analysis) to quantify the optimal set of metabolic physiology in Mtb for a given fitness function (biomass) defined for specific media condition (Middlebrook) (see Methods) [32]. The in silico growth rate that was predicted on the basis of our simulations was found to be well in accordance with the growth rates reported in the literature, thereby validating the model [33],[34]. The quantified model of metabolism in Mtb was further used for all the analysis.
The reduced overall connectivity of SBSM in Mtb computed as an optimal set of metabolic information required for growth on Middlebrook media is illustrated in (Figure 2H), suggesting that Mtb uses only a subset of its metabolism to perform optimally. Interestingly, our model showed that only 441 (45%) metabolites (Figure 2C), 439 (38%) reactions (Figure 2G) and 400 (44%) (Figure 2F) genes were active in terms of their flux carrying capacity. On an average, the metabolites with high metGen were found to be active as illustrated in (Figure 2C). Most of the metabolites with a metGen degree of 2 were found to be predominantly active in contrast to other metabolites whose metGen ranges from 0 to 7 (Figure 2C). The metabolites belonging to pathways such as β-oxidation of fatty acids (connected to 67 genes of FAD operon) were mostly observed to be inactive in the optimized metabolic physiology of Mtb, further suggesting their importance for a condition dependent metabolism.
Interestingly, we observed a more exact power-law distribution (γ = 2.52) on metGen for active metabolites in contrast to power law distribution of overall metGen (γ = 2.02) of SBSM (Figure 2D-E), suggesting a modular pattern in the utilization of metabolites in the metabolism of Mtb akin to what has been observed for other bacterial species [35]. The improved value of γ suggests that Mtb uses its most connected metabolites to operate in an optimal manner. A similar type of pattern was observed in the utilization of genes and reactions, where genes and reaction with connectivity in the range of 0 to 4 were found to be mostly active (Figure 2F-G). The 400 active genes were then compared with the active transcriptome of Mtb, as recently reported [36]. A similarity of 91% (364 genes) was found between the active transcriptome (rpkm > 5) and predicted active genes in our model. Predominantly, genes belonging to operons such as sdhA-D, mmaA1-A4, sugA-C, atpA-F, murA-I, pyrA-H, cydA-D, aroA,B,D,E,G,F,K, ppsA-E and mur enzymes were found to be active. The complete list of model genes, active genes and the rpkm value of active gene transcript is provided in the (Additional file 1: Table S4).
Critical genes required for the growth of Mtb
Elucidating the most efficient metabolic architecture has given us a roadmap to identify new drug targets. It helps us identify the essential genes that are required to maintain the metabolic integrity of bacteria. A gene was regarded as essential if its knockout resulted into no biomass production. The genes were classified into three categories of lethality: (a) lethal genes: genes whose knockout resulted in no growth; (b) enzymatically low efficient genes (ELE): genes that result in more flux utilization than the alternate pathway to achieve the same predicted growth rate. Such genes therefore need to be overexpressed in order to achieve the desired growth rate; (c) metabolically low efficient genes (MLE): genes that reduce the growth if they are used in a pathway.
Lethal genes were computed based on single gene knockout of the genes involved in optimal metabolic physiology as illustrated in (Figure 2F). Both the ELE and MLE class of genes were identified based on the most efficient metabolic physiology as computed using the pFBA approach [37]. The pFBA method relies on selecting the fastest growing mutant with a minimal set of metabolic usage for a given objective function, which results in the elucidation of the most efficient metabolic physiology. The predicted essential genes were further assessed based on the experimental evidence for their essentiality from both in vitro and in vivo literature. Towards this we constructed a literature based evidence pool of gene essentiality by referring to a vast amount (~112) of different manuscripts reporting the essentiality of Mtb genes in vitro, in vivo, in macrophage and in mice models, in contrast to previous investigations [24],[28], wherein the comparison between computational predictions to that of their experimental validations has been primarily made using single mutagenesis based studies.
Knocking-out genes in-silico one at a time resulted into 116 lethal genes (Figure 3D, Additional file 1: Table S5). Of these, 86 (~74%) genes were also found to be essential based on in vitro, in vivo, in macrophage and in mice model evidence from the constructed literature pool. For enzymatically low efficient genes, a total of 48 of the 890 genes were found to be ELE essential (Figure 3E, Additional file 1: Table S6). Of these 48, 28 (58%) were reported to be essential in the in vitro, in vivo, in macrophage and in mice model based on the literature evidence pool. The genes that reduced the growth rate if they participate in metabolic pathways were then identified as metabolically low efficient genes (MLE). Of the 890 metabolic genes, 281 genes were found to be MLE essential (Figure 3F, Additional file 1: Table S7). Of these, 107 (38%) were also reported to be essential in vitro, in vivo, in macrophage and in mice model based on the literature evidence pool. For all the essential genes, a 58% overlap was observed between in-silico predictions and what has been reported in the literature evidence. However, the overlap was 74% between our in-silico predictions and the existing experimental results based on the in vitro, in vivo, in macrophage and in mice model, suggesting a significant improvement over previous research where the in silico predictions have been predominantly compared with only in vitro analysis.
The minimum inhibitory concentration (MIC) of an antibiotic is an important measure of its efficacy. All other things being equal, the lower the concentration of a target protein/enzyme for a non-competitive inhibitor or a tight binder (as most of the known antibacterial compounds are non-competitive inhibitors), the lower the antibiotic's MIC and hence more will be its efficacy. In this context, we assessed the intracellular levels of all the proteins/enzymes coded by the predicted essential genes based on the complete proteome of Mtb as reported recently [38], and illustrated in (Figure 3A,B,C). The mean concentration of the complete set of proteins of Mtb was computed, and used to assess the concentration of the proteins derived from essential genes. Most of these protein/enzymes had a lower concentration than the mean protein concentration. Of the total 116 single knockout lethal genes, 97 genes were expressed, resulting in enzymes of measurable concentration. 75 of these expressed genes resulted in protein/enzyme concentrations lower than the total mean concentration of the Mtb proteome. Similarly, the protein concentration and respective details of single gene knockout essential genes, ELE and MLE essential genes as illustrated in (Figure 3B-C) is provided in (see Additional file 1: Table S8).
Metabolic network structure and function
The topological location of metabolites, reactions and genes can be used to assess the efficiency of a metabolic structure. It has been suggested that metabolism operates in a modular manner [35]. Each module can be identified based on its function, such as genes involved in cofactor-membrane-methionine-mycolic acid and ubuquinone metabolism; fatty acids-glycolysis-citric acid cycle and sugar metabolism. Given the importance to Mtb of such specialized pathways as mycolic acid and lipid production, it seems important to analyze its modular topology, and identify its vulnerabilities. In this context we hypothesize that the transcription factors regulating genes that belong to different modules could be essential targets, as their inhibition would be more likely to result in lethal metabolic disruption.
We identified the modules in the metabolic network of Mtb. Towards this a reaction-reaction graph (directed) was first constructed. The graph was then partitioned into various modules by computing the topological overlap matrix that was quantified based on hierarchal clustering [35] (Methods). A total of 9 modules were identified as illustrated in (Figure 3H). The reactions of each module were mapped to their respective genes, resulting in sets of specific genes that were unique to each module (see Additional file 1: Table-S9A-I). The genes of each module were further mapped to their respective transcription factors, as illustrated in (Figure 3H). Among all the transcription factors phoP, Rv0348 and sigE were observed to be highly connected to genes belonging to different modules. phoP, which regulates a two-component system, is essential for maintaining virulence [39]. mosR (Rv0348) is a unique transcription factor that plays a key role in the maintenance of Mtb's hypoxic state by up-regulating the mce1 operon, which is responsible for mammalian cell entry and for regulating various genes involved in hypoxia and starvation [40].
sigE is one of the sigma factors in Mtb, which is thought to be responsible for the heat shock response. sigE mutants are more sensitive to heat shock and various oxidative stress conditions [41]. Our earlier interactome analysis suggests sigE as a potential drug target candidate [22]. This analysis confirms this finding and suggests that phoP and mosR are other valuable targets whose inhibition could severely disrupt Mtb metabolism. The list of all the transcription factors that were mapped to respective genes in various modules along with their pathways is provided in (Additional file 1: Table-S9A-I).
Metabolic basis of emergence in persister phenotype
The phenomenon in which an isogenic population of bacterial cells survives the impact of an antibiotic is defined as bacterial persistence [10]-[12]. While bacterial persistence has been predominantly investigated in Escherichia coli, its possible implications in Mtb are seldom addressed except in a recent investigation on Mycobacterium smegmatis[42]. The precise mechanism leading to persister formation remains elusive, however investigations suggest a predominant role of phenotypic heterogeneity [10],[43],[44], SOS response induced by DNA damage [45], Bet hedging [46] and stochasticity in cellular regulation [47],[48] as probable mechanism linked to bacterial persistence.
It is unclear if heterogeneity in the metabolism, that can be brought about by the complex connectivity among various metabolites, genes and reactions can facilitate formation of the persister. For a given metabolic structure, such as those represented by SBSM of Mtb, one can decipher the metabolic heterogeneity in terms of alternate metabolic sub-phenotypes that bacteria can adopt to survive in a context dependent manner under the influence of a given selection pressure. Such selected metabolic sub-phenotypes might become stable over a prolonged period of stress, which eventually may facilitate the emergence of persister phenotypes. Moreover, the intricate connectivity of metabolic interactions makes it likely that the selection of metabolic sub-phenotypes will be the preferred adaptive response, as it is more efficient than altering the genomic architecture through mutations. In this context, we therefore hypothesize that the directional re-routing of metabolic fluxes as a putative adaptive mechanism responsible for section of metabolic sub-phenotypes under antibiotic stress that may give rise to bacterial persistence in Mtb.
Consider a wild-type bacterium in an optimal metabolic steady state, as shown in (Figure 4B). As stress is introduced through antibiotics, the optimal metabolism can undergo changes in three ways such as a) loss of function (shutdown reactions that were active); b) function regulation (increase or decrease in the flux carrying capacity of active reactions); c) gain of function (activation of reactions that were dormant). The gain of function through the activation of reactions is responsible for the directional re-routing of metabolic fluxes, as illustrated in (Figure 4B). This results into alternative metabolic phenotypes. Over a prolonged period of antibiotic stress such phenotypes can eventually be selected, leading to the adaptation of the bacterium, which might demonstrate antibiotic tolerance.
We test this hypothesis by subjecting the metabolism of Mtb to sudden stress of known antibiotics such as Isoniazid (INH), Ethambutol (ETH), Rifampin (RIF) and the recently reported small molecule TCA1. Isoniazid is pro-drug which is activated by catalase-peroxidase hemoprotein, katG. It inhibits inhA, a nicotinamide adenine dinucleotide (NADH)-specific enoyl-acyl carrier protein (ACP) reductase involved in the fatty acid synthesis [49]. Ethambutol inhibits arabinosyltransferase which catalyzes the arabinogalactan and lipoarabinomannan polymer synthesis [49]. Arabinogalactan is a building block of the mucolyl-arabinogalactan-peptidoglycan layer that anchors it to the lipid-mycolic acid outer layer. Rifampin inhibits the β-subunit of the DNA-dependent RNA polymerase activity involved in transcription [49]. TCA1 is reported to inhibit DprE1, a component of decaprenyl-phosphoryl-β-D-ribofuranoseepimerase involved in cell wall synthesis [50]. The metabolic targets of INH (embB, katG, inhA and kasA) [51], the targets of ETH (embA, manB and rmlD) [51], the target of RIF (embB) [51] and the target of TCA1 (DprE1) [50] were identified from the literature.
In our analysis, each of these gene was knocked out individually and the flux carrying capacity of reactions were re-computed in order to identify the directional rerouting of metabolic fluxes (Methods). (Figure 4C) shows the metabolite and reaction connectivity of inhA in SBSM. The deletion of inhA resulted into a loss of flux carrying capacity of 328 reactions, which were subsequently mapped to 292 genes, (Figure 4D). However, a total of 57 reactions were found be involved in the directional re-routing of metabolic fluxes and were further mapped to their respective genes. A total of 60 enzyme-coding genes (hereafter referred as metabolic persister genes (MPGs) were identified that were responsible for catalyzing the reactions involved in the directional re-routing of metabolic fluxes (Figure 4E). Similarly all the genes (targets of antibiotics considered in this study) were knocked out one at a time and the resulting MPG's were deciphered based on directional re-routing of metabolic fluxes (Additional file 1: Table–S10 and Additional file 4: Text-01).
in vitro and in vivo assessment of MPGs based on experimental evidence
The MPGs were further analysed based on available experimental evidence from the literature both in vitro and in vivo. For isoniazid, the knock out of inhA as target resulted into 60 MPG's. The global gene expression of these 60 MPG's was further assessed based on a study reporting Mtb treatment with Isoniazid in a dormancy model attained through nutrient depletion and progressive hypoxia in vitro along with in vivo model of dormancy [52]. A total of 42 MPG's of the total 60 were observed to be significantly up-regulated when treated with Isoniazid at 1 μg/ml for 2 hours (Figure 4F). The expression of these genes however was noted to reduce as a function of INH exposure (6 hr treatment, (Figure 4G). The high concordance between predicted MPG's and their global up-regulation in vivo upon early exposure to Isoniazid suggest that the bacterium might use them as a resource to adapt its metabolism when challenged with the drug, resulting in an alternative metabolic phenotype. In addition, to probe the regulation of these MPGs, we mapped them to their respective transcription factors as illustrated in (Figure 4H). The transcription factors such as Rv0348 (mosR) and crp were observed to regulate most of the MPGs. Earlier research suggests that mosR controls the establishment of long term survival in the Mtb by inducing the mce1 operon [40]. Bacterial cyclic-AMP receptor proteins (CRP) are a specific class of transcription factors that are suggested to induce upon activation of cAMP in Mtb[53]. In response, CRP binds to the target promoter regulating the expression of repfA and whiB1, subsequently leading to the emergence of persistence in Mtb. We also observed sigF and sigE regulating the two-MPGs. A two-fold up-regulation of sigE and sigF transcription factor has been measured in an isogenic population of Mtb when challenged with Isoniazid [54]. Among other transcription factors, we observed oxyS (Figure 4H) regulating one of the MPGs involved in maintaining intracellular H2O2 levels.
Directional re-routing of metabolic fluxes as mechanism of adaptation
Upon removal of targeted genes a significant reduction in the flux carrying capacity of glycolysis and citric acid cycle along with the pentose phosphate pathway (not shown) was observed as an overall response of metabolism as illustrated in (Figure 5A). The flow of metabolic fluxes through mycolic acid biosynthesis (primary target of Isoniazid) and the pentose phosphate pathway appeared to be completely shutdown as shown in (Figure 5A). The reduction in the fluxes of glycolysis and citric acid cycle is distant from mycolic acid biosynthesis, which is the direct target of isoniazid. This clearly shows that the drug exposure triggers systems level changes. A drastic reduction in both growth and central metabolic activity has been suggested as a predominant mechanism of Mtb adaptation under antibiotic stress [55].
We observe that the directional re-routing of metabolic flux via reactions catalysed by the sdhA-D, nadA-E and nuoA-N operons is the predominant response to all four antibiotics considered in this study. The activation of the sdhA-D operon increases the production of fumarate from succinate in the citric acid cycle as illustrated in (Figure 5B). The fumarate then combines with l-Aspartate and initiates the de novo biosynthesis of NAD+ cofactors, as observed through the activation of the nadA-E operon. The nadA ~ E operon was also observed to be completely unregulated upon treatment with Isoniazid at 1 μg/ml for 2 hr and 6 hr of Isoniazid exposure in vivo (Figure 4I) [52]. The NAD+ cofactors produced are then reduced to NADH pool in a reaction catalysed by mez, converting malate into pyruvate. The activation of nuoA-N further suggests the oxidation of NADH to NAD+ by NDH-I dehydrogenase. Studies have characterized two class of NDH in Mtb. NDH-I is a type one oxidoreductase in which the redox activity is coupled with the proton translocation, whereas NDH-II, is a type two oxidoreductase that performs electron transport without proton pumping across membrane. Further, cytochrome bd oxidase aa3 - type cytochrome c oxidase (CcO) has been shown to actively participate in the electron transport chain of Mtb. Based on the directional re-routing of metabolic fluxes we propose the following probable mechanism of Mtb adaption.
As most of citric acid cycle and pentose phosphate pathway components undergo flux arrest, the activation of the nadA-E operon increase the intracellular production of NAD+
de novo (Figure 5B). The NAD+ is further reduced to NADH as observed by the activation of mez. The activation of the nuoA-N operon further suggests the oxidation of NADH through NDH-I dehydrogenase. The electron produced in this process is further transferred to either bd oxidase or CoC branch, and eventually to a terminal electron acceptor. The proton produced by NADH oxidation is however translocated across the cell membrane. This eventually might increases the potential difference across the cell membrane, subsequently leading to ATP production that is required for cell survival when it is under antibiotic stress (Figure 5C). We therefore suggest that de novo biosynthesis of the NAD+ cofactor is crucial for maintaining the viability of the cellular redox potential, which can eventually be harnessed for the production of ATP through the respiratory metabolism of Mtb. We support our finding with available experimental evidence from literature in vitro and in vivo[56],[57] demonstrating the importance of de novo biosynthesis in the growth, adaptation, survival and persistence of Mtb in conditions similar to that of human infection.
Spectrum of metabolic persister genes in Mtb
Further, a spectrum of MPG's in Mtb was formulated by considering all the metabolic genes as drug targets (Figure 6A). For every single gene knock out in the model, a total of 434 genes resulted into various MPG's based on directional re-routing of metabolic fluxes (Additional file 1: Table S11). For any given knockout the range of MPG's were observed to vary from 1 to 179, with median value of 36 (Figure 6A). A total of 211/434 genes were observed in the range below median value (Figure 6A, Additional file 1: Table S12). Predominantly these 211 genes belonged to nuoA ~ L operon, sugA ~ C operon, frdA ~ D operon atpA ~ H operon and proV ~ X operon. Off these operons, the sugA ~ C has been suggested to involved in carbohydrate transport to maintain Mtb virulence in the context of host-pathogen interaction [58], atpA ~ H codes of ATP-Synthase, which has been proposed as drug target earlier [59]. Except katG and dprE1, the MPG's for all the targets (inhA, kasA, embB, rmlD, embA and manB) of other antibiotics considered in this study were observed to be higher than median of all MPGs' (Figure 6A). The high value of MPG's for these known antibiotics suggest a diversity in the options for bacteria to circumvent the impact of these antibiotics and hence likely to become drug tolerant. Contrary to this, the knock-out of genes with MPG's below the median value can therefore be hypothesized as potential drug targets in combination with existing antibiotics as they are likely to get a limited support from the alternate metabolism and hence lower chances of developing drug tolerance. We support this hypothesis with the fact that the katG, dprE1 and atpE1 all having MPGs below the median value and are also approved drug targets for Mtb, which have cleared preclinical and phase-II trials successfully (www.tballiance.org).
Interestingly these drug targets also have lesser number of genes that were affected due to their knockout (Figure 6B). In the process of drug discovery it is essential to minimize the chances of drug toxicity, which can be achieved by minimizing the concentration of ligand and by maximizing its impact on the over all metabolic physiology. With this objective, we further mapped the intracellular abundance of proteins encoded by these 211 metabolic persister genes. A total of 112 off the 211 proteins were observed to have intracellular protein abundance lower than the average protein abundance of Mtb proteome [38], suggesting them further as potential drug targets for which ligands with minimum intracellular concentration could be designed.
The spectrum of metabolic persister genes in Mtb therefore for the first time suggest a new strategy that can be utilized to comprehend the impact of a given gene knockout by accounting for all possible mechanism through which such an impact could be buffered by the complexity of metabolism. We therefore propose these 211 genes as potential drug targets (of which one katG is known and dprE1 and atpE have clear preclinical and phase-II trials), which can accelerate the drug discovery for drug tolerant Mtb and potentiate translation of various therapeutic interventions for the same.
Potential non-toxic drug targets
Of the total 50 metabolic pathways, we observed essential genes in 42 pathways (Figure 6C). We have identified 33 of these 42 pathways that contain genes that are metabolically low efficient (MLE), 24 (out of 42) that are lethal, and 18 (out of 42) that are enzymatically low efficient (ELE) (Figure 6C). Interestingly most of the MPGs and MLE genes were observed in β-oxidation of fatty acids. This clearly suggests that Mtb utilized its β-oxidation pathway under stress condition despite harnessing its potential in wild type optimal metabolic physiology.
All the predicted essential genes were then compared with the human genome and human microbiome at the sequence level. The goal was to identify drug targets with the least likelihood of side effects. Of the total 116 lethal genes obtained from single gene knockout, 104 genes were found to be absent from the human genome, of which 48 were also found to have no homology in the human microbiome (Additional file 1: Table S5). For the 48 ELE essential genes, 32 were absent from the human genome, of which 26 had no homology to the human microbiome (Additional file 1: Table S6). For the 281 MLE essential genes, 207 were found to be absent from the human genome, of which 138 were also absent from the human microbiome (Additional file 1: Table S7). Few of the predicted targets with no sequence homology with human genome and microbiome are discussed below. The potential drug targets mostly belong to operons in Mtb.
The PPS system: the PPS system is made up of five genes. ppsA ~ E encodes for a type-I polyketide synthase. This operon codes for the protein responsible for the production of phthiocerols and phenolphthiocerols. Esterified by multimethyl-branched chain fatty acids, these complex lipids are composed of long chain β-diols, and are vital for maintaining the integrity of the cell wall [60]. In M. bovis BCG, ppsA-E has been shown to be responsible for the synthesis of phthiocerols and phenolphthiocerols through a mechanism involving the elongation of the C20-C22 fatty acyl chain containing a phenol moiety with three malonyl-CoA and two methylmalonyl CoA [61].
The PIM system: the PIM system is made up of three genes, pimA an α-mannosyltransferase, pimB, and pimE, a mannosyltransferase. Along with phosphatidylinositol (PI), the phosphatidylinositol mannosides (PIMs) are predominantly phospholipids synthesized metabolically in mycobacteria [62]. Anchoring two lipoglycans and lipomannan, the lipoarabinomannan is a crucial modulator of immune response during the course of a TB infection [63]. PIM synthesis begins with a mannose residue transfer from GDP-Man to the myo-inositol ring of PI resulting into the formation of phosphatidylinositol monomannosides subsequently resulting into the highly branched lipoglycans, lipomannan, and lipoarabinomannan with intermediate steps involving phosphate based sugar donors.
The KDP system: it codes for kdpA, kdpB and kdpC, and is a vital component of the host-pathogen interaction and for maintaining a K+gradiant across the Mtb cell membrane during the course of infection. As a mechanism of action, studies of the histidine kinase kdpD of E. coli suggests autophosphorylation and dephosphorylation of the response regulator kdpE, which in turn regulates the expression of the kdpFABC operon, acting a primary stress response to osmotic pressure by regulating the levels of K+. The change in osmotic pressure is sensed by the K+-transporting P-type ATPase enzyme, which subsequently results in the import and export of osmotic fluids such as K+[64].
The SDH system: it codes for sdhA-D, succinate dehydrogenase, which plays a vital role in the adaptation of M. smegmatis to restricted energy availability. The exact role of Sdh enzymes along with their presence at different locations in the cell membrane of mycobacterium species is unclear [65]. Menaquinone has been suggested as an electron acceptor for Sdh, which would indicate a thermodynamically stable conversion of fumarate to succinate [65].