Significantly fewer protein functional changing variants for lipid metabolism in Africans than in Europeans

Background: Disorders in the metabolism of energy substances are often associated with diseases such as obesity, diabetes and cancer. However, the genetic background of these disorders is not well understood. In this study, we explored differences in genetic risk among human populations in the metabolism (catabolism and biosynthesis) of energy substances, including lipids, carbohydrates and amino acids.
Results: Two genotype datasets (Hapmap and 1000 Genome) were used. The genetic risks of protein functional changing variants (PFCVs) in genes involved in lipid, carbohydrate and amino acid metabolism were calculated using two genetic risk indices: the total number of PFCVs (Num) and the total possibly harmful score of PFCVs (R). Both genotype datasets consistently showed that Africans had lower genetic risk in lipid metabolism (both catabolic and biosynthetic processes) than Europeans. However, this relationship was not observed in carbohydrate or amino acid metabolism.
Conclusions: Our results suggest that Africans might utilize lipids as energy substances more efficiently than Europeans. In other words, lipids might be more preferred as energy substances in Africans than in Europeans.


Introduction
Many complex diseases are closely related to disorders in the metabolism of energy substances. Among the three main energy substances (carbohydrates, lipids and amino acids), lipids are the most frequently studied, because abnormalities in their metabolism can induce complex diseases such as obesity, diabetes and cancer [1][2][3]. The genetic background of lipid metabolism might therefore be central to understanding these complex diseases.
Genetic differences are usually easier to understand at the population level than at the individual level. It is well documented that the prevalence of many complex diseases, such as obesity, coronary heart disease and hypertension, is race related, and the genetic risks associated with these diseases might differ among populations [4][5][6][7][8]. For example, the prevalence of hypertension among African-Americans is 1.5- to 2-fold higher than among European-Americans [9].
Many kinds of genetic robustness, such as duplicate genes and complex biological networks, exist in human genomes [5,10-13], which explains why the loss of function of one or more genes often has little phenotypic effect. However, the accumulation of protein functional changing variants (PFCVs) might reduce or even destroy this robustness [5,10,13]. Most complex diseases involve many genes, each of which contributes only a very small effect [14]. Therefore, more PFCVs in genes involved in energy metabolism might lead to a higher genetic risk of developing complex diseases at the population level [5].
In our previous study with Hapmap data [5], we reported that Africans had significantly fewer PFCVs in the whole catabolic process than non-Africans. In this study, using two genotype datasets (Hapmap and 1000 Genome), we further investigated the genetic risks (R and Num) in the metabolic processes of the three main energy substances (carbohydrates, lipids and amino acids) among human populations. The results showed that R and Num for lipids, in both catabolic and biosynthetic processes, were significantly smaller in Africans than in Europeans. However, for the catabolic or biosynthetic processes of carbohydrates and amino acids, R and Num were either larger or not smaller in Africans than in Europeans. Based on these observations, we hypothesized that Africans utilize lipids as energy substances more efficiently than Europeans, and we propose a study design to test this hypothesis.

The human groups were assigned to subpopulations as follows: African = {ASW, LWK, MKK, YRI}, East Asian = {CHB, CHD, JPT} and European = {CEU, TSI}, while GIH and MEX were treated as two independent groups. The genotype data of some human groups (ASW, CEU, MXL, MKK and YRI) contained family trios, which could bias the results; therefore, the offspring of each trio in these groups were excluded from further analyses. Table 1 shows the final dataset used in this study.
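The trio-offspring exclusion can be sketched as follows. This is a minimal illustration, assuming PLINK-style pedigree records of the form (sample, father, mother) with "0" for an unknown parent; the record layout and the function name `drop_trio_offspring` are hypothetical and not the pipeline used in [5].

```python
from typing import Iterable, List, Tuple

# Hypothetical pedigree record: (sample_id, father_id, mother_id); "0" = unknown parent.
Pedigree = Tuple[str, str, str]


def drop_trio_offspring(samples: Iterable[Pedigree]) -> List[str]:
    """Return the sample IDs to keep after removing the child of each trio."""
    records = list(samples)
    present = {sid for sid, _, _ in records}
    kept = []
    for sid, father, mother in records:
        # A sample is treated as a trio child only if BOTH parents are genotyped.
        is_trio_child = father in present and mother in present
        if not is_trio_child:
            kept.append(sid)
    return kept
```

For example, with founders NA1 and NA2 and their child NA3, only NA1 and NA2 are kept; an individual whose parents were not genotyped stays in the analysis.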

Genotype data preparation and genetic risk estimation
The preparation of the genotype data, the methods used to select PFCVs and the estimation of the genetic risks (R and Num) are described in detail in our previous study [5]. Here we briefly describe only the points specific to this study. We mainly considered missense mutations in genes involved in the metabolism of carbohydrates, lipids and amino acids. The potentially harmful impact of each missense mutation was estimated using PolyPhen-2 [15] scores collected in dbNSFP [16]. For simplicity, the minor alleles (the alleles whose frequency is the minor allele frequency, MAF) are called mutations throughout this article.
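Since "mutation" here simply means the minor allele at a site, identifying it is a frequency comparison. A minimal sketch, assuming biallelic sites and genotypes coded as the number of copies (0, 1 or 2) of one allele per individual; this coding and the helper name `minor_allele` are illustrative assumptions:

```python
from typing import Sequence, Tuple


def minor_allele(dosages: Sequence[int], allele_a: str,
                 allele_b: str) -> Tuple[str, float]:
    """Return (minor allele, MAF) for one biallelic site.

    dosages: copies of allele_a carried by each individual (0, 1 or 2).
    Ties (frequency exactly 0.5) are resolved in favour of allele_a.
    """
    if not dosages:
        raise ValueError("no genotypes at this site")
    # Each diploid individual contributes two alleles.
    freq_a = sum(dosages) / (2 * len(dosages))
    if freq_a <= 0.5:
        return allele_a, freq_a
    return allele_b, 1.0 - freq_a
```

For four individuals carrying 0, 1, 2 and 2 copies of C at a C/T site, C has frequency 5/8, so T is the minor allele ("mutation") with MAF 0.375.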
The genes involved in the metabolic processes of carbohydrates, lipids and amino acids were downloaded from Gene Ontology (GO) (http://www.geneontology.org). The name, symbol and chromosomal location of the selected genes are listed in Additional file 1: Table S1-S6. The total number of genes for each GO term is shown in Table 2. PFCVs with missense mutations in these genes were downloaded from the NCBI dbSNP database (ftp://ftp.ncbi.nih.gov/snp/). The harmful probability of each PFCV estimated by PolyPhen-2 was downloaded from http://genetics.bwh.harvard.edu/pph2/dbsearch.shtml. At a false positive rate of 20%, PolyPhen-2 trained on the HumDiv dataset has a true positive prediction rate of 92%, so the HumDiv-trained score was used for each mutation [15].
Two indices, R and Num of PFCVs, were used to assess the genetic risk in the metabolic processes of carbohydrates, lipids and amino acids. The methods used to calculate R and Num are described in detail in our previous study [5].
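The exact definitions of R and Num are given in [5]. As a rough illustration only, the sketch below assumes that, for one individual and one gene set, Num counts the minor alleles carried at PFCV sites and R sums the PolyPhen-2 HumDiv probabilities of those alleles. Both the allele-count weighting and the function name `risk_indices` are assumptions, not the published formulas.

```python
from typing import Dict, Iterable, Tuple


def risk_indices(genotype: Dict[str, int], pfcv_scores: Dict[str, float],
                 gene_set_sites: Iterable[str]) -> Tuple[int, float]:
    """Return (Num, R) for one individual on one gene set (assumed definitions).

    genotype:       site ID -> copies of the minor allele (0, 1 or 2)
    pfcv_scores:    site ID -> PolyPhen-2 HumDiv probability in [0, 1]
    gene_set_sites: PFCV sites lying on the genes of the studied process
    """
    num = 0
    r = 0.0
    for site in gene_set_sites:
        count = genotype.get(site, 0)  # missing genotype treated as 0 copies
        if count:
            num += count                       # Num: minor alleles carried
            r += count * pfcv_scores[site]     # R: summed harmful probabilities
    return num, r
```

Under these assumptions, a person heterozygous at one site scored 0.9 and homozygous at another scored 0.5 would have Num = 3 and R = 1.9.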

Permutation test
In this study, we used permutation tests to estimate the background risk level against which the actual genetic risks were assessed. A total of 18161 human genes, downloaded from http://www.geneontology.org, were used for this assessment. Two types of permutation test were conducted:
1. Gene-based permutation. From the 18161 genes, the given number of genes (the number of genes involved in carbohydrate, lipid or amino acid catabolism or biosynthesis; Table 2) was randomly re-sampled 2000 times, and R and Num were calculated for each re-sampled gene set.
2. PFCV-based permutation. The number of harmful PFCVs in a given gene set (the genes involved in carbohydrate, lipid or amino acid metabolism; Table 2) was counted. The same number of PFCVs was then randomly re-sampled 2000 times from all PFCVs of the 18161 human genes, and R and Num were calculated for each re-sampled PFCV set.
Because the results of these two tests were very similar [5], only the results of the gene-based permutation test are shown and discussed in this article. The results of the permutation tests were used as estimates of the background risk in the analyses.
For example, the total number of genes involved in the lipid catabolic process for Africans and Europeans was 218 (Table 2); therefore, 218 genes were randomly re-sampled from the 18161 human genes in each round, and R and Num were calculated on these genes as the background risk level of R and Num for Africans and Europeans. In each round of re-sampling, we calculated the mean R (or Num) of each population and the mean difference $R' = \bar{R}_{\mathrm{African}} - \bar{R}_{\mathrm{European}}$. In total, 2000 values of R' were obtained for each population comparison from the 2000 re-sampling rounds, and the distribution of R' was close to normal (Additional file 2: Figure S1-S12). The actual observed mean difference for the lipid catabolic process, $R'_{\text{lipid catabolism}}$, was also calculated. The P-value was approximated as the fraction of re-sampled values smaller than the observed one: $P \approx \#\{R' < R'_{\text{lipid catabolism}}\} / 2000$.
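The gene-based permutation test described above can be sketched as follows. This is a minimal illustration under stated assumptions: `risk_of(genes, population)` is a hypothetical callback returning per-individual R values for a gene set, genes are drawn without replacement within each round, and the P-value is the one-sided fraction of random R' values below the observed one, as in the text. The PFCV-based variant would draw from the pool of all PFCVs instead of genes.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: risk_of(genes, population) -> per-individual R values.
RiskFn = Callable[[Sequence[str], str], List[float]]


def mean_difference(genes: Sequence[str], risk_of: RiskFn,
                    pop_a: str = "African", pop_b: str = "European") -> float:
    """R' = mean R of pop_a minus mean R of pop_b on one gene set."""
    return mean(risk_of(genes, pop_a)) - mean(risk_of(genes, pop_b))


def gene_based_p(all_genes: Sequence[str], n_genes: int, risk_of: RiskFn,
                 observed_diff: float, rounds: int = 2000,
                 seed: int = 0) -> Tuple[float, List[float]]:
    """Return (P, null R' values); P = fraction of random sets with R' < observed."""
    rng = random.Random(seed)
    pool = list(all_genes)
    # Each round draws n_genes distinct genes from the background pool.
    null = [mean_difference(rng.sample(pool, n_genes), risk_of)
            for _ in range(rounds)]
    p = sum(1 for d in null if d < observed_diff) / rounds
    return p, null
```

For the lipid catabolic example, `all_genes` would hold the 18161 background genes, `n_genes` would be 218 and `observed_diff` would be R' computed on the real lipid catabolism gene set.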

Statistical methods
The unpaired two-tailed Student's t-test, the F test (analysis of variance, ANOVA) and permutation tests were used to compare the genetic risks among three subpopulations (Africans, Europeans and Asians) and 11 human groups (ASW, LWK, MKK, YRI, CHB, CHD, JPT, CEU, TSI, GIH and MEX).
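The two parametric statistics can be written out directly. The sketch below computes only the test statistics (pooled-variance Student's t and the one-way ANOVA F); obtaining P-values additionally requires the t and F distributions, which, for example, `scipy.stats.ttest_ind` (with its default `equal_var=True`) and `scipy.stats.f_oneway` provide. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def student_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Unpaired Student's t statistic with pooled variance (df = n_x + n_y - 2)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


def anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With only two groups the two tests agree: the ANOVA F equals the square of Student's t, which is a convenient sanity check.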

Overview of the studied data
In this study, two genotype datasets, Hapmap and 1000 Genome, were analyzed. The two datasets differed in a few characteristics. First, Hapmap had more population groups (three subpopulations with 11 human groups), whereas 1000 Genome had only three subpopulations with four human groups. Second, sample sizes were similar between Hapmap (58 ≤ size ≤ 156) and 1000 Genome (59 for YRI and 60 for CEU, CHB and JPT). Third, 1000 Genome had more PFCVs, both in total and with missense mutations (Table 1). Because of these differences in the number of human groups, the density of PFCVs and the sample size, we interpreted the outcomes for each dataset separately, although the same methods were used to analyze both.

Hapmap
Our results showed that in carbohydrate metabolism, the background R (calculated on genes randomly sampled from the human genome in the permutation test) was mostly bigger than the observed R (calculated on genes involved in the metabolism of energy substances) (Figures 1, 2 and 3). Intriguingly, the observed R in the carbohydrate catabolic process was significantly bigger than the background R (P << 0.01, Figure 1A), suggesting that the carbohydrate catabolic process might harbor more genetic mutations than expected from the background average. We also compared the observed R in the catabolic and biosynthetic processes of carbohydrates among subpopulations (African, European and Asian) and among the 11 human groups. In the carbohydrate catabolic process, there were no significant differences among subpopulations (P = 0.1903, F test, Table 3), suggesting that the genetic risk R in this process might be very similar among Africans, Europeans and Asians. However, among the 11 human groups, the observed R in GIH was significantly bigger than in all other groups (P < 0.01, t-test, Figure 1A). In the carbohydrate biosynthetic process, the observed R differed significantly among subpopulations ($P < 2.2 \times 10^{-16}$, F test, Table 3), and R in every African group was significantly bigger than in the non-African groups (P < 0.01, t-test, all pairs, Figure 1B), suggesting that the genetic risk R in the carbohydrate biosynthetic process might be the largest in Africans. The observed R in the lipid catabolic process was significantly smaller than the background R for each human group (P << 0.01, t-test, all pairs, Figure 2A), and the observed R in the lipid biosynthetic process was also significantly smaller than the background R in most human groups, with two exceptions (CEU and TSI) whose observed R was significantly bigger (P < 0.05, t-test, Figure 2B).
In the lipid catabolic process, the observed R in Africans was significantly smaller than in all non-African groups (P < 0.05, t-test, all pairs, Figure 2A), with LWK having the smallest R (Figure 2A). In the lipid biosynthetic process, the observed R in Africans was significantly smaller than in Europeans (P < 0.01, Figure 2B). The comparison between Africans (ASW, LWK, MKK, YRI) and Asians (CHB, CHD, JPT) gave mixed results. First, the observed R in ASW and LWK was significantly smaller than in CHB, and smaller, though not significantly, than in CHD and JPT. Second, the observed R in YRI was significantly smaller than in all Asian groups (CHB, CHD and JPT). Third, MKK was the only African group whose observed R did not differ significantly from any Asian group (Figure 2B). These observations suggest that the genetic risk R for the lipid biosynthetic process in Africans might be smaller than in all European groups and most Asian groups.
The observed R in amino acid metabolic processes was consistently smaller than the background R. In particular, the ratio of the observed to the background R in the amino acid biosynthetic process was the smallest among all studied metabolic processes (Figures 1, 2 and 3), suggesting that, compared with the metabolism of carbohydrates and lipids, amino acid biosynthesis is more conserved and might harbor fewer mutations. In the amino acid catabolic process, the observed R in most African groups was significantly smaller than in Europeans and Asians (P < 0.01), with the exception of MKK, whose R was significantly bigger than the others (P < 0.01, Figure 3A). In the amino acid biosynthetic process, the observed R in Africans was significantly bigger than in non-Africans (P < 0.05); R in LWK, MKK and YRI in particular was much bigger than in the other human groups (P < 0.01, Figure 3B). The observed R of Asians in the amino acid biosynthetic process was the smallest among all studied human groups.
Additionally, we compared the genetic risks (R and Num) between males and females for all human groups and found no significant differences (P > 0.05, t-test, Table 3). The results above for the observed R were obtained without adjusting for the background risk level; after adjusting for it, we obtained similar results (Additional file 2: Figure S13-S15).
1000 Genome (low-coverage, pilot data)
1000 Genome provides more variants than Hapmap (Table 1), so we used the 1000 Genome data to replicate the results obtained with the Hapmap data. Without adjusting for the background risk level, the mean difference R' or Num' between Africans and Europeans or between Africans and Asians (e.g. $R' = \bar{R}_{\mathrm{African}} - \bar{R}_{\mathrm{European}}$ or $\mathrm{Num}' = \overline{\mathrm{Num}}_{\mathrm{African}} - \overline{\mathrm{Num}}_{\mathrm{European}}$) was bigger than 0 in 1000 Genome (Additional file 2: Figure S4-S15). This differs from the Hapmap results described above, possibly because of the difference in the total number of PFCVs between Hapmap and 1000 Genome (Table 1). After adjusting for the background risk level using the permutation test, R and Num in both the lipid catabolic and lipid biosynthetic processes were significantly smaller in Africans than in Europeans (P < 0.05, permutation test, Table 4, Additional file 2: Figure S5 and S8). However, for carbohydrate and amino acid metabolism, no significant difference in R or Num between Africans and Europeans was observed in either the catabolic or the biosynthetic process (P > 0.05, Table 4, Additional file 2: Figure S4-S15). R and Num in the lipid catabolic process were also smaller in Africans than in Asians (P = 0.099 for R, not significant; P = 0.0125 for Num), whereas no such relationship between Africans and Asians was observed in the lipid biosynthetic process (P = 0.797 for R; P = 0.865 for Num) (Table 4, Additional file 2: Figure S11 and S14). These observations suggest that the genetic risks (R and Num) in the lipid catabolic and lipid biosynthetic processes are smaller in Africans than in Europeans, whereas this relationship does not hold in carbohydrate and amino acid metabolism.
We hypothesized that, in a high-energy food environment, the smaller genetic risks in lipid metabolism among Africans might translate into more efficient utilization of lipids as energy substances compared with Europeans.
This hypothesis could be tested with a simple experimental design. Triglycerides (TGs), the main energy-storing lipids, are usually biosynthesized in the liver and transported in the blood as part of lipoprotein particles to peripheral tissues for energy expenditure or storage [3,17]. With smaller genetic risks (R and Num) in the lipid biosynthetic process, the efficiency of TG biosynthesis in liver cells might be higher in Africans than in Europeans [5,10]. Thus, the first clinical prediction of our hypothesis is that the TG level in arm arterial serum should be higher in Africans than in Europeans. Clinically, the TG level in arm venous serum has been reported to be lower in Africans than in Europeans [18]. However, the difference in TGs between arterial and venous serum ($\mathrm{TG}_{\text{arterial-venous}}$) is usually larger than 0 [19], which suggests that it might not be appropriate to infer arterial TGs from venous TGs. $\mathrm{TG}_{\text{arterial-venous}}$ represents the net consumption of TGs for lipid expenditure and storage, so our second clinical prediction is that $\mathrm{TG}_{\text{arterial-venous}}$ should be higher in Africans than in Europeans. Therefore, TG measurements taken simultaneously in arterial and venous serum can be used to test the predictions of our hypothesis.
Table 4 note: P-values are for the test of the difference between Africans and Europeans, i.e. the probability that the difference (R') between Africans and Europeans in the permutation test is smaller than the difference observed in a given metabolic process. * indicates P < 0.05.
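One possible analysis for the second prediction is sketched below. It assumes each participant contributes one pair of simultaneously drawn (arterial, venous) TG values and compares groups with a one-sided label-shuffling permutation test. That test is a swapped-in choice for illustration (the design above does not specify a statistical test) and differs from the gene-set permutation used in the Methods; the function names are hypothetical and no data are implied.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def tg_arterial_venous(pairs: Sequence[Tuple[float, float]]) -> List[float]:
    """Per-person TG(arterial) - TG(venous), from samples drawn at the same time."""
    return [arterial - venous for arterial, venous in pairs]


def label_permutation_p(group_a: Sequence[float], group_b: Sequence[float],
                        rounds: int = 2000, seed: int = 0) -> float:
    """One-sided P for 'mean(group_a) > mean(group_b)' by shuffling group labels."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        # Count shuffles whose difference is at least as large as the observed one.
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    return hits / rounds
```

In this sketch, `group_a` would hold the per-person TG differences of African participants and `group_b` those of European participants; a small P would be consistent with the second prediction.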

Discussion
It is well known that background genetic risks usually differ among human populations, such that one population can carry more genetic variants than others [5,13,20]. Both our previous study [5] and the current study observed that Africans had a higher background genetic risk than Europeans. Some studies have reported that Africans have a smaller proportion of homozygous mutations but a bigger proportion of heterozygous mutations than Europeans [20], suggesting that the excess background mutations in Africans might be masked under a recessive model. Generally speaking, differences in background genetic risk among populations need not result in racial differences in fitness or other phenotypes. However, differences in the distribution of genetic variants in particular body systems (for example, lipid metabolism) might result in racial differences in certain phenotypes (for example, obesity, diabetes and cancer). To assess racial differences in genetic variants in these systems, we removed the background noise using permutation tests, to determine whether the differences in R or Num observed in lipid metabolism could be attributed to the background level. If our hypothesis (that Africans on average utilize lipids more efficiently during energy expenditure and storage) holds true, then lipids might be the more preferred energy substances for Africans compared with Europeans. Because lipids are higher-energy biomolecules than carbohydrates and proteins (about 4 kcal per gram for carbohydrates or proteins versus about 9 kcal per gram for lipids), a preference for lipids during energy expenditure might result in more economical energy generation in Africans than in non-Africans. Clinically, Hunter et al. (2000) reported that, when performing the same activity, African-descendant women consumed lower volumes of metabolically active mass [21].
This observation is consistent with our hypothesis that lipids are preferred as energy substances in Africans. Higher efficiency in energy expenditure, together with lower consumption of body mass, might jointly contribute to the higher prevalence of obesity among Africans in a high-energy food environment [4]. Meanwhile, a stronger preference for lipids in energy storage might promote fat accumulation in the body.
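As a worked illustration of the energy-density figures cited in this Discussion (about 9 kcal/g for lipids versus about 4 kcal/g for carbohydrates or proteins), the mass needed to supply a fixed amount of energy, here an arbitrary example of 1000 kcal, is:

$$
m_{\text{lipid}} = \frac{1000\ \text{kcal}}{9\ \text{kcal/g}} \approx 111\ \text{g},\qquad
m_{\text{carbohydrate}} = \frac{1000\ \text{kcal}}{4\ \text{kcal/g}} = 250\ \text{g},\qquad
\frac{m_{\text{carbohydrate}}}{m_{\text{lipid}}} = \frac{9}{4} = 2.25
$$

So, per gram, lipids supply 2.25 times as much energy as carbohydrates or proteins.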
In this study, we observed significantly fewer PFCVs in lipid metabolism among Africans than among Europeans. Because most PFCVs (missense mutations) are detrimental, fewer PFCVs in genes involved in lipid metabolism might increase the efficiency of utilizing lipids as energy substances in Africans. This observation might help explain the differences in blood lipid levels and related phenotypes (or diseases) among human groups [8,9]. Further studies are needed to elucidate the relationship between PFCVs in these genes and the phenotypes (or diseases) associated with lipid metabolism; the study design proposed in this paper could be a first step toward that goal.

Additional files
Additional file 1: The list of genes in carbohydrate, lipid and amino acid metabolic (catabolic and biosynthetic) processes.
Additional file 2: Distribution of R' (the difference in mean R between two populations) in the permutation test, and the observed and background genetic risk (Num) for the Hapmap dataset in carbohydrate, lipid and amino acid metabolic (catabolic and biosynthetic) processes.