- Open Access
Building a model for predicting metabolic syndrome using artificial intelligence based on an investigation of whole-genome sequencing
Journal of Translational Medicine volume 20, Article number: 190 (2022)
The circadian system regulates a wide range of physiological activities and behaviors, and its importance is increasingly recognized. The circadian rhythm follows a 24-h cycle maintained by transcriptional–translational feedback loops. When the rhythm is disrupted and the expression of circadian genes is altered, disease phenotypes can be amplified. For example, mutations in genes coding for core components of the clock result in disease, which shows the importance of the internal temporal homeostasis conferred by the circadian system. This study investigates the association between circadian genes and metabolic syndrome in a Taiwanese population.
We analyzed whole-genome sequencing data by reading VCF files and checking a set of target circadian genes for variants. In this way, we investigated the genetic contribution to circadian-related disease using population-based next-generation whole-genome sequencing. We then used the significant SNPs to build a metabolic syndrome prediction model with logistic regression, random forest, AdaBoost, and a neural network. In addition, we used the variable importance matrix of the random forest model to select the 40 most important SNPs, which were used to build new prediction models for comparison with the previous models. The data were split into training and testing sets using five-fold cross-validation. Each model was evaluated with the following criteria: area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision-recall curve).
We used chi-square tests to search for significant variants and found 186 significant SNPs. For the four prediction models that used all 186 SNPs (logistic regression, random forest, AdaBoost, and neural network), the AUCs were 0.68, 0.8, 0.82, and 0.81, respectively, and the F1 scores were 0.412, 0.078, 0.295, and 0.552, respectively. For the three models that used the 40 selected SNPs (logistic regression, AdaBoost, and neural network), the AUCs were 0.82, 0.81, and 0.81, respectively, and the F1 scores were 0.584, 0.395, and 0.574, respectively.
Circadian gene defects may contribute to metabolic syndrome. Our study identified several related genes and built a simple model to predict metabolic syndrome.
Metabolic syndrome (MetS) is a cluster of commonly concurrent metabolic risk factors associated with cardiovascular disease and type 2 diabetes mellitus, including elevated blood pressure, atherogenic dyslipidemia, insulin resistance, and central obesity (measured as waist circumference with ethnic-specific values). Metabolic syndrome can eventually lead to conditions such as chronic kidney disease (CKD) and atherosclerotic cardiovascular disease.
Risk factors for metabolic syndrome include family history, smoking, obesity, lack of physical activity, and other lifestyle factors [2, 3]. Sugar-sweetened soft drinks have been reported to increase risk [4, 5]. Children who have an increased body mass index (BMI), systolic blood pressure (SBP), and triglyceride levels are believed to be at higher risk of developing MetS in middle age.
The prevalence of metabolic syndrome is highest among those who are overweight or obese. The International Diabetes Federation (IDF) estimated that one-quarter of the world's population has metabolic syndrome. By age, metabolic syndrome appears to be most common in those over 60 years of age. On average, the prevalence of metabolic syndrome in adults is about 23%. A national survey, the Nutrition and Health Survey in Taiwan (NAHSIT) 2005–2008, showed a significant increase in the prevalence of MetS over a period of 10–15 years, from 13.6% (1993–1996) to 25.5% (2005–2008) in males and from 26.4% to 31.5% in females. Diabetes, high blood pressure, heart disease, and cerebrovascular disease are closely linked to metabolic syndrome, and these conditions and/or their associations are among the top ten causes of death in Taiwan.
Circadian rhythm plays an important role in endocrine secretion and body temperature. An important aspect of circadian rhythms is that they persist in the absence of external cues. Circadian genes, which are expressed periodically over an approximately 24-hour period, help to regulate metabolic genes [11,12,13]. Previous animal models have shown that knockout of specific circadian genes influences circadian behavior. Multiple transcription factors are now recognized to function in the circadian clock, and each of these has thousands of genomic DNA binding sites. Each of the circadian genes contributes directly to individual gene regulation in addition to its role in the reciprocal and homeostatic regulation of other clock genes by the transcriptional-translational feedback loops that define the clock itself. Many diseases have been found to be related to circadian genes, including Alzheimer's disease, Parkinson's disease, atherosclerotic disease, and viral infection.
Circadian rhythm also affects oxidative stress. If the human body or cells experience significant stress, their ability to regulate internal systems, including redox levels and circadian rhythms, may become impaired. Animal studies have shown that risperidone may reset the circadian rhythm. Risperidone was found to induce cytotoxicity via increased reactive oxygen species (ROS), mitochondrial potential collapse, lysosomal membrane leakiness, GSH depletion, and lipid peroxidation, and antioxidants such as coenzyme Q10 or N-acetylcysteine may have a role as therapeutic options. Circadian rhythm also plays a role in liver lipid metabolism, the renin-angiotensin system, and chronic fatigue syndrome [21, 22]. The timing of statin therapy may influence its effect. The renin-angiotensin system was found to induce oxidative stress and fibrogenic cytokines. Altering circadian rhythm may therefore have a substantial influence on the treatment of chronic liver diseases.
Increasing evidence shows that circadian clock genes may contribute to the development of metabolic syndrome [25, 26]. Circadian clocks regulate the timing of biological events, including the sleep–wake cycle, energy metabolism, and hormone secretion. In an association and interaction analysis, Lin et al. proposed that many of the core circadian clock genes affect metabolic activity and metabolism, which may lead to metabolic syndrome. We targeted the core circadian clock genes that have potentially been linked with MetS.
We used the Taiwan Biobank (TWB) NGS cohort as our study population. TWB collects lifestyle data, genomic data, and data on representative diseases from residents of Taiwan. TWB recruits community-based volunteers who are 30 to 70 years of age and have no history of cancer. The cohort is based on recruitment and monitoring of the general Taiwanese population and has been used in previous genetic studies. Our study included 642 TWB individuals who had whole genome sequence (WGS) data.
Metabolic syndrome definition
According to the new International Diabetes Federation (IDF) definition, metabolic syndrome must meet the criteria of having central obesity (measured in waist circumference specific to the ethnic values, see below) plus 2 of the following 4 factors:
Triglycerides ≥ 150 mg/dL (1.7 mmol/L) or taking drug treatment for elevated triglycerides
Fasting glucose ≥ 100 mg/dL or previously diagnosed Type 2 Diabetes Mellitus
Reduced high-density lipoprotein (HDL) cholesterol or drug treatment for reduced HDL cholesterol:
in men, < 40 mg/dL (1.0 mmol/L)
in women, < 50 mg/dL (1.3 mmol/L)
Elevated blood pressure demonstrated by any of the following:
systolic blood pressure ≥ 130 mm Hg or
diastolic blood pressure ≥ 85 mm Hg or
antihypertensive drug treatment in a patient with a history of hypertension.
As our study took place in Taiwan and our data came from the Taiwan Biobank, we used the ethnic-specific values for waist circumference given for the "South Asians" and "Chinese" groups, where central obesity was defined as a waist circumference of ≥ 90 cm in males and ≥ 80 cm in females.
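The definition above can be expressed as a small rule-based classifier. The following is a minimal Python sketch, not the authors' implementation: the `Participant` record and all of its field names are hypothetical, and the thresholds are copied from the criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    sex: str                       # "M" or "F"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    fasting_glucose_mg_dl: float
    on_tg_drug: bool = False       # drug treatment for elevated triglycerides
    on_hdl_drug: bool = False      # drug treatment for reduced HDL cholesterol
    on_bp_drug: bool = False       # antihypertensive treatment with a history of hypertension
    diagnosed_t2dm: bool = False   # previously diagnosed type 2 diabetes mellitus


def has_central_obesity(p: Participant) -> bool:
    """Waist circumference cut-offs used in this study: >= 90 cm (men), >= 80 cm (women)."""
    if p.sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    return p.waist_cm >= (90.0 if p.sex == "M" else 80.0)


def count_additional_factors(p: Participant) -> int:
    """Count how many of the four additional IDF factors are present."""
    raised_tg = p.triglycerides_mg_dl >= 150 or p.on_tg_drug
    raised_glucose = p.fasting_glucose_mg_dl >= 100 or p.diagnosed_t2dm
    low_hdl = p.hdl_mg_dl < (40 if p.sex == "M" else 50) or p.on_hdl_drug
    raised_bp = p.systolic_mm_hg >= 130 or p.diastolic_mm_hg >= 85 or p.on_bp_drug
    return sum([raised_tg, raised_glucose, low_hdl, raised_bp])


def has_metabolic_syndrome(p: Participant) -> bool:
    """Central obesity plus at least 2 of the 4 additional factors."""
    return has_central_obesity(p) and count_additional_factors(p) >= 2
```

Because central obesity is a mandatory criterion, a participant with all four additional factors but a waist circumference below the cut-off is not classified as having metabolic syndrome under this definition.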
Finding suspected single nucleotide polymorphisms
We analyzed WGS data from a total of 642 individuals sequenced on the Illumina platform (of whom 123 were defined as metabolic syndrome patients). The target genes were ALAS1, APOA5, ARNTL, BUD13, CETP, CLOCK, CRY1, CRY2, CSNK1D, CSNK1E, GSK3B, LIPA, NPAS2, NR1D1, PER1, PER2, PER3, RORA, RORB, RORC, SMAD2, SMAD3, SMAD4, TGFB2, TGFB3, and TGFBR2, together with other genes within the range of SNPs for analysis. The range of SNP was set between 17 and 37 (average of > 30) with Qual ≥ 30.
However, the range of data analyzed was larger than originally expected because of a problem with the single nucleotide polymorphism (SNP) range set for CSNK1E. Metabolic syndrome was defined primarily from the physiological data in the Taiwan Biobank database. After the data were imported into the SQL server, participants were grouped using database queries as the basis for subsequent analysis.
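As an illustration of the variant screening described in this section, the sketch below reads tab-separated VCF data lines, keeps records with QUAL ≥ 30 that fall inside a target region, and counts alternate alleles per sample from the `GT` field. It is a minimal Python sketch under stated assumptions: the function names are hypothetical, the region coordinates in the usage note are placeholders rather than real gene coordinates, and the authors' actual pipeline (which also used C# and MySQL) is not reproduced here.

```python
def parse_vcf_line(line):
    """Split one VCF data line (not a '#' header line) into a small dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual = fields[:6]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "format": fields[8].split(":") if len(fields) > 8 else [],
        "samples": fields[9:],
    }


def passes_filter(record, regions, min_qual=30.0):
    """Keep records with QUAL >= min_qual that lie inside any (chrom, start, end) region."""
    if record["qual"] is None or record["qual"] < min_qual:
        return False
    return any(
        chrom == record["chrom"] and start <= record["pos"] <= end
        for chrom, start, end in regions
    )


def alt_allele_count(sample_field, format_keys):
    """Return 0, 1, or 2 alternate alleles for a diploid call, or None if the call is missing."""
    genotype = sample_field.split(":")[format_keys.index("GT")]
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")
```

For example, with a placeholder region `("chrTest", 1000, 2000)`, a record at position 1500 with QUAL 45 passes the filter, and the sample fields `0/1`, `1/1`, and `0/0` yield alternate-allele counts of 1, 2, and 0.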
For each group, we counted the frequency of single-strand variation, double-strand variation, or no variation. The statistical formulas were then implemented in Python to calculate the 95% confidence interval and to obtain the p value from the chi-square or Fisher's exact test. After identifying significant SNPs, we conducted subgroup analysis to determine whether these SNPs were related to hypertension, low HDL level, diabetes, or high TG level. Bonferroni correction was used to address multiple hypothesis testing: because there are 5 categories of metabolic syndrome, the alpha value was set to 0.05/5 = 0.01.
P values for continuous variables were calculated using Student's t test. Categorical variables were compared using the chi-square test or exact test. Given the exploratory nature of this study, P < 0.05 was considered statistically significant. We used the caret package in R software version 4.04 for model prediction. We also used C#, Python, and MySQL for data manipulation.
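The genotype-frequency comparison described above can be sketched as a Pearson chi-square test on a 2 × 3 table (cases and controls by three genotype classes). This is a minimal, standard-library Python sketch and not the authors' code: the function names are hypothetical, all three genotype columns are assumed to be non-empty so that the test has 2 degrees of freedom (in which case the p value is exactly exp(−χ²/2)), and Fisher's exact test is not covered. The family-wise level of 0.05 in the usage note is an assumption taken from the P < 0.05 criterion stated in this section.

```python
import math


def chi_square_2x3(case_counts, control_counts):
    """Pearson chi-square test for a 2 x 3 genotype table.

    Each argument holds three counts (no variation, one genotype class, the other).
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    if any(len(row) != 3 for row in table):
        raise ValueError("each group needs exactly three genotype counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("empty row or column: the 2-df test does not apply")
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests
```

A SNP would then be flagged in a subgroup analysis when `p_value < bonferroni_alpha(0.05, 5)`, where 5 is the number of categories tested.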
Creation of genome-based prediction model
We used the significant SNPs to create a metabolic syndrome prediction model. Logistic regression, random forest, AdaBoost, and a neural network were used to predict metabolic syndrome. The data were split into training and testing sets using five-fold cross-validation. We assumed that SNPs have a cumulative effect, so we coded the homozygous genotype as 2, the heterozygous genotype as 1, and the wild type as 0. Since weight may be influenced by these genes, weight was not used as a covariate. In addition to the four models mentioned above, we selected the 40 most important SNPs according to the random forest importance matrix and used them to create another three models with the logistic regression, AdaBoost, and neural network methods (Fig. 1). We used a simple neural network with one hidden layer of 10 units and decay equal to 0. Each model was evaluated with the following criteria: area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision-recall curve).
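The additive genotype coding and the five-fold split described above can be sketched as follows. This is an illustrative Python sketch using only the standard library, whereas the study itself used the caret package in R. The function names are hypothetical, and stratifying the folds by outcome is an assumption made here to keep the case proportion similar across folds; the text does not state how the folds were drawn.

```python
import random

# Additive coding: number of variant alleles carried at a SNP.
GENOTYPE_CODE = {"wild_type": 0, "heterozygous": 1, "homozygous": 2}


def encode_genotypes(genotypes):
    """Map genotype labels to 0/1/2 under the cumulative-effect assumption."""
    return [GENOTYPE_CODE[g] for g in genotypes]


def stratified_five_fold(labels, n_folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Indices of each class are shuffled and dealt round-robin into folds,
    so every fold receives a similar share of cases and controls.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for k, index in enumerate(indices):
            folds[k % n_folds].append(index)
    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train_idx, test_idx))
    return splits
```

With 124 cases and 518 controls, each test fold under this scheme contains 24 or 25 cases, and every participant appears in exactly one test fold.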
Baseline characteristic of metabolic syndrome individuals and control group
Among the 642 study participants, 124 had metabolic syndrome and 518 did not. The mean age of the metabolic syndrome cohort was 51 years, and the mean age of the non-metabolic syndrome cohort was 44 years. Waistline, blood pressure, triglyceride level, hemoglobin A1C, fasting glucose, and the percentage with diabetes mellitus were higher in individuals with metabolic syndrome than in those without. In addition, the high-density lipoprotein level was lower in individuals with metabolic syndrome, which corresponds to the definition of metabolic syndrome (Table 1).
Table 1 shows the baseline values for metabolic syndrome.
Spectrum of metabolic syndrome mutant alleles
We searched all alleles in the reference circadian genes and used the chi-square test to determine whether the heterozygous or homozygous genotype was related to metabolic syndrome. Among the genes searched, we found 186 significant SNPs in circadian genes that were associated with metabolic syndrome (Table 2). Among the 186 SNP alleles, we identified 47 alleles associated with hypertension (Table 3), 27 alleles associated with diabetes mellitus (Table 4), 10 alleles associated with low HDL-C (Table 5), and 46 alleles associated with high TG level (Table 6).
Gene based prediction model
We applied different machine learning models, including logistic regression, random forest, AdaBoost, and a neural network, to predict metabolic syndrome from gene data. For the four prediction models (logistic regression, random forest, AdaBoost, and neural network), the AUCs were 0.68, 0.8, 0.82, and 0.8, respectively, and the F1 scores were 0.424, 0.525, 0.528, and 0.526, respectively (for details see Table 7). We chose the 40 most important SNPs in the random forest model and used them as the new variables. We compared the 40 SNPs with the most significant odds ratios (OR) against the 40 most important SNPs in the random forest model and found only 11 overlapping SNPs (Table 8). For the SNP-selected models (logistic regression, AdaBoost, and neural network), the AUCs were 0.82, 0.81, and 0.85, respectively, and the F1 scores were 0.578, 0.415, and 0.5, respectively (Table 9). Overall, the feature-selected models performed better than the original models in AUC and F1 score.
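To make the reported metrics concrete, the sketch below computes AUC and F1 from first principles. It is an illustrative Python sketch with hypothetical function names, not the code used to produce the tables: AUC is computed as the fraction of case-control pairs in which the case receives the higher score (ties count as one half), which is threshold-free, while precision, recall, and F1 depend on the cut-off used to turn scores into predictions. Average precision is not covered here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall_f1(labels, predictions):
    """Precision, recall, and F1 for binary labels and binary predictions."""
    tp = sum(1 for y, yhat in zip(labels, predictions) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, predictions) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, predictions) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because F1 depends on the chosen cut-off and on the proportion of cases, a model can rank participants well (high AUC) and still have a low F1 score, which is one possible reading of the gap between the two metrics in the tables.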
In this study, we found 186 circadian gene SNPs related to metabolic syndrome. Of these, 8 SNPs were related to apolipoprotein. Previous studies have shown that apolipoprotein E knockout mice are more likely to develop cardiovascular disease after the circadian rhythm is interrupted [31, 32]. Circadian rhythm disorders can alter the body's metabolic factors, including the cholesterol profile and apolipoprotein. Another animal study also found that apolipoprotein E knockout mice developed cardiovascular disease more rapidly after circadian rhythm alteration. Our study also showed that apolipoprotein is related to high TG level, low HDL level, and HTN. Rs132759 in APOL2 is correlated with both HTN and low HDL level. Previous studies have shown that APOL2 may be related to the acute inflammatory response and lipid metabolic processes [35, 36]. To our knowledge, our study is the first to identify that APOL2 is correlated with HTN.
Five SNPs are located in BMS1P20, a long non-coding RNA (lncRNA). Previous studies have shown that BMS1P20 is positively correlated with overall survival in cancer patients, especially in lung adenocarcinoma. There is also a hypothesis that lncRNAs regulate cells through an lncRNA-miRNA-mRNA ceRNA network. Some lncRNAs have been reported to correlate with metabolism, such as 116HG, H19, HOTAIR, and MIAT [39,40,41]. We found that rs403517 and rs405570 in BMS1P20 are related to DM, and we believe our study is the first to report that the BMS1P20 lncRNA is related to metabolic syndrome.
The MYO18B gene encodes a myosin heavy chain that is expressed in human cardiac and skeletal muscle. Some studies have shown that MYO18B mutation is associated with myopathy or cardiomyopathy in animal models and in humans [43, 44]. One animal study also showed that MYO18B gene expression is regulated by circadian rhythm. In our study, we found that MYO18B is also associated with metabolic syndrome, especially rs6004865, which is associated with low HDL levels. Because the SNPs we found in MYO18B are all intronic or intergenic, more studies are needed to clarify the relationship between MYO18B and metabolic syndrome.
Many studies have explored the RORA gene and its relation to circadian rhythm, and it has been associated with many psychiatric disorders, including major depressive disorder, bipolar disorder, and sleep disturbance [46,47,48]. RORA gene mutations also affect the use of substances such as alcohol, tea, tobacco, and caffeine. This is relevant given the widely accepted knowledge that smoking and alcohol consumption increase the risk of developing metabolic syndrome. An animal study showed that suppression of RORA gene activity improves metabolic function and reduces inflammation.
Many studies have found that SMARCB1 is a tumor suppressor gene related to different types of cancer. Recent studies have shown that circadian clock oscillation develops during cell differentiation and that some cancer cells lack the circadian gene, which may reflect the similarity between embryonic stem cells and cancer cell types. Our study found that multiple SNPs in the SMARCB1 gene (rs5751740, rs5751741, rs5760038, rs5760046, rs5760057, rs5996620) are related to both high TG level and hypertension. However, the exact mechanism is still unknown.
ZNF280B is an oncogene in prostate cancer and gastric cancer. Our study is the first to point out that ZNF280B mutation is related to metabolic syndrome. Rs142445063 and rs2051488 were related to diabetes mellitus in our study.
A previous study used different machine learning methods to predict metabolic syndrome, with both clinical and genetic information included in the model. In our study, either the entire SNP set or selected SNPs were used in different models. The accuracy, AUC, and F1 score improved in the SNP-selected models. Previous studies have shown that models with feature selection perform better.
This study has the following advantages. First, we examined multiple circadian genes and found multiple SNPs associated with metabolic syndrome, some of which are reported here for the first time. Among the significant SNPs, we performed subgroup analysis to determine which SNPs correspond to the different metabolic syndrome criteria. Second, we used four machine learning models to predict metabolic syndrome from genetic information, which to our knowledge has not been done in previous studies, and the AUC reached 0.85 in the SNP-selected model.
Nevertheless, our study has several limitations. First, the sample size is small and only includes healthy and aware Taiwanese participants. Therefore, this study should be replicated and validated in other populations. Second, this was a cross-sectional study, so causal relationships are difficult to establish. Third, we only used circadian gene SNPs in our prediction model. Other metabolic syndrome related SNPs or biomarkers could be included to increase accuracy.
We identified 186 circadian gene SNPs that were related to metabolic syndrome. Among these SNPs, 47 alleles were associated with hypertension, 46 alleles with high serum TG levels, 27 alleles with diabetes mellitus, and 10 alleles with low serum HDL levels. Some SNPs are reported here for the first time as related to metabolic syndrome. Additional research is needed to confirm these SNPs. In addition, we applied several machine learning models to predict metabolic syndrome based on circadian gene data. We found that it is difficult to produce a high-sensitivity model. Other clinical data should be added to create a model with higher sensitivity (Additional files 1, 2, 3, 4, 5, 6, 7, 8).
Availability of data and materials
The datasets generated and analyzed during the current study are not publicly available due to the privacy regulation of Taiwan biobank but are available from the corresponding author on reasonable request with permission of Taiwan biobank.
SNP: Single nucleotide polymorphism
AUC: Area under the receiver operating characteristic curve
CKD: Chronic kidney disease
BMI: Body mass index
SBP: Systolic blood pressure
IDF: The International Diabetes Federation
NAHSIT: Nutrition and Health Survey in Taiwan
WGS: Whole genome sequence
Tanner RM, Brown TM, Muntner P. Epidemiology of obesity, the metabolic syndrome, and chronic kidney disease. Curr Hypertens Rep. 2012;14:152–9.
Samson SL, Garber AJ. Metabolic syndrome. Endocrinol Metab Clin North Am. 2014;43:1–23.
Sun K, Liu J, Ning G. Active smoking and risk of metabolic syndrome: a meta-analysis of prospective studies. PLoS ONE. 2012;7:e47791.
Narain A, Kwok CS, Mamas MA. Soft drink intake and the risk of metabolic syndrome: A systematic review and meta-analysis. Int J Clin Pract. 2017;71:23.
Malik VS, Popkin BM, Bray GA, Després JP, Willett WC, Hu FB. Sugar-sweetened beverages and risk of metabolic syndrome and type 2 diabetes: a meta-analysis. Diabetes Care. 2010;33:2477–83.
Burns TL, Letuchy EM, Paulos R, Witt J. Childhood predictors of the metabolic syndrome in middle-aged adults: the Muscatine study. J Pediatrics. 2009;155:S5.
Beltrán-Sánchez H, Harhay MO, Harhay MM, McElligott S. Prevalence and trends of metabolic syndrome in the adult US population, 1999–2010. J Am Coll Cardiol. 2013;62:697–703.
Ranasinghe P, Mathangasinghe Y, Jayawardena R, Hills AP, Misra A. Prevalence and trends of metabolic syndrome among adults in the asia-pacific region: a systematic review. BMC Public Health. 2017;17:101.
Pavlova M. Circadian rhythm sleep-wake disorders. Continuum Minneapolis, Minn. 2017;23:1051–63.
Pittendrigh CS, Daan S. A functional analysis of circadian pacemakers in nocturnal rodents. J Comp Physiol. 1976;106:291–331.
Cui P, Zhong T, Wang Z, Wang T, Zhao H, Liu C, Lu H. Identification of human circadian genes based on time course gene expression profiles by using a deep learning method. Biochim Biophys Acta Mol Basis Dis. 2018;1864:2274–83.
Solovyeva IA, Dobrovolskayaa EV, Moskalev AA. Genetic Control of Circadian Rhythms and Aging. Genetika. 2016;52:393–412.
Cox KH, Takahashi JS. Circadian clock genes and the transcriptional architecture of the clock mechanism. J Mol Endocrinol. 2019;63:R93-r102.
Guan D, Lazar MA. Interconnections between circadian clocks and metabolism. J Clin Investig. 2021;131:23.
Maiese K. Cognitive Impairment and Dementia: Gaining Insight through Circadian Clock Gene Pathways. Biomolecules. 2021;11:34.
Schober A, Blay RM, SaboorMaleki S, Zahedi F, Winklmaier AE, Kakar MY, Baatsch IM, Zhu M, Geißler C, Fusco AE, Eberlein A, Li N, Megens RTA, Banafsche R, Kumbrink J, Weber C, Nazari-Jahantigh M. MicroRNA-21 controls circadian regulation of apoptosis in atherosclerotic lesions. Circulation. 2021;144:1059–73.
Wilking M, Ndiaye M, Mukhtar H, Ahmad N. Circadian rhythm connections to oxidative stress: implications for human health. Antioxid Redox Signal. 2013;19:192–208.
Cherukalady R, Kumar D, Basu P, Singaravel M. Risperidone resets the circadian clock in mice. Biol Rhythm Res. 2017;48:583–91.
Eftekhari A, Ahmadian E, Azarmi Y, Parvizpur A, Hamishehkar H, Eghbal MA. In vitro/vivo studies towards mechanisms of risperidone-induced oxidative stress and the protective role of coenzyme Q10 and N-acetylcysteine. Toxicol Mech Methods. 2016;26:520–8.
Cugini P, Lucia P. Circadian rhythm of the renin-angiotensin-aldosterone system: a summary of our research studies. Clin Ter. 2004;155:287–91.
Tsai SY, Chen HJ, Lio CF, Kuo CF, Kao AC, Wang WS, Yao WC, Chen C, Yang TY. Increased risk of chronic fatigue syndrome in patients with inflammatory bowel disease: a population-based retrospective cohort study. J Transl Med. 2019;17:55.
Yang TY, Lin CL, Yao WC, Lio CF, Chiang WP, Lin K, Kuo CF, Tsai SY. How mycobacterium tuberculosis infection could lead to the increasing risks of chronic fatigue syndrome and the potential immunological effects: a population-based retrospective cohort study. J Transl Med. 2022;20:99.
Izquierdo-Palomares JM, Fernandez-Tabera JM, Plana MN, AñinoAlba A, GómezÁlvarez P, Fernandez-Esteban I, Saiz LC, Martin-Carrillo P, PinarLópez Ó. Chronotherapy versus conventional statins therapy for the treatment of hyperlipidaemia. Cochrane Database System Rev. 2016;11:C009462.
Ahmadian E, Pennefather PS, Eftekhari A, Heidari R, Eghbal MA. Role of renin-angiotensin system in liver diseases: an outline on the potential therapeutic points of intervention. Expert Rev Gastroenterol Hepatol. 2016;10:1279–88.
Chaix A, Lin T, Le HD, Chang MW, Panda S. Time-Restricted Feeding Prevents Obesity and Metabolic Syndrome in Mice Lacking a Circadian Clock. Cell Metab. 2019;29:303-319.e304.
Jagannath A, Taylor L, Wakaf Z, Vasudevan SR, Foster RG. The genetics of circadian rhythms, sleep and health. Hum Mol Genet. 2017;26:R128-r138.
Lin E, Kuo PH, Liu YL, Yang AC, Kao CF, Tsai SJ. Effects of circadian clock genes and health-related behavior on metabolic syndrome in a Taiwanese population: Evidence from association and interaction analysis. PLoS ONE. 2017;12:e0173861.
Chen C-H, Yang J-H, Chiang CW, Hsiung C-N, Wu P-E, Chang L-C, Chu H-W, Chang J, Song I-W, Yang S-L. Population structure of Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan Biobank project. Hum Mol Genet. 2016;25:5321–31.
Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18:1851–8.
Engin A. Circadian Rhythms in Diet-Induced Obesity. Adv Exp Med Biol. 2017;960:19–52.
Zhang X, Zhao F, Xu C, Lu C, Jin H, Chen S, Qian R. Circadian rhythm disorder of thrombosis and thrombolysis-related gene expression in apolipoprotein E knock-out mice. Int J Mol Med. 2008;22:149–53.
Schilperoort M, De Berg R, Bosmans LA, Os BW, Dollé MET, Smits NAM, Guichelaar T, Baarle D, Koemans L, Berbée JFP, Deboer T, Meijer JH, de Vries MR, Vreeken D, Gils JM, Willems K, Kerkhof LWM, Lutgens E, Biermasz NR, Rensen PCN, Kooijman S. Disruption of circadian rhythm by alternating light-dark cycles aggravates atherosclerosis development in APOE*3-LeidenCETP mice. J Pineal Res. 2020;68:e12614.
Hyun MH, Kang JH, Kim S, Na JO, Choi CU, Kim JW, Kim EJ, Rha SW, Park CG, Lee E, Seo HS. Patterns of circadian variation in 24-hour ambulatory blood pressure, heart rate, and sympathetic tone correlate with cardiovascular disease risk: a cluster analysis. Cardiovasc Ther. 2020;2020:4354759.
Chalfant JM, Howatt DA, Tannock LR, Daugherty A, Pendergast JS. Circadian disruption with constant light exposure exacerbates atherosclerosis in male ApolipoproteinE-deficient mice. Sci Rep. 2020;10:9920.
Liu Z, Lu H, Jiang Z, Pastuszyn A, Hu CA. Apolipoprotein l6, a novel proapoptotic Bcl-2 homology 3-only protein, induces mitochondria-mediated apoptosis in cancer cells. Mol Cancer Res. 2005;3:21–31.
Rao SK, Pavicevic Z, Du Z, Kim JG, Fan M, Jiao Y, Rosebush M, Samant S, Gu W, Pfeffer LM, Nosrat CA. Pro-inflammatory genes as biomarkers and therapeutic targets in oral squamous cell carcinoma. J Biol Chem. 2010;285:32512–21.
Sui J, Li YH, Zhang YQ, Li CY, Shen X, Yao WZ, Peng H, Hong WW, Yin LH, Pu YP, Liang GY. Integrated analysis of long non-coding RNA-associated ceRNA network reveals potential lncRNA biomarkers in human lung adenocarcinoma. Int J Oncol. 2016;49:2023–36.
Guo Z, Cao Y. An lncRNA-miRNA-mRNA ceRNA network for adipocyte differentiation from human adipose-derived stem cells. Mol Med Rep. 2019;19:4271–87.
Powell WT, Coulson RL, Crary FK, Wong SS, Ach RA, Tsang P, AliceYamada N, Yasui DH, Lasalle JM. A Prader-Willi locus lncRNA cloud modulates diurnal genes and energy expenditure. Hum Mol Genet. 2013;22:4318–28.
Wang H, Cao Y, Shu L, Zhu Y, Peng Q, Ran L, Wu J, Luo Y, Zuo G, Luo J, Zhou L, Shi Q, Weng Y, Huang A, He TC, Fan J. Long non-coding RNA (lncRNA) H19 induces hepatic steatosis through activating MLXIPL and mTORC1 networks in hepatocytes. J Cell Mol Med. 2020;24:1399–412.
Meydan C, Bekenstein U, Soreq H. Molecular regulatory pathways link sepsis with metabolic syndrome: non-coding RNA elements underlying the sepsis/metabolic cross-talk. Front Mol Neurosci. 2018;11:189.
Salamon M, Millino C, Raffaello A, Mongillo M, Sandri C, Bean C, Negrisolo E, Pallavicini A, Valle G, Zaccolo M, Schiaffino S, Lanfranchi G. Human MYO18B, a novel unconventional myosin heavy chain expressed in striated muscles moves into the myonuclei upon differentiation. J Mol Biol. 2003;326:137–49.
Gurung R, Ono Y, Baxendale S, Lee SL, Moore S, Calvert M, Ingham PW. A Zebrafish Model for a Human Myopathy Associated with Mutation of the Unconventional Myosin MYO18B. Genetics. 2017;205:725–35.
Malfatti E, Böhm J, Lacène E, Beuvin M, Romero NB, Laporte J. A Premature Stop Codon in MYO18B is associated with severe nemaline myopathy with cardiomyopathy. J Neuromusc Dis. 2015;2:219–27.
Lazado CC, Nagasawa K, Babiak I, Kumaratunga HP, Fernandes JM. Circadian rhythmicity and photic plasticity of myosin gene transcription in fast skeletal muscle of Atlantic cod (Gadus morhua). Mar Genomics. 2014;18(Pt A):21–9.
Geoffroy PA, Etain B, Lajnef M, Zerdazi EH, Brichant-Petitjean C, Heilbronner U, Hou L, Degenhardt F, Rietschel M, McMahon FJ, Schulze TG, Jamain S, Marie-Claire C, Bellivier F. Circadian genes and lithium response in bipolar disorders: associations with PPARGC1A (PGC-1α) and RORA. Genes Brain Behav. 2016;15:660–8.
Hou SJ, Tsai SJ, Kuo PH, Liu YL, Yang AC, Lin E, Lan TH. An association study in the Taiwan Biobank reveals RORA as a novel locus for sleep duration in the Taiwanese Population. Sleep Med. 2020;73:70–5.
Chen Z, Tao S, Zhu R, Tian S, Sun Y, Wang H, Yan R, Shao J, Zhang Y, Zhang J, Yao Z, Lu Q. Aberrant functional connectivity between the suprachiasmatic nucleus and the superior temporal gyrus: Bridging RORA gene polymorphism with diurnal mood variation in major depressive disorder. J Psychiatr Res. 2021;132:123–30.
Billon C, Sitaula S, Burris TP. Metabolic Characterization of a Novel RORα Knockout Mouse Model without Ataxia. Front Endocrinol. 2017;8:141.
Kohashi K, Oda Y. Oncogenic roles of SMARCB1/INI1 and its deficient tumors. Cancer Sci. 2017;108:547–52.
Tsuchiya Y, Umemura Y, Yagita K. Circadian clock and cancer: from a viewpoint of cellular differentiation. Int J Urol. 2020;27:518–24.
Zhai J, Yang Z, Cai X, Yao G, An Y, Wang W, Fan Y, Zeng C, Liu K. ZNF280B promotes the growth of gastric cancer in vitro and in vivo. Oncol Lett. 2018;15:5819–24.
Choe EK, Rhee H, Lee S, Shin E, Oh SW, Lee JE, Choi SH. Metabolic syndrome prediction using machine learning models with genetic and clinical information from a nonobese healthy population. Genom Inform. 2018;16:e31.
Gaudillo J, Rodriguez JJR, Nazareno A, Baltazar LR, Vilela J, Bulalacao R, Domingo M, Albia J. Machine learning approach to single nucleotide polymorphism-based asthma prediction. PLoS ONE. 2019;14:e0225574.
We would like to extend acknowledgements to Taiwan biobank for providing the preliminary data, Dr Benjamin Lai, Dr Che-Wei Su, and Dr Chon-Fu Lio for the initial suggestions, and to the organizations that have funded this project.
This study was supported by the Department of Medical Research at Mackay Memorial Hospital, Taiwan, Grant Numbers MMH-106-81, MMH-107-71, MMH-107-102, MMH-107-135, MMH-109-79, MMH-109-103, and Mackay Medical College, Grant Number 1082A03. The APC was funded by the Department of Medical Research at Mackay Memorial Hospital and both of the co-first and the corresponding author: Dr. Chien-Feng Kuo and Dr. Shin-Yi Tsai.
Ethics approval and consent to participate
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Mackay Memorial Hospital (16MMHIS074) and Taiwan Biobank (TWBR10903-07).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Table S1.
Summary of the 186 significant circadian gene SNPs.
Additional file 2: Supplementary figure S2
AUC curve of neural network
Additional file 3: Supplementary figure S3
Precision-Recall curve of neural network
Additional file 4: Supplementary figure S4
AUC curve of Adaboost model
Additional file 5: Supplementary figure S5
Precision-Recall curve of Adaboost model
Additional file 6: Supplementary figure S6
AUC curve of logistic regression
Additional file 7: Supplementary figure S7
Precision-Recall curve of logistic regression
Additional file 8: Supplementary figure S8
Biological pathways-based analysis of circadian rhythm (1). Reference: 1. Reactome
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hsu, NW., Chou, KC., Wang, YT.T. et al. Building a model for predicting metabolic syndrome using artificial intelligence based on an investigation of whole-genome sequencing. J Transl Med 20, 190 (2022). https://doi.org/10.1186/s12967-022-03379-7
- Circadian rhythm
- Metabolic syndrome
- Whole-genome sequencing
- Deep learning