Building a model for predicting metabolic syndrome using artificial intelligence based on an investigation of whole-genome sequencing



The circadian system regulates many physiological activities and behaviors, and its importance is increasingly recognized. Circadian rhythms follow an approximately 24-h cycle generated by transcriptional–translational feedback loops. When the circadian rhythm is disrupted and the expression of circadian genes is altered, disease phenotypes may be amplified. For example, mutations in genes coding for core clock components result in disease, which reveals the importance of the internal temporal homeostasis conferred by the circadian system. This study investigates the association between circadian genes and metabolic syndrome in a Taiwanese population.


We analyzed whole-genome sequencing data, reading the VCF files and defining target circadian genes to determine whether variants were present in these genes. In this way, we investigated the genetic contribution to circadian-related disease using population-based next-generation whole-genome sequencing. We then used the significant SNPs to build metabolic syndrome prediction models with logistic regression, random forest, adaboost, and neural network methods. In addition, we used the random forest variable importance matrix to select the 40 most important SNPs, which were used to build new prediction models for comparison with the previous models. Training and testing sets were constructed using five-fold cross-validation. Each model was evaluated using the area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision–recall curve).


Chi-square tests identified 186 significant SNPs. For the four prediction models using all 186 SNPs (logistic regression, random forest, adaboost, and neural network), the AUCs were 0.68, 0.8, 0.82, and 0.81, and the F1 scores were 0.412, 0.078, 0.295, and 0.552, respectively. For the three models using the 40 selected SNPs (logistic regression, adaboost, and neural network), the AUCs were 0.82, 0.81, and 0.81, and the F1 scores were 0.584, 0.395, and 0.574, respectively.


Circadian gene defects may also contribute to metabolic syndrome. Our study identified several related genes and built a simple model to predict metabolic syndrome.


Metabolic syndrome (MetS) is a cluster of commonly co-occurring metabolic risk factors associated with cardiovascular disease and type 2 diabetes mellitus, including elevated blood pressure, atherogenic dyslipidemia, insulin resistance, and central obesity (measured as waist circumference with ethnicity-specific values). Metabolic syndrome can eventually lead to conditions such as chronic kidney disease (CKD) and atherosclerotic cardiovascular disease [1].

Risk factors of metabolic syndrome include family history, smoking, obesity, lack of physical activity and lifestyle factors [2, 3]. Sugar-sweetened soft drinks have been reported to increase risk [4, 5]. Children who have an increased body mass index (BMI), systolic blood pressure (SBP) and triglyceride levels are believed to be at higher risk of developing MetS in middle age [6].

The prevalence of metabolic syndrome is highest among people who are overweight or obese. The International Diabetes Federation (IDF) has estimated that one-quarter of the world's population has metabolic syndrome. With respect to age, metabolic syndrome appears to be most common in older adults over 60 years of age [2]. On average, the prevalence of metabolic syndrome in adults is about 23% [7]. A national survey, the Nutrition and Health Survey in Taiwan (NAHSIT), showed a significant increase in the prevalence of MetS from 13.6% (1993–1996) to 25.5% (2005–2008) in males, and from 26.4% to 31.5% in females, over a period of 10–15 years. Diabetes, hypertension, heart disease, and cerebrovascular disease are closely linked to metabolic syndrome, and these conditions and their related diseases are among the top ten causes of death in Taiwan [8].

Circadian rhythms play an important role in endocrine secretion and body temperature [9]. An important property of circadian rhythms is that they persist in the absence of external cues [10]. Circadian genes, which are expressed periodically with an approximately 24-hour period, help to regulate metabolic genes [11,12,13]. Previous animal studies have shown that knocking out specific circadian genes alters circadian behavior. Multiple transcription factors function in the circadian clock, and each of them has thousands of genomic DNA binding sites. Each circadian gene contributes directly to the regulation of individual genes, in addition to its role in the reciprocal and homeostatic regulation of other clock genes through the transcriptional–translational feedback loops that define the clock itself [14]. Many diseases have been found to be related to circadian genes, including Alzheimer's disease, Parkinson's disease [15], atherosclerotic disease [16], and viral infection.

Circadian rhythm also affects oxidative stress. If the body or its cells experience significant stress, their ability to regulate internal systems, including redox levels and circadian rhythms, may become impaired [17]. Animal studies have shown that risperidone may reset the circadian clock [18]. Risperidone was found to induce cytotoxicity through increased reactive oxygen species (ROS), mitochondrial membrane potential collapse, lysosomal membrane leakage, glutathione (GSH) depletion, and lipid peroxidation, and antioxidants such as coenzyme Q10 or N-acetylcysteine may have a role as therapeutic options [19]. Circadian rhythm also plays a role in liver lipid metabolism, the renin–angiotensin system [20], and chronic fatigue syndrome [21, 22]. The timing of statin therapy may influence its effect [23]. The renin–angiotensin system has been found to induce oxidative stress and fibrogenic cytokines [24]. Altering circadian rhythm may therefore substantially influence the treatment of chronic liver diseases.

Increasing evidence shows that circadian clock genes may contribute to the development of metabolic syndrome [25, 26]. Circadian clocks regulate the timing of biological events, including the sleep–wake cycle, energy metabolism, and hormone secretion. In an association and interaction analysis, Lin et al. proposed that many core circadian clock genes affect metabolic activity, which may lead to metabolic syndrome [27]. We therefore targeted core circadian clock genes that have been potentially linked with MetS.


Study population

We used the Taiwan Biobank (TWB) next-generation sequencing (NGS) cohort as our study population. TWB collects lifestyle, genomic, and disease data from Taiwanese residents. It recruits community-based volunteers aged 30 to 70 years with no history of cancer. This cohort was recruited and followed up from the general Taiwanese population and has been used in previous genetic studies [28]. Our study included 642 TWB individuals with whole-genome sequencing (WGS) data.

Metabolic syndrome definition

According to the International Diabetes Federation (IDF) definition, metabolic syndrome requires central obesity (waist circumference with ethnicity-specific values, see below) plus any 2 of the following 4 factors:

  • Triglycerides ≥ 150 mg/dL (1.7 mmol/L) or drug treatment for elevated triglycerides

  • Fasting glucose ≥ 100 mg/dL (5.6 mmol/L) or previously diagnosed type 2 diabetes mellitus

  • Reduced high-density lipoprotein (HDL) cholesterol (< 40 mg/dL [1.0 mmol/L] in men; < 50 mg/dL [1.3 mmol/L] in women) or drug treatment for reduced HDL cholesterol

  • Elevated blood pressure: systolic blood pressure ≥ 130 mm Hg, diastolic blood pressure ≥ 85 mm Hg, or antihypertensive drug treatment in a patient with a history of hypertension

Because our study was conducted in Taiwan using Taiwan Biobank data, we used the IDF ethnicity-specific waist circumference values for the “South Asians” and “Chinese” groups, defining central obesity as a waist circumference of ≥ 90 cm in males and ≥ 80 cm in females.
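The IDF criteria above reduce to a simple rule: central obesity plus at least two of the four remaining factors. The Python sketch below encodes that rule for a single participant. The `Participant` record and all of its field names are hypothetical (the Taiwan Biobank schema is not described here), and medication and diagnosis status are assumed to be available as boolean flags. It illustrates the definition only and is not the authors' SQL-based grouping.

```python
from dataclasses import dataclass

# Central-obesity waist cut-offs (cm) for the IDF "South Asians"/"Chinese" groups.
WAIST_CUTOFF_CM = {"M": 90.0, "F": 80.0}
# Sex-specific HDL cholesterol cut-offs (mg/dL) for "reduced HDL".
HDL_CUTOFF_MG_DL = {"M": 40.0, "F": 50.0}


@dataclass
class Participant:
    """Hypothetical participant record; field names are illustrative only."""
    sex: str  # "M" or "F"
    waist_cm: float
    tg_mg_dl: float
    fasting_glucose_mg_dl: float
    hdl_mg_dl: float
    sbp_mm_hg: float
    dbp_mm_hg: float
    on_tg_drug: bool = False
    has_t2dm: bool = False
    on_hdl_drug: bool = False
    htn_on_treatment: bool = False  # antihypertensive drug + history of hypertension


def count_idf_factors(p: Participant) -> int:
    """Count how many of the four non-waist IDF factors are present."""
    factors = [
        p.tg_mg_dl >= 150 or p.on_tg_drug,
        p.fasting_glucose_mg_dl >= 100 or p.has_t2dm,
        p.hdl_mg_dl < HDL_CUTOFF_MG_DL[p.sex] or p.on_hdl_drug,
        p.sbp_mm_hg >= 130 or p.dbp_mm_hg >= 85 or p.htn_on_treatment,
    ]
    return sum(factors)


def has_metabolic_syndrome(p: Participant) -> bool:
    """IDF definition: central obesity plus at least 2 of the 4 factors."""
    central_obesity = p.waist_cm >= WAIST_CUTOFF_CM[p.sex]
    return central_obesity and count_idf_factors(p) >= 2


# Usage: a woman with waist 85 cm, TG 160 mg/dL and HDL 45 mg/dL meets the definition.
example = Participant(sex="F", waist_cm=85, tg_mg_dl=160, fasting_glucose_mg_dl=95,
                      hdl_mg_dl=45, sbp_mm_hg=120, dbp_mm_hg=80)
print(has_metabolic_syndrome(example))  # True
```

Keeping the sex-specific cut-offs in lookup tables makes it easy to swap in other ethnicity-specific IDF values if the analysis were replicated in another population.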

Finding suspected single nucleotide polymorphisms

We analyzed WGS data generated on the Illumina platform from 642 individuals (123 of whom were defined as having metabolic syndrome), targeting the following genes: ALAS1, APOA5, ARNTL, BUD13, CETP, CLOCK, CRY1, CRY2, CSNK1D, CSNK1E, GSK3B, LIPA, NPAS2, NR1D1, PER1, PER2, PER3, RORA, RORB, RORC, SMAD2, SMAD3, SMAD4, TGFB2, TGFB3, and TGFBR2, together with other genes that fell within the SNP ranges set for analysis. The SNP quality range was set between 17 and 37 (average > 30), with QUAL ≥ 30 [29].
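As a rough illustration of the variant-selection step, the sketch below reads VCF data lines, keeps variants with QUAL ≥ 30, and assigns each remaining variant to a target gene interval. The gene names and coordinates in `TARGET_REGIONS` are placeholders, not the regions actually used; the additional 17 to 37 quality range is not modelled; and the study's own pipeline (written partly in C# and Python) may differ.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical target intervals (chrom, start, end, gene), 1-based inclusive.
# Placeholders only: NOT the gene boundaries used in the study.
TARGET_REGIONS = [
    ("chr22", 1_000, 2_000, "GENE_A"),
    ("chr1", 5_000, 6_000, "GENE_B"),
]
MIN_QUAL = 30.0


def gene_for(chrom: str, pos: int) -> Optional[str]:
    """Return the target gene whose interval contains (chrom, pos), if any."""
    for r_chrom, start, end, gene in TARGET_REGIONS:
        if chrom == r_chrom and start <= pos <= end:
            return gene
    return None


def filter_vcf(lines: Iterable[str], min_qual: float = MIN_QUAL) -> Iterator[Tuple]:
    """Yield (gene, chrom, pos, id, ref, alt, qual) for target variants passing QUAL."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, vid, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low variant quality
        gene = gene_for(chrom, int(pos))
        if gene is not None:
            yield gene, chrom, int(pos), vid, ref, alt, float(qual)


vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr22\t1500\trs1\tA\tG\t45\tPASS\t.
chr22\t1600\trs2\tC\tT\t12\tPASS\t.
chr22\t2500\trs3\tG\tA\t60\tPASS\t.
"""
print(list(filter_vcf(vcf_text.splitlines())))
# [('GENE_A', 'chr22', 1500, 'rs1', 'A', 'G', 45.0)]
```

Here rs2 is dropped for low quality and rs3 because it lies outside every target interval. An overly wide interval for one gene, as happened with CSNK1E, would silently pull in neighbouring genes in exactly this way.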

However, the range of analyzed data was larger than originally intended because of a problem with the single nucleotide polymorphism (SNP) range set for CSNK1E. The definition of metabolic syndrome was based primarily on the physiological data in the Taiwan Biobank database. After these data were imported into an SQL server, participants were grouped using SQL queries as the basis for subsequent analysis.

For each group, we counted the frequencies of heterozygous variants, homozygous variants, and non-variant (wild-type) genotypes. The calculations were then implemented in Python to obtain 95% confidence intervals and p values from the chi-square test or Fisher's exact test. After identifying significant SNPs, we conducted subgroup analyses to determine whether these SNPs are related to hypertension, low HDL levels, diabetes, or high triglyceride (TG) levels. Bonferroni correction was used to address multiple hypothesis testing; because metabolic syndrome has five component criteria, the significance level was set to 0.05/5 = 0.01.
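A minimal sketch of the per-SNP testing step follows, using only the Python standard library. It assumes a simplified 2 × 2 layout (variant carriers, heterozygous or homozygous, versus wild type, in cases versus controls) rather than separate counts for each genotype. It also assumes the common rule of switching to Fisher's exact test when any expected cell count is below 5; the study does not state either choice. The odds ratio confidence interval uses Woolf's log method, with a 0.5 correction when a cell is zero.

```python
import math
from math import comb


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for [[a, b], [c, d]].

    a: cases carrying the variant, b: cases with wild type,
    c: controls carrying the variant, d: controls with wild type.
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction, with its 1-df p value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p: total probability of tables no likelier than observed."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def snp_association(a, b, c, d):
    """Fisher's exact test if any expected count is < 5, otherwise chi-square."""
    n = a + b + c + d
    min_expected = min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))
    if min_expected < 5:
        test, p = "fisher", fisher_exact_2x2(a, b, c, d)
    else:
        test, p = "chi-square", chi_square_2x2(a, b, c, d)[1]
    odds_ratio, lo, hi = odds_ratio_ci(a, b, c, d)
    return {"test": test, "p": p, "or": odds_ratio, "ci95": (lo, hi)}


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests


print(snp_association(3, 1, 1, 3)["test"])      # fisher (smallest expected count is 2)
print(snp_association(10, 20, 20, 10)["test"])  # chi-square
```

For example, `bonferroni_alpha(0.05, 5)` returns the per-test threshold for five comparisons at the conventional family-wise alpha of 0.05.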

Statistical analyses

P values for continuous variables were calculated using Student's t test. Categorical variables were compared using the chi-square test or Fisher's exact test. Given the exploratory nature of this study, P < 0.05 was considered statistically significant. We used the caret package in R (version 4.0.4) for model prediction, and C#, Python, and MySQL for data manipulation.

Creation of genome-based prediction model

We used the significant SNPs to create metabolic syndrome prediction models with logistic regression, random forest, adaboost, and neural network methods. Training and testing sets were constructed using five-fold cross-validation. Assuming a cumulative effect of SNPs, we coded homozygous variants as 2, heterozygous variants as 1, and wild type as 0. Because body weight may itself be influenced by these genes, it was not used as a covariate [30]. In addition to these four models, we selected the 40 most important SNPs according to the random forest variable importance matrix and used them to build three further models with logistic regression, adaboost, and neural network methods (Fig. 1). The neural network was a simple network with a single hidden layer of 10 units and a weight decay of 0. Each model was evaluated using the area under the receiver operating characteristic curve (AUC), precision, F1 score, and average precision (the area under the precision–recall curve).
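The data-preparation steps above (additive genotype coding, five-fold cross-validation, and top-40 feature selection) can be sketched in plain Python as follows. Model fitting itself was done with caret in R and is not reproduced here. The helper names, the stratification of folds by case status, and the treatment of missing genotype calls as `None` are assumptions of this sketch rather than details reported in the study.

```python
import random
from typing import Dict, List, Optional, Sequence


def encode_genotypes(gts: Sequence[str]) -> List[Optional[int]]:
    """Additive coding of VCF GT strings: 0 = wild type, 1 = het, 2 = hom variant.

    Assumes biallelic sites; missing calls (e.g. "./.") are returned as None.
    """
    codes: List[Optional[int]] = []
    for gt in gts:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            codes.append(None)
        else:
            codes.append(sum(allele != "0" for allele in alleles))
    return codes


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[int]:
    """Assign each sample a fold in 0..k-1 so cases and controls are spread evenly."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds


def top_k(scores: Dict[str, float], k: int = 40) -> List[str]:
    """Names of the k highest-scoring SNPs (e.g. random forest importance)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Usage with the cohort sizes reported in the results (124 cases, 518 controls).
labels = [1] * 124 + [0] * 518
folds = stratified_folds(labels)
for f in range(5):
    train_idx = [i for i, g in enumerate(folds) if g != f]
    held_out = [i for i, g in enumerate(folds) if g == f]
    # ...fit a model on train_idx and evaluate it on held_out...
print(encode_genotypes(["0/0", "0/1", "1|1", "./."]))  # [0, 1, 2, None]
```

With two score dictionaries, for example random forest importances and odds-ratio significance, `set(top_k(a)) & set(top_k(b))` gives the kind of overlap comparison reported in the results.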

Fig. 1 Flow diagram for model building


Baseline characteristics of individuals with metabolic syndrome and the control group

Of the 642 participants, 124 had metabolic syndrome and 518 did not. The mean age was 51 years in the metabolic syndrome group and 44 years in the non-metabolic syndrome group. Waist circumference, blood pressure, triglyceride level, hemoglobin A1c, fasting glucose, and the proportion with diabetes mellitus were higher in participants with metabolic syndrome than in those without. In addition, high-density lipoprotein levels were lower in participants with metabolic syndrome, consistent with the definition of metabolic syndrome (Table 1).

Table 1 Baseline characteristics of the patients

Table 1 shows the baseline values of participants with and without metabolic syndrome.

Spectrum of metabolic syndrome mutant alleles

We searched all alleles in the reference circadian genes and used the chi-square test to determine whether heterozygous or homozygous genotypes were associated with metabolic syndrome. Among the genes searched, we found 186 significant SNPs associated with metabolic syndrome (Table 2). Of these 186 SNPs, 47 alleles were associated with hypertension (Table 3), 27 with diabetes mellitus (Table 4), 10 with low HDL-C (Table 5), and 46 with high TG levels (Table 6).

Table 2 Significant SNPs and odds ratio
Table 3 Hypertension related SNPs
Table 4 Diabetes mellitus related SNPs
Table 5 Low HDL-C related SNPs
Table 6 Triglyceride level related SNPs

Gene-based prediction model

We applied several machine learning models (logistic regression, random forest, adaboost, and neural network) to predict metabolic syndrome from genetic data. For these four models, the AUCs were 0.68, 0.8, 0.82, and 0.8, and the F1 scores were 0.424, 0.525, 0.528, and 0.526, respectively (for details see Table 7). We then selected the 40 most important SNPs in the random forest model and used them as new predictor variables. When we compared the 40 SNPs with the most significant odds ratios against the 40 most important SNPs in the random forest model, only 11 SNPs overlapped (Table 8). For the SNP-selected models (logistic regression, adaboost, and neural network), the AUCs were 0.82, 0.81, and 0.85, and the F1 scores were 0.578, 0.415, and 0.5, respectively (Table 9). Feature selection improved the AUC of logistic regression (from 0.68 to 0.82) and of the neural network (from 0.8 to 0.85), and improved the F1 score of logistic regression (from 0.424 to 0.578).
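To make the reported metrics concrete, the sketch below computes AUC, precision, recall, F1 score, and average precision from true labels and predicted probabilities. The 0.5 classification threshold for precision and F1 is an assumption, since the threshold used is not stated. AUC is computed here by pairwise case-control comparison, and average precision follows the step-wise definition, the sum of (R_n - R_(n-1)) × P_n over thresholds. How metrics were aggregated across the five folds is not described in the study.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random case scores above a random control (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, scores, threshold=0.5) -> Tuple[float, float, float]:
    """Precision, recall (sensitivity) and F1 at a fixed probability threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p and y == 0 for p, y in zip(pred, y_true))
    fn = sum((not p) and y == 1 for p, y in zip(pred, y_true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(y_true, scores) -> float:
    """Step-wise area under the precision-recall curve: sum of (R_n - R_(n-1)) * P_n."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Tied scores form a single threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall, precision = tp / total_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall, i = recall, j
    return ap


y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(y, s))              # 0.75
print(precision_recall_f1(y, s))  # precision ≈ 0.667, recall 1.0, F1 ≈ 0.8
print(average_precision(y, s))    # ≈ 0.833
```

Because AUC depends only on ranking while F1 depends on the chosen threshold, a model can rank cases reasonably well yet have a low F1 score if few predicted probabilities cross the threshold.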

Table 7 Prediction model using all significant SNPs
Table 8 40 most important SNPs in random forest model and OR value
Table 9 Prediction model using feature selecting SNPs


In this study, we found 186 circadian gene SNPs related to metabolic syndrome, 8 of which were related to apolipoproteins. Previous studies have shown that apolipoprotein E knockout mice are more likely to develop cardiovascular disease after circadian rhythm disruption [31, 32]. Circadian rhythm disorders can alter metabolic factors, including the cholesterol profile and apolipoproteins [33]. Another animal study found that apolipoprotein E-deficient mice developed cardiovascular disease more rapidly after circadian rhythm alteration [34]. Our study also showed that apolipoprotein-related SNPs are associated with high TG levels, low HDL levels, and hypertension (HTN). In APOL2, rs132759 is correlated with both HTN and low HDL levels. Previous studies have shown that APOL2 may be involved in acute inflammatory responses and lipid metabolic processes [35, 36]. To our knowledge, our study is the first to identify a correlation between APOL2 and HTN.

Five SNPs are located in BMS1P20, a long non-coding RNA (lncRNA) gene. Previous studies have shown that BMS1P20 is positively correlated with overall survival in cancer patients, especially those with lung adenocarcinoma [37]. It has also been hypothesized that lncRNAs regulate cells through lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) networks [38]. Several lncRNAs, such as 116HG, H19, HOTAIR, and MIAT, have been reported to be associated with metabolism [39,40,41]. We found that rs403517 and rs405570 in BMS1P20 are related to diabetes mellitus, and we believe our study is the first to report that the lncRNA BMS1P20 is related to metabolic syndrome.

The MYO18B gene encodes a myosin heavy chain that is expressed in human cardiac and skeletal muscle [42]. Several studies have shown that MYO18B mutations are associated with myopathy or cardiomyopathy in animal models and in humans [43, 44]. One animal study also showed that MYO18B gene expression is regulated by circadian rhythm [45]. In our study, MYO18B was also associated with metabolic syndrome; in particular, rs6004865 was associated with low HDL levels. Because the SNPs we found in MYO18B are all intronic or intergenic, further studies are needed to clarify the relationship between MYO18B and metabolic syndrome.

Many studies have explored the RORA gene and its relation to circadian rhythm, and RORA has been associated with many psychiatric disorders, including major depressive disorder, bipolar disorder, and sleep disturbance disorder [46,47,48]. RORA gene variants have also been reported to affect the use of substances such as alcohol, tea, tobacco, and caffeine [47]. This is notable given the widely accepted knowledge that smoking and alcohol consumption increase the risk of developing metabolic syndrome. In an animal study, suppression of RORA gene activity improved metabolic function and reduced inflammation [49].

Many studies have found that SMARCB1 is a tumor suppressor gene related to different types of cancer [50]. Recent studies have shown that circadian clock oscillation develops during cell differentiation and that some cancer cells lack a functional circadian clock, given the similarity between embryonic stem cells and cancer cells [51]. Our study found that multiple SNPs in the SMARCB1 gene (rs5751740, rs5751741, rs5760038, rs5760046, rs5760057, rs5996620) are related to both high TG levels and hypertension. However, the underlying mechanism remains unknown.

ZNF280B acts as an oncogene in prostate cancer and gastric cancer [52]. To our knowledge, our study is the first to report that ZNF280B variants are related to metabolic syndrome: rs142445063 and rs2051488 were associated with diabetes mellitus.

A previous study used several machine learning methods to predict metabolic syndrome, incorporating both clinical and genetic information [53]. In our study, models were built using either all significant SNPs or a selected subset of SNPs. Performance generally improved in the SNP-selected models, most notably for logistic regression. Previous studies have also shown that feature selection can improve model performance [54].

This study has several strengths. First, we examined multiple circadian genes and found multiple SNPs associated with metabolic syndrome, some of which had not previously been reported in relation to metabolic syndrome. Among the significant SNPs, we performed subgroup analyses to determine which SNPs correspond to each metabolic syndrome criterion. Second, we used four machine learning models to predict metabolic syndrome from circadian gene information, which to our knowledge has not been done in previous studies, and the AUC reached 0.85 in the SNP-selected model.

Nevertheless, our study has several limitations. First, the sample size is small and includes only relatively healthy, health-aware Taiwanese volunteers, so the findings should be replicated and validated in other populations. Second, because this was a cross-sectional study, causal relationships cannot be established. Third, we used only circadian gene SNPs in our prediction models; including other metabolic syndrome-related SNPs or biomarkers may improve accuracy.


We identified 186 circadian gene SNPs related to metabolic syndrome. Among these SNPs, 47 alleles were associated with hypertension, 46 with high serum TG levels, 27 with diabetes mellitus, and 10 with low serum HDL levels. Some of these SNPs are reported here for the first time in relation to metabolic syndrome, and additional research is needed to confirm them. In addition, we applied several machine learning models to predict metabolic syndrome based on circadian gene data, but found it difficult to achieve high sensitivity. Adding other clinical data may yield a model with higher sensitivity (Additional files 1, 2, 3, 4, 5, 6, 7, 8).

Availability of data and materials

The datasets generated and analyzed during the current study are not publicly available due to Taiwan Biobank privacy regulations, but are available from the corresponding author on reasonable request with the permission of Taiwan Biobank.



Abbreviations

SNP: Single nucleotide polymorphism

AUC: Area under the receiver operating characteristic curve

MetS: Metabolic syndrome

CKD: Chronic kidney disease

BMI: Body mass index

SBP: Systolic blood pressure

IDF: International Diabetes Federation

NAHSIT: Nutrition and Health Survey in Taiwan

TWB: Taiwan Biobank

WGS: Whole-genome sequence

HDL: High-density lipoprotein




  1. Tanner RM, Brown TM, Muntner P. Epidemiology of obesity, the metabolic syndrome, and chronic kidney disease. Curr Hypertens Rep. 2012;14:152–9.

  2. Samson SL, Garber AJ. Metabolic syndrome. Endocrinol Metab Clin North Am. 2014;43:1–23.

  3. Sun K, Liu J, Ning G. Active smoking and risk of metabolic syndrome: a meta-analysis of prospective studies. PLoS ONE. 2012;7:e47791.

  4. Narain A, Kwok CS, Mamas MA. Soft drink intake and the risk of metabolic syndrome: A systematic review and meta-analysis. Int J Clin Pract. 2017;71:23.

  5. Malik VS, Popkin BM, Bray GA, Després JP, Willett WC, Hu FB. Sugar-sweetened beverages and risk of metabolic syndrome and type 2 diabetes: a meta-analysis. Diabetes Care. 2010;33:2477–83.

  6. Burns TL, Letuchy EM, Paulos R, Witt J. Childhood predictors of the metabolic syndrome in middle-aged adults: the Muscatine study. J Pediatrics. 2009;155:S5.

  7. Beltrán-Sánchez H, Harhay MO, Harhay MM, McElligott S. Prevalence and trends of metabolic syndrome in the adult US population, 1999–2010. J Am Coll Cardiol. 2013;62:697–703.

  8. Ranasinghe P, Mathangasinghe Y, Jayawardena R, Hills AP, Misra A. Prevalence and trends of metabolic syndrome among adults in the asia-pacific region: a systematic review. BMC Public Health. 2017;17:101.

  9. Pavlova M. Circadian rhythm sleep-wake disorders. Continuum Minneapolis, Minn. 2017;23:1051–63.

  10. Pittendrigh CS, Daan S. A functional analysis of circadian pacemakers in nocturnal rodents. J Comp Physiol. 1976;106:291–331.

  11. Cui P, Zhong T, Wang Z, Wang T, Zhao H, Liu C, Lu H. Identification of human circadian genes based on time course gene expression profiles by using a deep learning method. Biochim Biophys Acta Mol Basis Dis. 2018;1864:2274–83.

  12. Solovyeva IA, Dobrovolskayaa EV, Moskalev AA. Genetic Control of Circadian Rhythms and Aging. Genetika. 2016;52:393–412.

  13. Cox KH, Takahashi JS. Circadian clock genes and the transcriptional architecture of the clock mechanism. J Mol Endocrinol. 2019;63:R93-r102.

  14. Guan D, Lazar MA. Interconnections between circadian clocks and metabolism. J Clin Investig. 2021;131:23.

  15. Maiese K. Cognitive Impairment and Dementia: Gaining Insight through Circadian Clock Gene Pathways. Biomolecules. 2021;11:34.

  16. Schober A, Blay RM, SaboorMaleki S, Zahedi F, Winklmaier AE, Kakar MY, Baatsch IM, Zhu M, Geißler C, Fusco AE, Eberlein A, Li N, Megens RTA, Banafsche R, Kumbrink J, Weber C, Nazari-Jahantigh M. MicroRNA-21 controls circadian regulation of apoptosis in atherosclerotic lesions. Circulation. 2021;144:1059–73.

  17. Wilking M, Ndiaye M, Mukhtar H, Ahmad N. Circadian rhythm connections to oxidative stress: implications for human health. Antioxid Redox Signal. 2013;19:192–208.

  18. Cherukalady R, Kumar D, Basu P, Singaravel M. Risperidone resets the circadian clock in mice. Biol Rhythm Res. 2017;48:583–91.

  19. Eftekhari A, Ahmadian E, Azarmi Y, Parvizpur A, Hamishehkar H, Eghbal MA. In vitro/vivo studies towards mechanisms of risperidone-induced oxidative stress and the protective role of coenzyme Q10 and N-acetylcysteine. Toxicol Mech Methods. 2016;26:520–8.

  20. Cugini P, Lucia P. Circadian rhythm of the renin-angiotensin-aldosterone system: a summary of our research studies. Clin Ter. 2004;155:287–91.

  21. Tsai SY, Chen HJ, Lio CF, Kuo CF, Kao AC, Wang WS, Yao WC, Chen C, Yang TY. Increased risk of chronic fatigue syndrome in patients with inflammatory bowel disease: a population-based retrospective cohort study. J Transl Med. 2019;17:55.

  22. Yang TY, Lin CL, Yao WC, Lio CF, Chiang WP, Lin K, Kuo CF, Tsai SY. How mycobacterium tuberculosis infection could lead to the increasing risks of chronic fatigue syndrome and the potential immunological effects: a population-based retrospective cohort study. J Transl Med. 2022;20:99.

  23. Izquierdo-Palomares JM, Fernandez-Tabera JM, Plana MN, AñinoAlba A, GómezÁlvarez P, Fernandez-Esteban I, Saiz LC, Martin-Carrillo P, PinarLópez Ó. Chronotherapy versus conventional statins therapy for the treatment of hyperlipidaemia. Cochrane Database Syst Rev. 2016;11:CD009462.

  24. Ahmadian E, Pennefather PS, Eftekhari A, Heidari R, Eghbal MA. Role of renin-angiotensin system in liver diseases: an outline on the potential therapeutic points of intervention. Expert Rev Gastroenterol Hepatol. 2016;10:1279–88.

  25. Chaix A, Lin T, Le HD, Chang MW, Panda S. Time-Restricted Feeding Prevents Obesity and Metabolic Syndrome in Mice Lacking a Circadian Clock. Cell Metab. 2019;29:303-319.e304.

  26. Jagannath A, Taylor L, Wakaf Z, Vasudevan SR, Foster RG. The genetics of circadian rhythms, sleep and health. Hum Mol Genet. 2017;26:R128-r138.

  27. Lin E, Kuo PH, Liu YL, Yang AC, Kao CF, Tsai SJ. Effects of circadian clock genes and health-related behavior on metabolic syndrome in a Taiwanese population: Evidence from association and interaction analysis. PLoS ONE. 2017;12:e0173861.

  28. Chen C-H, Yang J-H, Chiang CW, Hsiung C-N, Wu P-E, Chang L-C, Chu H-W, Chang J, Song I-W, Yang S-L. Population structure of Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan Biobank project. Hum Mol Genet. 2016;25:5321–31.

  29. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18:1851–8.

  30. Engin A. Circadian Rhythms in Diet-Induced Obesity. Adv Exp Med Biol. 2017;960:19–52.

  31. Zhang X, Zhao F, Xu C, Lu C, Jin H, Chen S, Qian R. Circadian rhythm disorder of thrombosis and thrombolysis-related gene expression in apolipoprotein E knock-out mice. Int J Mol Med. 2008;22:149–53.

  32. Schilperoort M, De Berg R, Bosmans LA, Os BW, Dollé MET, Smits NAM, Guichelaar T, Baarle D, Koemans L, Berbée JFP, Deboer T, Meijer JH, de Vries MR, Vreeken D, Gils JM, Willems K, Kerkhof LWM, Lutgens E, Biermasz NR, Rensen PCN, Kooijman S. Disruption of circadian rhythm by alternating light-dark cycles aggravates atherosclerosis development in APOE*3-LeidenCETP mice. J Pineal Res. 2020;68:e12614.

  33. Hyun MH, Kang JH, Kim S, Na JO, Choi CU, Kim JW, Kim EJ, Rha SW, Park CG, Lee E, Seo HS. Patterns of circadian variation in 24-hour ambulatory blood pressure, heart rate, and sympathetic tone correlate with cardiovascular disease risk: a cluster analysis. Cardiovasc Ther. 2020;2020:4354759.

  34. Chalfant JM, Howatt DA, Tannock LR, Daugherty A, Pendergast JS. Circadian disruption with constant light exposure exacerbates atherosclerosis in male ApolipoproteinE-deficient mice. Sci Rep. 2020;10:9920.

  35. Liu Z, Lu H, Jiang Z, Pastuszyn A, Hu CA. Apolipoprotein l6, a novel proapoptotic Bcl-2 homology 3-only protein, induces mitochondria-mediated apoptosis in cancer cells. Mol Cancer Res. 2005;3:21–31.

  36. Rao SK, Pavicevic Z, Du Z, Kim JG, Fan M, Jiao Y, Rosebush M, Samant S, Gu W, Pfeffer LM, Nosrat CA. Pro-inflammatory genes as biomarkers and therapeutic targets in oral squamous cell carcinoma. J Biol Chem. 2010;285:32512–21.

  37. Sui J, Li YH, Zhang YQ, Li CY, Shen X, Yao WZ, Peng H, Hong WW, Yin LH, Pu YP, Liang GY. Integrated analysis of long non-coding RNA-associated ceRNA network reveals potential lncRNA biomarkers in human lung adenocarcinoma. Int J Oncol. 2016;49:2023–36.

  38. Guo Z, Cao Y. An lncRNA-miRNA-mRNA ceRNA network for adipocyte differentiation from human adipose-derived stem cells. Mol Med Rep. 2019;19:4271–87.

  39. Powell WT, Coulson RL, Crary FK, Wong SS, Ach RA, Tsang P, AliceYamada N, Yasui DH, Lasalle JM. A Prader-Willi locus lncRNA cloud modulates diurnal genes and energy expenditure. Hum Mol Genet. 2013;22:4318–28.

  40. Wang H, Cao Y, Shu L, Zhu Y, Peng Q, Ran L, Wu J, Luo Y, Zuo G, Luo J, Zhou L, Shi Q, Weng Y, Huang A, He TC, Fan J. Long non-coding RNA (lncRNA) H19 induces hepatic steatosis through activating MLXIPL and mTORC1 networks in hepatocytes. J Cell Mol Med. 2020;24:1399–412.

  41. Meydan C, Bekenstein U, Soreq H. Molecular regulatory pathways link sepsis with metabolic syndrome: non-coding RNA elements underlying the sepsis/metabolic cross-talk. Front Mol Neurosci. 2018;11:189.

  42. Salamon M, Millino C, Raffaello A, Mongillo M, Sandri C, Bean C, Negrisolo E, Pallavicini A, Valle G, Zaccolo M, Schiaffino S, Lanfranchi G. Human MYO18B, a novel unconventional myosin heavy chain expressed in striated muscles moves into the myonuclei upon differentiation. J Mol Biol. 2003;326:137–49.

  43. Gurung R, Ono Y, Baxendale S, Lee SL, Moore S, Calvert M, Ingham PW. A Zebrafish Model for a Human Myopathy Associated with Mutation of the Unconventional Myosin MYO18B. Genetics. 2017;205:725–35.

  44. Malfatti E, Böhm J, Lacène E, Beuvin M, Romero NB, Laporte J. A Premature Stop Codon in MYO18B is associated with severe nemaline myopathy with cardiomyopathy. J Neuromusc Dis. 2015;2:219–27.

  45. Lazado CC, Nagasawa K, Babiak I, Kumaratunga HP, Fernandes JM. Circadian rhythmicity and photic plasticity of myosin gene transcription in fast skeletal muscle of Atlantic cod (Gadus morhua). Mar Genomics. 2014;18(Pt A):21–9.

  46. Geoffroy PA, Etain B, Lajnef M, Zerdazi EH, Brichant-Petitjean C, Heilbronner U, Hou L, Degenhardt F, Rietschel M, McMahon FJ, Schulze TG, Jamain S, Marie-Claire C, Bellivier F. Circadian genes and lithium response in bipolar disorders: associations with PPARGC1A (PGC-1α) and RORA. Genes Brain Behav. 2016;15:660–8.

  47. Hou SJ, Tsai SJ, Kuo PH, Liu YL, Yang AC, Lin E, Lan TH. An association study in the Taiwan Biobank reveals RORA as a novel locus for sleep duration in the Taiwanese Population. Sleep Med. 2020;73:70–5.

  48. Chen Z, Tao S, Zhu R, Tian S, Sun Y, Wang H, Yan R, Shao J, Zhang Y, Zhang J, Yao Z, Lu Q. Aberrant functional connectivity between the suprachiasmatic nucleus and the superior temporal gyrus: Bridging RORA gene polymorphism with diurnal mood variation in major depressive disorder. J Psychiatr Res. 2021;132:123–30.

  49. Billon C, Sitaula S, Burris TP. Metabolic Characterization of a Novel RORα Knockout Mouse Model without Ataxia. Front Endocrinol. 2017;8:141.

  50. Kohashi K, Oda Y. Oncogenic roles of SMARCB1/INI1 and its deficient tumors. Cancer Sci. 2017;108:547–52.

  51. Tsuchiya Y, Umemura Y, Yagita K. Circadian clock and cancer: from a viewpoint of cellular differentiation. Int J Urol. 2020;27:518–24.

  52. Zhai J, Yang Z, Cai X, Yao G, An Y, Wang W, Fan Y, Zeng C, Liu K. ZNF280B promotes the growth of gastric cancer in vitro and in vivo. Oncol Lett. 2018;15:5819–24.

  53. Choe EK, Rhee H, Lee S, Shin E, Oh SW, Lee JE, Choi SH. Metabolic syndrome prediction using machine learning models with genetic and clinical information from a nonobese healthy population. Genomics Inform. 2018;16:e31.

  54. Gaudillo J, Rodriguez JJR, Nazareno A, Baltazar LR, Vilela J, Bulalacao R, Domingo M, Albia J. Machine learning approach to single nucleotide polymorphism-based asthma prediction. PLoS ONE. 2019;14:e0225574.



Acknowledgements

We would like to extend our acknowledgements to the Taiwan Biobank for providing the preliminary data; to Dr Benjamin Lai, Dr Che-Wei Su, and Dr Chon-Fu Lio for their initial suggestions; and to the organizations that funded this project.


Funding

This study was supported by the Department of Medical Research at Mackay Memorial Hospital, Taiwan (grant numbers MMH-106-81, MMH-107-71, MMH-107-102, MMH-107-135, MMH-109-79, and MMH-109-103), and by Mackay Medical College (grant number 1082A03). The article processing charge (APC) was funded by the Department of Medical Research at Mackay Memorial Hospital and by the co-first author and the corresponding author, Dr. Chien-Feng Kuo and Dr. Shin-Yi Tsai, respectively.

Author information

Contributions

SYT conceptualized and designed the study. NWH, KCC, CFK, and SYT were responsible for the investigation and formal analysis and interpreted the data, and all authors wrote the preliminary draft. SYT was responsible for supervision, major revision, and data verification. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shin-Yi Tsai.

Ethics declarations

Ethics approval and consent to participate

The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by Mackay Memorial Hospital (16MMHIS074) and the Taiwan Biobank (TWBR10903-07).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Summary of the 186 significant circadian gene SNPs.

Additional file 2: Supplementary figure S2

ROC curve of the neural network

Additional file 3: Supplementary figure S3

Precision-Recall curve of the neural network

Additional file 4: Supplementary figure S4

ROC curve of the AdaBoost model

Additional file 5: Supplementary figure S5

Precision-Recall curve of Adaboost model

Additional file 6: Supplementary figure S6

ROC curve of logistic regression

Additional file 7: Supplementary figure S7

Precision-Recall curve of logistic regression

Additional file 8: Supplementary figure S8

Biological pathway-based analysis of circadian rhythm (1)

Reference: 1. Reactome

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from Creative Commons. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Hsu, NW., Chou, KC., Wang, YT.T. et al. Building a model for predicting metabolic syndrome using artificial intelligence based on an investigation of whole-genome sequencing. J Transl Med 20, 190 (2022).
