External validation of the European risk assessment tool for chronic cardio-metabolic disorders in a Middle Eastern population

Background A high burden of chronic cardio-metabolic disorders, including type 2 diabetes mellitus (T2DM), chronic kidney disease (CKD), and cardiovascular disease (CVD), has been reported in the Middle East and North Africa (MENA) region. We aimed to externally validate a non-laboratory risk assessment tool for predicting chronic cardio-metabolic disorders in the Iranian population. Methods The predictors included age, body mass index, waist circumference, use of antihypertensive medications, current smoking, and family history of cardiovascular disease and/or diabetes. For external validation of the model in the Tehran Lipid and Glucose Study (TLGS), the area under the curve (AUC) and the Hosmer–Lemeshow (HL) goodness-of-fit test were used to assess discrimination and calibration, respectively. Results Among 1310 men and 1960 women aged 28–85 years, 29.5% and 47.4% experienced chronic cardio-metabolic disorders during the 6- and 9-year follow-up, respectively. The model showed acceptable discrimination, with an AUC of 0.72 (95% CI 0.69–0.75) for men and 0.73 (95% CI 0.71–0.76) for women. The calibration of the model was good for both genders (minimum HL P = 0.5). Considering separate outcomes, in men the AUC was highest for CKD (0.76 (95% CI 0.72–0.79)) and lowest for T2DM (0.65 (95% CI 0.61–0.69)). In women, the AUC was highest for CVD (0.82 (95% CI 0.78–0.86)) and lowest for T2DM (0.69 (95% CI 0.66–0.73)). The 9-year follow-up showed performance similar to that of the 6-year follow-up. When Cox regression was used in place of logistic multivariable analysis, the model's discrimination and calibration for chronic cardio-metabolic disorders were reduced, most markedly for incident CKD among women. Moreover, adding educational level and marital status did not improve the discrimination or calibration of the enhanced model.
Conclusion This model showed acceptable discrimination and good calibration for risk prediction of chronic cardio-metabolic disorders over short- and long-term follow-up in the Iranian population.


Study population
The Tehran Lipid and Glucose Study (TLGS) is a community-based prospective cohort study conducted on an urban Iranian population in Tehran. The study aims to determine the prevalence and incidence of non-communicable diseases (NCDs) and related risk factors among individuals aged ≥ 3 years, and to promote a healthy lifestyle and programs for the prevention of NCDs. Participants were recruited in two phases, the first (1999-2001; n = 15,005) and the second (2001-2005; n = 3550), and the study is designed to continue on a triennial basis for at least 20 years. The design and methodology of the TLGS have been reported elsewhere [19]. Since detailed data on cardiovascular status at recruitment were available from phase II, the current study was designed on 7490 individuals aged 28-85 years who participated in the second phase of the TLGS (phase I = 5716 and phase II = 1774). From this number, we excluded those with prevalent CVD (i.e., participants with a history of myocardial infarction, angioplasty, coronary artery bypass graft (CABG), or stroke; n = 546), prevalent T2DM defined as self-reported use of diabetes-lowering medication (n = 856), and prevalent end-stage renal disease (ESRD) defined as an estimated glomerular filtration rate (eGFR) < 15 mL/min/1.73 m² (n = 1). After further excluding those with missing baseline data on creatinine (Cr), fasting plasma glucose (FPG), 2-hour post-challenge plasma glucose (2 h-PCG), body mass index (BMI), waist circumference (WC), or smoking status (n = 1864, accounting for overlap between missing values), as well as participants with missing follow-up data on Cr (n = 32), FPG and 2 h-PCG (n = 718), and CVD status (n = 203), 3270 individuals were eligible for the current study over the 6-year follow-up until March 2011. In line with the risk assessment tool, no one died from non-cardiovascular causes during the follow-up period.
To investigate the long-term performance of the risk assessment tool for predicting chronic cardio-metabolic disorders, from a total of 4223 individuals, we excluded prevalent cases of CVD, T2DM, and ESRD and those with missing data on covariates, using the same approach as above. A total of 3240 individuals remained for the analysis over the 9-year follow-up until March 2018 (Additional file 1: Figure S1). This study was approved by the Institutional Review Board (IRB) of the Research Institute for Endocrine Sciences (RIES), Shahid Beheshti University of Medical Sciences, Tehran, Iran, and all participants provided written informed consent.
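The exclusion counts above can be checked with simple arithmetic. The sketch below is a minimal Python illustration that treats each reported count as a sequential, non-overlapping exclusion (the text states that overlap among missing baseline values was already accounted for in the 1864). It reproduces the final 6-year sample of 3270; the running total after the baseline exclusions is 4223, the same figure used as the starting point for the 9-year analysis. The step labels are descriptive only.

```python
# Sequential exclusions for the 6-year analysis, using the counts reported in the text.
steps = [
    ("aged 28-85 years in TLGS phase II", 7490),
    ("prevalent CVD", -546),
    ("prevalent T2DM (diabetes-lowering medication)", -856),
    ("prevalent ESRD (eGFR < 15)", -1),
    ("missing baseline Cr, FPG, 2 h-PCG, BMI, WC or smoking", -1864),
    ("missing follow-up Cr", -32),
    ("missing follow-up FPG / 2 h-PCG", -718),
    ("missing follow-up CVD status", -203),
]

remaining = 0
for label, n in steps:
    remaining += n
    print(f"{label:<55} {n:>6} -> {remaining}")

# Running total after the baseline (non-follow-up) exclusions only
after_baseline = sum(n for _, n in steps[:5])
print("after baseline exclusions:", after_baseline)
```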

Clinical and laboratory measurements
Information on demographic data, family history of premature CVD and T2DM, current smoking status, and medication history was obtained by a trained interviewer using a standard questionnaire. Details of anthropometric measurements, including height, weight, and WC, are reported elsewhere [19]. A blood sample was taken from all study participants between 7:00 and 9:00 AM after a 12- to 14-h overnight fast. Details of laboratory measurements, including FPG, 2 h-PCG, and serum creatinine, have been described previously [19].

Definition of variables
BMI was calculated as weight (kg) divided by the square of height (m²). A positive family history of premature CVD was defined as previously diagnosed CVD in a first-degree male relative aged < 55 years or a first-degree female relative aged < 65 years. Current smokers were defined as those who smoked cigarettes daily or occasionally.
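The variable definitions above translate directly into code. This is a minimal sketch under an assumed, hypothetical record layout (relatives stored as (sex, age at diagnosis) pairs, smoking recorded as a frequency string); the function names are illustrative and not part of the TLGS or the original tool.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def premature_family_cvd(relatives) -> bool:
    """True if any first-degree relative had CVD diagnosed before the
    sex-specific age limit (< 55 years for men, < 65 years for women).

    `relatives` is a hypothetical list of (sex, age_at_diagnosis) tuples,
    with sex coded as "M" or "F".
    """
    limits = {"M": 55, "F": 65}
    return any(age < limits[sex] for sex, age in relatives)


def current_smoker(smoking_frequency: str) -> bool:
    """Daily or occasional cigarette smoking counts as current smoking."""
    return smoking_frequency in {"daily", "occasional"}


print(round(bmi(80, 1.75), 1))
print(premature_family_cvd([("M", 60), ("F", 62)]))
```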
Outcomes
a. Type 2 diabetes
T2DM was defined as FPG ≥ 7 mmol/L, 2 h-PCG ≥ 11.1 mmol/L, or use of anti-diabetic medications.
b. Chronic kidney disease
CKD was defined as eGFR < 60 mL/min/1.73 m², estimated using the Modification of Diet in Renal Disease (MDRD) equation [20,21].
c. Cardiovascular disease
As described in previously published articles on CVD outcomes in the TLGS cohort [22,23], each participant is followed up annually by telephone for any medical event leading to hospitalization during the previous year. Participants are asked about any medical conditions by a trained nurse, and a trained physician then collects complementary data on the event during a home visit and from medical records. The collected data are evaluated by an outcome assessment committee consisting of an internist, endocrinologist, cardiologist, epidemiologist, and other experts as required, which assigns a specific outcome to every event. In the current study, CHD events included definite and probable MI, unstable angina, angiographically proven CHD, heart failure, and CHD death. Stroke was defined as a definite or possible stroke or transient ischemic attack. Finally, CVD was defined as a composite of any CHD event, stroke, or cerebrovascular death.
d. Chronic cardio-metabolic disorders
Chronic cardio-metabolic disorders were defined as a diagnosis of T2DM, CKD, or CVD during the follow-up period.
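A sketch of the outcome definitions, assuming serum creatinine in mg/dL and glucose in mmol/L. The text does not state which version of the 4-variable MDRD equation was used: the constant 186 (original equation, creatinine not IDMS-standardized) is assumed here by default, and 175 is the IDMS-traceable version, so treat `constant` as an assumption. All function names are hypothetical.

```python
def egfr_mdrd(scr_mg_dl, age_years, female, black=False, constant=186.0):
    """4-variable MDRD eGFR in mL/min/1.73 m^2.

    constant=186 is the original equation (creatinine not IDMS-standardized);
    175 is the IDMS-traceable version. Which one applies here is an assumption.
    """
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def has_t2dm(fpg_mmol_l, pcg2h_mmol_l, on_antidiabetic_medication):
    """T2DM: FPG >= 7 mmol/L, 2 h-PCG >= 11.1 mmol/L, or anti-diabetic drugs."""
    return fpg_mmol_l >= 7.0 or pcg2h_mmol_l >= 11.1 or on_antidiabetic_medication


def has_ckd(egfr):
    """CKD: eGFR < 60 mL/min/1.73 m^2."""
    return egfr < 60.0


def has_cardiometabolic_disorder(t2dm, ckd, cvd):
    """Composite outcome: any of T2DM, CKD, or CVD during follow-up."""
    return t2dm or ckd or cvd


# Illustrative (made-up) follow-up values for one woman aged 60
egfr = egfr_mdrd(1.5, 60, female=True)
print(round(egfr, 1), has_ckd(egfr))
```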

Risk tool for chronic cardio-metabolic disease
For the chronic cardio-metabolic disorders outcome, the risk tool was developed on 6780 Dutch men and women (aged 28-85 years) from three population-based cohorts: the Rotterdam Study (n = 4018), the Hoorn Study (n = 627), and the Prevention of Renal and Vascular End-stage Disease (PREVEND; n = 2135) study [16]. The sex-stratified model, including age, BMI, WC, use of antihypertensive medications, current smoking, parent and/or sibling with MI or stroke (age < 65 years), and parent and/or sibling with diabetes, was developed using logistic regression (Additional file 2: Table S1). The 7-year risk of chronic cardio-metabolic disorders was calculated for each TLGS man and woman according to the original risk assessment tool recommended by Alssema et al. [16].
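Because the original tool is a sex-stratified logistic model, each participant's 7-year risk is the inverse logit of the linear predictor. The sketch below shows that computation only. The coefficient values are made up for illustration (the real ones are in Alssema et al. [16] and Additional file 2: Table S1), and the sketch treats all predictors as numeric, whereas the published model may code some of them categorically.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the published values.
HYPOTHETICAL_COEFS_MEN = {
    "intercept": -7.0,
    "age": 0.04,              # per year
    "bmi": 0.05,              # per kg/m^2
    "wc": 0.02,               # per cm
    "antihypertensive": 0.6,  # 1 = uses antihypertensive medication
    "smoker": 0.3,            # 1 = current smoker
    "family_cvd": 0.3,        # 1 = parent/sibling with MI or stroke < 65 years
    "family_diabetes": 0.4,   # 1 = parent/sibling with diabetes
}


def predicted_risk(coefs, person):
    """Logistic-model risk: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    lp = coefs["intercept"] + sum(coefs[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-lp))


person = {"age": 50, "bmi": 27, "wc": 95, "antihypertensive": 0,
          "smoker": 1, "family_cvd": 0, "family_diabetes": 1}
r = predicted_risk(HYPOTHETICAL_COEFS_MEN, person)
print(f"predicted 7-year risk: {r:.3f}")
```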

Statistical analysis
Baseline characteristics of respondents (those with complete data) and non-respondents (those with missing covariate data or lost to follow-up) were expressed as mean (standard deviation) for continuous variables and number (%) for categorical variables. For covariates with a skewed distribution, the median (interquartile range: IQR) was reported. Baseline characteristics of men and women were compared using Student's t test for normally distributed continuous variables, the Mann-Whitney U test for skewed variables, and the chi-squared test for categorical variables. To evaluate the external validity of the risk equation, the AUC and the Hosmer-Lemeshow chi-square statistic were used to assess the discrimination and calibration of the prediction models, respectively. According to the criteria of Hosmer et al. [24], AUCs of 0.5-0.7, 0.7-0.8, 0.8-0.9, and ≥ 0.9 indicate poor, acceptable, excellent, and outstanding discrimination, respectively. Bootstrapping with 1000 replications was used to estimate uncertainty intervals [25,26]. To show calibration in detail, the observed risk was plotted against the mean predicted probability using the calibration belt Stata module [27]. In addition, the observed-to-expected (O/E) ratio for the chronic cardio-metabolic disorders outcome was calculated; a ratio < 1 indicates overestimation and a ratio > 1 indicates underestimation of risk. We also recalibrated the risk assessment tool to the characteristics of the TLGS cohort by adjusting the model intercept: the predictors and their regression coefficients were fixed at the values of the original model, while the intercept was estimated as a free parameter [28]. The clinical performance of the validated model was evaluated using the same scoring points as defined by Alssema et al. [16]. The non-laboratory risk score was calculated by summing the risk points over the defined variables for men and women (Additional file 2: Table S1).
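The discrimination and calibration measures described above can be sketched in plain Python, assuming binary outcomes `y` and predicted probabilities `p` strictly between 0 and 1. The AUC is computed as the probability that a case has a higher predicted risk than a non-case, and the Hosmer-Lemeshow statistic over groups of sorted predicted risk with groups - 2 degrees of freedom. Ten groups is an assumption here, although it is consistent with the reported results: the upper tail of a chi-square distribution with 8 degrees of freedom at 6.87 and 5.62 gives p ≈ 0.55 and 0.69, matching the values reported for men and women. This is an illustration, not the Stata routines the authors used.

```python
import math


def auc(y, p):
    """Probability that a random case has a higher predicted risk than a
    random non-case (ties count 1/2); equal to the ROC AUC."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    noncases = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for c in cases:
        for d in noncases:
            wins += 1.0 if c > d else (0.5 if c == d else 0.0)
    return wins / (len(cases) * len(noncases))


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form holds for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """HL statistic over groups of sorted predicted risk; df = groups - 2."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        m = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # (O - E)^2 / (E * (1 - E/m)) per group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / m))
    return stat, chi2_upper_tail_even_df(stat, groups - 2)


def observed_expected_ratio(y, p):
    """O/E < 1: the model overestimates risk; O/E > 1: it underestimates."""
    return sum(y) / sum(p)


# The reported HL statistics with 8 df reproduce the reported p values
print(round(chi2_upper_tail_even_df(6.87, 8), 2))  # men: 0.55
print(round(chi2_upper_tail_even_df(5.62, 8), 2))  # women: 0.69
```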
The cut-off point in the TLGS data was determined for each sex as the score with the maximum Youden index (sensitivity + specificity - 1). In addition, the size of the high-risk population was estimated based on the 2016 Iranian national census. Using the same statistical approach, we repeated the analysis for participants with 9-year follow-up. To compare the discrimination of the risk assessment tool with that of other available non-invasive prediction models for the CVD outcome, we used the risk score of Gaziano et al. [13]. We also assessed whether adding educational level and marital status, two important social factors previously shown to be moderately associated with incident T2DM and CKD in the Iranian population [29,30], would improve discrimination or calibration. Furthermore, to avoid bias from complete-case analysis, we handled missing baseline data (outcomes were not imputed) using single imputation, with missing values estimated from multivariable regression models [31]. A sensitivity analysis was performed by re-estimating the coefficients of the same covariates as Alssema et al. [16] in the validation sample using Cox proportional hazards regression, and discriminative power was assessed with Harrell's C index (95% CI). Statistical analysis was performed using Stata version 14 (StataCorp LP, College Station, Texas). P values < 0.05 were considered statistically significant.
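Selecting a cut-off by Youden's index amounts to scanning candidate thresholds and keeping the one that maximizes sensitivity + specificity - 1. A minimal sketch, assuming integer risk scores and the "score ≥ cut-off means high risk" convention used in the Results; when several thresholds tie, the lowest one is kept. The function name and example data are made up.

```python
def youden_cutoff(scores, outcomes):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's J,
    where a participant is classified high risk if score >= cutoff."""
    cases = sum(outcomes)
    noncases = len(outcomes) - cases
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= c and o == 1)
        tn = sum(1 for s, o in zip(scores, outcomes) if s < c and o == 0)
        sens, spec = tp / cases, tn / noncases
        j = sens + spec - 1
        if best is None or j > best[3]:  # strict '>' keeps the lowest tied cutoff
            best = (c, sens, spec, j)
    return best


# Made-up scores and outcomes (1 = incident chronic cardio-metabolic disorder)
scores = [5, 10, 12, 20, 22, 30]
outcomes = [0, 1, 0, 0, 1, 1]
print(youden_cutoff(scores, outcomes))
```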

Baseline characteristics
The study population at baseline consisted of 1310 men and 1960 women, with a mean (SD) age of 47.1 (12.8) and 45.3 (11.3) years, respectively. The baseline characteristics of men and women are shown in Table 1. There were significant differences between men and women: men were older, had higher WC, and were more often current smokers, whereas women had higher BMI and more frequently used antihypertensive medications and reported a positive family history of CVD. The comparison of baseline characteristics of respondents vs. non-respondents is shown in Additional file 3: Table S2.

Model performance
According to Table 2, the combined risk score showed acceptable discrimination for incident chronic cardio-metabolic disorders in the TLGS, with an AUC (95% CI) of 0.72 (0.69-0.75) in men and 0.73 (0.71-0.76) in women. When the outcome was restricted to CVD events, discrimination in men was similar to that for chronic cardio-metabolic disorders, whereas in women it was significantly higher, with an AUC (95% CI) of 0.82 (0.78-0.86), indicating excellent discrimination. For the CKD outcome, the AUC was 0.76 (95% CI 0.72-0.79) in men and 0.71 (95% CI 0.69-0.74) in women. Discrimination was poor for T2DM events, with an AUC of 0.65 (95% CI 0.61-0.69) in men and 0.69 (95% CI 0.66-0.73) in women. Receiver operating characteristic curves for men and women are presented in Additional file 4: Figure S2.
The secondary analysis, over a median (IQR) follow-up of 9.2 (8.7-10.2) years, showed discrimination and calibration similar to those of the 6-year follow-up for both genders (Table 2).
The Hosmer-Lemeshow goodness-of-fit test showed good calibration for incident chronic cardio-metabolic disorders in men (χ² = 6.87, p = 0.55) and women (χ² = 5.62, p = 0.69). For the individual outcomes, calibration was poor for CVD events in men (HL test: p = 0.02) and for incident T2DM in women (HL test: p < 0.001). The observed-versus-expected plot is shown in Fig. 1. The hypothesis of good calibration was not rejected for incident chronic cardio-metabolic disorders in men and women. However, for T2DM (among women) and CVD (in both genders), the hypothesis of good calibration was rejected. Moreover, recalibration by adjusting the intercept to the TLGS data did not improve model goodness-of-fit; HL tests remained significant for T2DM in women and CVD in men. The AUC after recalibration was also similar to that of the original model (Table 2). The O/E ratio for the combined cardio-metabolic outcome was close to 1 for both men and women.
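The intercept-only recalibration described in the Statistical analysis can be sketched as follows, assuming the original model's linear predictor `lp` is already available for each participant; `recalibrate_intercept` is a hypothetical helper and the data are made up. Two properties follow from this construction: shifting only the intercept cannot change the ranking of participants, so the AUC is unchanged, and at the maximum-likelihood solution the summed predicted risks equal the observed number of events, so the recalibrated O/E ratio is 1. Miscalibration that depends on the level of risk, which the HL test can detect, is not removed by such a shift.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(y, lp, max_iter=50, tol=1e-10):
    """Estimate a shift `a` in logit(p_i) = a + lp_i with the original
    coefficients held fixed (lp_i is the original linear predictor), using
    Newton-Raphson on the logistic log-likelihood."""
    a = 0.0
    for _ in range(max_iter):
        p = [sigmoid(a + z) for z in lp]
        score = sum(yi - pi for yi, pi in zip(y, p))  # first derivative in a
        info = sum(pi * (1.0 - pi) for pi in p)       # minus second derivative
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


# Made-up outcomes and linear predictors for illustration
y = [0, 0, 1, 0, 1, 1, 0, 1]
lp = [-2.0, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 2.0]
a = recalibrate_intercept(y, lp)
p_new = [sigmoid(a + z) for z in lp]
print(f"intercept shift = {a:.4f}, O/E after recalibration = {sum(y) / sum(p_new):.4f}")
```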
When the score threshold of ≥ 35 proposed by Alssema et al. [16] was applied to our population, 21.8% of men were classified as high-risk, with a sensitivity of 42.9% and a specificity of 87.0%; in women, this cut-off classified 20.8% as high-risk, with a sensitivity of 35.4% and a specificity of 92.3%. Given the low sensitivity of the original cut-off for predicting chronic cardio-metabolic disorders in our population, we derived cut-off points using Youden's index [32]. Over the 6-year follow-up, cut-off points of ≥ 25 for men and ≥ 19 for women had the highest Youden's index. These cut-off points classified 43.9% of men and 50.6% of women as high-risk for chronic cardio-metabolic disorders, yielding a sensitivity of 63.8% and a specificity of 70.9% in men and a sensitivity of 66.7% and a specificity of 70.0% in women (Table 3). The corresponding values for the 9-year follow-up are reported in Additional file 5: Table S3.
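As a worked check on the figures above, Youden's J = sensitivity + specificity - 1 can be computed directly from the reported values. The TLGS-derived thresholds give a higher J than the original ≥ 35 threshold in both men and women:

```latex
\begin{aligned}
J_{\text{men},\,\ge 35}   &= 0.429 + 0.870 - 1 = 0.299, &\quad J_{\text{men},\,\ge 25}   &= 0.638 + 0.709 - 1 = 0.347,\\
J_{\text{women},\,\ge 35} &= 0.354 + 0.923 - 1 = 0.277, &\quad J_{\text{women},\,\ge 19} &= 0.667 + 0.700 - 1 = 0.367.
\end{aligned}
```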
We examined the additional value of educational level and marital status. In univariable analysis, both were significant predictors of the defined outcomes in men and women, but both lost their significance after the chronic cardio-metabolic disorders risk score was added to the model. Adding educational level did not improve discrimination or calibration in the enhanced model (Additional file 7: Table S5). Adding marital status improved only the calibration for incident CVD, while discrimination remained almost unchanged (Additional file 8: Table S6). In the imputed data set (Additional file 9: Table S7), discrimination and calibration remained essentially unchanged.
In the sensitivity analysis using Cox regression in place of logistic multivariable analysis, the model's discrimination and calibration for chronic cardio-metabolic disorders were reduced, most markedly for incident CKD among women. However, the discrimination and calibration of the model for incident T2DM improved somewhat (Additional file 10: Table S8).
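Harrell's C, used to assess discrimination in this Cox sensitivity analysis, extends the AUC to censored follow-up. A minimal sketch, assuming `time` is follow-up time, `event` is 1 for an incident event and 0 for censoring, and `risk` is a predicted risk (higher means worse). Pairs with tied follow-up times are simply skipped here, which is a simplification; the function name and data are hypothetical.

```python
def harrell_c(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when time[i] < time[j] and subject i had an
    event; it is concordant when risk[i] > risk[j]. Ties in risk count 1/2.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # the earlier time in a comparable pair must be an event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up example: years to event/censoring, event indicator, predicted risk
time = [1.5, 3.0, 4.2, 6.0, 6.0]
event = [1, 0, 1, 1, 0]
risk = [0.40, 0.10, 0.35, 0.20, 0.05]
print(round(harrell_c(time, event, risk), 3))
```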

Discussion
The current study is the second external validation worldwide, and the first in a non-Europoid population, of a previously developed, non-laboratory 7-year risk prediction tool for chronic cardio-metabolic disorders. The model showed acceptable discrimination and good calibration for 6- and 9-year risk prediction of chronic cardio-metabolic disorders in the population of the metropolitan city of Tehran. In women, the model performed best for discriminating CVD, followed by CKD; in men, it performed best for both CVD and CKD. The model performed worst for predicting T2DM in both genders during both follow-up periods. Moreover, the performance of the model remained the same even with updated cut-off values derived for the Iranian population.
Compared with the development data [16], our study population was generally younger and more obese (general or central), with a higher frequency of family history of T2DM and a lower frequency of premature CVD.
The model showed acceptable discriminative performance in the TLGS population, although the AUCs (0.72 and 0.73 for men and women, respectively) were lower than in the development data (0.80 and 0.82, respectively) [16] and the AusDiab study (0.78 and 0.80, respectively) [18]. This difference might be explained by differences in discrimination for the specific NCD groups, despite the higher incidence of chronic cardio-metabolic disorders in the TLGS (40.2%) than in the development (36.0%) and AusDiab (13.3%) data (Fig. 2). Moreover, at baseline we observed a higher prevalence of newly diagnosed T2DM and CKD (i.e., eGFR 15 to 60 mL/min/1.73 m²) in the Iranian population than in the development data (4.6% and 7.2%, respectively) [16] and the AusDiab data (3.7% and 11.2%, respectively) [34,35]. An efficient risk prediction model requires a series of assumptions to eliminate potential reverse causality [36]. We believe that the high prevalence of newly diagnosed T2DM in the TLGS population at baseline introduced reverse causality that might have affected obesity indices, leading to lower performance of the model in predicting T2DM.
Regarding cut-off points for predicting chronic cardio-metabolic disorders, Alssema et al. [16] reported that a cut-off point between 35 and 40 yielded high sensitivity (75 to 85% for men and 83 to 90% for women) and moderate specificity (55 to 66% for men and 49 to 62% for women). When these score thresholds were applied to our data, sensitivity for predicting chronic cardio-metabolic disorders decreased substantially (maximum 42.9%). Since cut-off point selection depends on the target population and involves a trade-off between sensitivity and specificity, scores of ≥ 25 for men and ≥ 19 for women provide acceptable sensitivity (63.8% for men and 66.7% for women) and specificity (70.9% for men and 70.0% for women) for predicting 6-year chronic cardio-metabolic disorders in our population.
According to medical university reports [37][38][39], there are 77 health centers in Tehran, each covering more than 50,000 people [40] across the city's 22 districts. Based on the results, out of a total of 6,722,079 Tehranians aged 28-85 years (according to the 2016 Iran census), those classified as high-risk will require further investigation. Accordingly, the number of existing centers is not sufficient, and increasing it is recommended.
The incidence of CVD was lower in the TLGS population than in the development and AusDiab data (Fig. 2). Several previously developed models use non-laboratory measures to predict CVD only [13,41], one of which was introduced by Gaziano et al. [13]. That model showed no significant difference in CVD discrimination compared with our adjusted model. Compared with the non-laboratory INTERHEART risk score (AUC = 0.74 (0.70 to 0.78)) in the Middle East population, our model showed good calibration over the 9-year follow-up, better discrimination in women, and the same discriminative performance in men, despite the chronic cardio-metabolic disorders model not including T2DM as a major risk factor [41]. Although CVD contributed less to the composite outcome, the model discriminated CVD better than any other outcome in women and second-best in men in both follow-up periods. Other CVD prediction models have shown the same gender difference as the current model. The Framingham CVD risk score, which has also been validated in Iran, showed results in line with ours, with higher discrimination in women than in men [15]. The model showed good calibration for CVD in both genders over the 9-year follow-up, which could be explained by the time-dependent course of CVD progression, leading to a higher CVD incidence over longer follow-up.
The incidence of CKD was higher in the TLGS population than in the development and AusDiab data (Fig. 2). Several previously developed models use non-laboratory measures to predict CKD [42]. CKD contributed most to the composite chronic cardio-metabolic disorders outcome, which could be due to the inclusion of several major CKD risk factors in the current model, including age, hypertension, and smoking. Including laboratory measures could increase the predictive power of the model, as shown in a meta-analysis (C-statistic = 0.845) [11]. Given that only non-laboratory measures were included in the model and eGFR, a major predictor of CKD, was absent, AUCs of about 0.76 in men and 0.71 in women are acceptable. The model discriminated CKD better than any other outcome in men and second-best in women. Calibration was good for the prediction of CKD. For T2DM, incidence was almost similar to that in the development data but higher than in the AusDiab population (Fig. 2). The model showed the worst discriminative performance for T2DM in both men and women, with good calibration in men but poor calibration in women. Several explanations could be proposed for the poor performance of the model in predicting T2DM. Firstly, as mentioned earlier, the percentage of newly diagnosed T2DM in the TLGS was higher than in the development and AusDiab data [16,35], which affects the discriminative power of the chronic cardio-metabolic disorders model for incident T2DM. Secondly, over 6 years of follow-up, we previously found that general adiposity was not an independent risk factor for incident T2DM, whereas a non-laboratory model including age, SBP, waist-to-hip ratio (WHR), and waist-to-height ratio (WHtR) yielded an AUC of 0.75 (0.72-0.78) [43].
This suggests that replacing BMI or WC with WHtR or WHR could improve the model's discriminative power for T2DM. Several other studies have also shown a higher predictive power of WHtR than of WC and BMI for T2DM [44]. Most previously developed T2DM prediction tools (the Finnish Diabetes Risk Score (FINDRISC), the ADA risk score, and AUSDRISK) show discriminative performance similar to our results, except AUSDRISK, which showed somewhat better discrimination for DM (AUC = 0.787 (0.747-0.787) in TLGS data) [9,45,46].
This study had several strengths. Firstly, this is the first study to validate this model in a non-Europoid population, particularly in the MENA region, which has a high burden of NCDs. Secondly, we examined the accuracy of the model over an extended follow-up period. Thirdly, calculating ethnicity-based cut-off values and comparing them with the original cut-off values suggested that the model is insensitive to ethnicity-based cut-off values.
This study had several limitations. Firstly, incident angina pectoris, history of intermittent claudication (using the Rose questionnaire), and peripheral interventions were not assessed in the TLGS, so, as shown in Fig. 2, incident CVD events might have been underestimated. However, despite differences in CVD definitions from the original study, the discriminative power of the chronic cardio-metabolic disorders tool for CVD was acceptable. A higher CVD incidence rate might have resulted in better calibration, which was not achieved by updating the intercept, as shown in Table 2; however, calibration improved over the longer (9-year) follow-up owing to a higher number of CVD events. Regarding heart failure, although it was not assessed at baseline recruitment, it was evaluated as one of the CVD outcomes in the TLGS [22]. Secondly, compared with non-respondents, male participants were more obese and reported a higher rate of smoking, while female participants reported less smoking and less use of antihypertensive medications, leading to over- and underestimation of chronic cardio-metabolic disorders among men and women, respectively (Additional file 3: Table S2). Thirdly, as reported in the TLGS protocol, the age and sex distribution of the population in district No. 13 was representative of the overall urban population of Tehran and Iran at the time of baseline recruitment (Iran National Census, 1996) [19]; however, our findings might not be generalizable to the entire population of Iran, especially rural areas. Finally, serum creatinine values were not calibrated to the Cleveland Clinic reference laboratory.
The current risk prediction tool is freely available on websites in the Netherlands and is incorporated into the Dutch guidelines for general practitioners, 'The Prevention Visit' [17]. Recent studies have discussed the cost-effectiveness of cardio-metabolic risk assessment [47,48]. In the MENA region, this model could help distinguish high-risk individuals in need of further risk assessment from those at low risk.

Conclusion
In conclusion, the previously developed non-invasive 7-year risk prediction tool for chronic cardio-metabolic disorders performed well with regard to discrimination and calibration in a non-Europoid population over 6- and 9-year follow-up. The model performed best for predicting CVD and CKD in both genders, but further evaluation is needed to improve the prediction of T2DM. These results suggest that the model performs acceptably in other ethnic groups and over longer follow-up periods. The World Health Organization (WHO) has implemented a prevention program to reduce deaths from NCDs by 25% in the