Application of survival tree analysis for exploration of potential interactions between predictors of incident chronic kidney disease: a 15-year follow-up study

Background Chronic kidney disease (CKD) is a growing public health challenges worldwide. Various studies have investigated risk factors of incident CKD; however, a very few studies examined interaction between these risk factors. In an attempt to clarify the potential interactions between risk factors of CKD, we performed survival tree analysis. Methods A total of 8238 participants (46.1% men) aged > 20 years without CKD at baseline [(1999–2001) and (2002–2005)], were followed until 2014. The first occurrence of CKD, defined as the estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m2, was set as the main outcome. Multivariable Cox proportional hazard (Cox PH) regression was used to identify significant independent predictors of CKD; moreover, survival tree analysis was performed to gain further insight into the potential interactions between predictors. Results The crude incidence rates of CKD were 20.2 and 35.2 per 1000 person-years in men and women, respectively. The Cox PH identified the main effect of significant predictors of CKD incidence in men and women. In addition, using a limited number of predictors, survival trees identified 12 and 10 subgroups among men and women, respectively, with different survival probability. Accordingly, a group of men with eGFR > 74 ml/min/1.73 m2, age ≤ 46 years, low level of physical activity, waist circumference ≤ 100 cm and FPG ≤ 4.7 mmol/l had the lowest risk of CKD incidence; while men with eGFR ≤ 63.4 ml/min/1.73 m2, age > 50 years had the highest risk for CKD compared to men in the lowest risk group [hazard ratio (HR), 70.68 (34.57–144.52)]. Also, a group of women aged ≤ 45 years and eGFR > 83.5 ml/min/1.73 m2 had the lowest risk; while women with age > 48 years and eGFR ≤ 69 ml/min/1.73 m2 had the highest risk compared to low risk group [HR 27.25 (19.88–37.34)]. Conclusion In this post hoc analysis, we found the independent predictors of CKD using Cox PH; furthermore, by applying survival tree analysis we identified several numbers of homogeneous subgroups with different risk for incidence of CKD. Our study suggests that two methods can be used simultaneously to provide new insights for intervention programs and improve clinical decision making.


Background
Chronic kidney disease (CKD) is a growing public health challenges worldwide [1]. CKD is a major cause of not only progression to renal failure, but also excess cardiovascular morbidity and mortality. It affects 10-15% of the adult population around the world [1,2]. In recent years, a silent CKD epidemic has been proposed by many authors [3,4]. The outbreak is because of the inclusion of a large proportion of the elderly population within stage 3 CKD [5]. Also, in the developing countries, up to 40% of those identified with CKD in screening programs were often young [4]. Because early interventions could prevent or postpone the progression to end stage renal disease (ESRD), identification of risk factors is a major step in dealing with the CKD epidemic [6,7].
The primary method for identifying predictors of incident CKD has been use of traditional regression models such as logistic or Cox proportional hazards (Cox PH) model. However, assessment of interactions in the presence of a large number of variables leads to a complicated model that can be difficult to fit and interpret [8][9][10]. For example, if we have 20 covariates, then there are (20 × 19)/2 = 190 two-factor interaction terms for including into regression models. On the other hand, if many interactions are examined and only the strongest included in the regression model, this would contribute to biased estimation of the effects and overly optimistic performance estimates [11]. Hence, in prediction of CKD, many studies have examined the interactions between only important risk factors such as hypertension or diabetes with other factors [12,13].
Recursive partitioning or 'decision trees' are another class of nonparametric regressions which are widely used in many fields [14][15][16][17]. Survival trees are popular nonparametric alternatives to the Cox PH model, which have great flexibility and can automatically detect certain types of interactions. Moreover, they can naturally group subjects according to their length of survival based on their covariates patterns [11,14,18,19]. The aims of the present study, conducted among Iranian adult population, were to investigate (1) the independent predictors of incident CKD separately among men and women using traditional statistical method and (2) the potential interactions between predictors to identify different risk groups for CKD incidence using survival tree analysis. We used data from the Tehran Lipid and Glucose Study (TLGS) for our investigation.

Study population
The TLGS is an ongoing community-based prospective study performed on a representative sample of residents of Tehran, the capital of the Islamic Republic of Iran. This city composed of 22 urban districts. Study samples were selected from the district 13. The rationales for choosing district 13 were: (1) high stability of the population residing in that district compared to other districts of Tehran, and (2) the age distribution of the population of district 13 was representative of the overall population of Tehran. At first phase of study (1999)(2000)(2001)(2002), a multistage stratified cluster random sampling technique was used to select more than 15,000 people aged ≥ 3 year. The participants then were followed in four consecutive phases, i.e. 2002-2005 (phase 2), 2005-2008 (phase 3), 2008-2011 (phase 4) and the final survey in 2011-2014 (phase 5). Another 3551 residents were invited for the second phase and were followed in the next 3 phases. The details of the study have been published elsewhere [20]. The present study was a post hoc analysis of the TLGS. A total of 12,808 adult participants aged ≥ 20 year from the first and second phases were selected as baseline population and the cohort was re-examined at next phases until the phase 5. Among this participants, we excluded subjects with prevalent CKD (n = 2061) at baseline, and those with missing data on creatinine level (n = 414). Finally, after excluding those without any follow-up data after recruitment (n = 2095), 8238 participants (3795 men and 4443 women) (80% of eligible subjects) were followed until the end of study (Fig. 1). This study was approved by the ethical committee of the Research Institute for Endocrine Sciences of Shahid Beheshti University of Medical Sciences, Tehran, Iran, and conducted in accordance with the Declaration of Helsinki. All participants signed informed consent forms.

Clinical, anthropometric and laboratory measurements
At baseline, information on demographics, education, smoking status, medical and drug history was collected using interview. Anthropometric measures including body weight and height, hip, wrist and waist circumferences (WC) were measured according to a standard protocol. Body mass index (BMI) was calculated as weight in kilogram divided by the square of height in meter. Blood pressure was measured twice and mean of two measurements was considered as the participant's blood pressure. Blood samples were drawn after an overnight fast of ≥ 12 h to assess the following parameters: fasting plasma glucose (FPG) (measured by the enzymatic colorimetric glucose oxidase); total cholesterol (TC) and triglycerides (TGs) (were assayed using the enzymatic colorimetric method) and the Serum creatinine (cr) level which was assayed by kinetic colorimetric Jaffe using commercial kits (Pars Azmoon Inc., Tehran, Iran) by a Selectra 2 auto analyzer (Vital Scientific, Spankeren, The Netherlands) [20,21]. The sensitivity of the assay was 0.2 mg/dl. Reference intervals according to manufacturer's recommendation were 53-97 mmol/l (0.6-1.1 mg/dl) and 80-115 mmol/l (0.9-1.3 mg/dl) in women and men respectively. Both intra-assay and inter-assay CVs were less than 3.1% in both baseline and follow-up phases. Assay performance was monitored after every 25 tests using lyophilized serum controls in normal and abnormal ranges Another 2440 subjects aged ≥ 20 years were participated in the second phase and after exclusion of 273 subjects, 2167 eligible people were followed in the next 3 phases. Overall, 8238 eligible subjects were followed until 2014, and 2308 new cases of CKD (233 in phase 2; 751 in phase 3; 519 in phase 4 and 805 in phase 5 were identified after a median follow-up of 12.4 years. A total of 2095 subjects lost to follow up due to incomplete follow-up data and 5930 individuals did not developed CKD and all samples were assayed when internal quality control met the standard satisfactory criteria [17,18]. In the first phase, physical activity level (PAL) was assessed by the lipid research clinic questionnaire (LRC) [22], which was validated for this study [23]. LRC is one of the most common questionnaires to assess PAL in large-scale epidemiological studies [24]. It was replaced by the Modifiable Activity Questionnaire (MAQ) from the 2nd phase for obtaining the quantitative measure of PAL. This questionnaire measures all three forms of activities including leisure time, job, and household activities in the past year [25].

Definition of variables and outcomes
Kidney function was estimated by the estimated glomerular filtration rate (eGFR) that was calculated using the modification of diet in renal disease (MDRD) equation [26]. The definition of incident CKD was an eGFR of < 60 ml/min per 1.73 m 2 (CKD Stage 3-5) occurring at any time during the follow-up period. Educational status was categorized to 4 levels as illiterate, 1-9 years, 10-12 years and over 12 years of schooling. Marital status was categorized as single, married, and widowed/ divorced. Different categories of smoking status were defined according to WHO guidelines [27]. A current smoker was defined as a person who smokes cigarettes or other smoking implements (water-pipes, or pipes) at least once a day or occasionally. Ex-smoker was formerly daily or occasional smoker who currently does not smoke and non-smoker defined as people who never smoked before. A family history of premature cardio-vascular diseases (CVDs) was considered as any experience of fatal or nonfatal myocardial infarction, stroke or sudden cardiac arrest in first-degree relatives, if it occurred before 55 year of age in male relatives (father, brother and son) and before 65 year of age in female relatives (mother, sister and daughter). Family history of diabetes (FHD) was defined as having type 2 diabetes in first-degree relatives. In first phase, physically active participants were identified as those who were participating in a vigorous physical activity at least 3 days per week [22], and in the second phase, it was defined as achieving a minimum of at least 600 MET (metabolic equivalent task)-minutes per week [28].

Statistical analyses
Descriptive statistics (mean, SD, proportion) were used to describe baseline characteristics. Comparison of baseline characteristics between men vs. women, between participants with and without incident CKD and between followed and non-followed participants was done using student's t test, Chi square and Mann-Whitney test, as appropriate. Incidence density rate of CKD and respective 95% confidence interval (CI) were calculated for each gender, by dividing the number of events to person-years at risk. Time to event was defined as time of censoring or date of incidence of CKD, whichever occurred first. The event date for the incident cases of CKD was defined as midtime between the date of follow-up visit at which the CKD was diagnosed for the first time, and the most recent follow-up visit prior to the diagnosis. For censored subjects, the time was the interval between the first and the last observation dates. Study participants were censored due to death, loss to follow-up, or the end of the observation period. The Stepwise Cox PH regression model using Akaike information criteria (AIC), as model selection approach, was used to identify significant predictors of incident CKD by calculating multivariable hazard ratios (HRs) with 95% CI (after confirming the proportionality in the model). Due to significant effect modification of gender on FPG, SBP, diastolic blood pressure (DBP), eGFR and WC (all p < 0.05), we stratified our analysis by gender. Moreover, for examination of the gender effect in the multivariable model we fitted the Cox PH model on whole population. In further analysis, conditional survival tree was implemented with the same outcome and covariates as for Cox method [29]. In step 1, among m covariate, survival tree algorithm selects the variable X i with the highest ability to separate survivors and non-survivors using p-values from permutation distributions. In step 2, in order to split X i into two disjoint sets (dichotomizing), the algorithm first sorts the values of X i in increasing order. Typically, the midpoint between each pair of adjacent values is considered as a possible cut-point. Therefore, given v values of X i , then v-1 possible splits are evaluated. For each possible cut-point for X i , the value of the log-rank statistic associated with the considered cut-point is computed; then, the cut-point associated with the smallest p-value is selected. This procedure is applied recursively until the tree grew to an optimal number of terminal nodes. A survival tree can naturally group subjects according to their survival time and based on their covariates; hence, it automatically detects complex interactions between covariates without the need to specify them beforehand. In the present study, all baseline variables were entered for constructing survival trees. The minimum criterion for node split was defined as p < 0.05 and the minimum final nodes size were defined as 100 persons. Kaplan-Meier (KM) curves were constructed for each subgroup identified by the survival tree. Also, according the number of subgroups were found by survival tree, we specified one categorical variables with k level (k is the number of terminal nodes or subgroups in survival trees) and estimated the HR of CKD events among the identified subgroups by considering the low risk group (highest survival probability) as the reference category. We used two packages of Party and Survival from the R software (http://www.r-project.org/) for the analysis. Two-tailed p-values < 0.05 were considered statistically significant.

Study participants and baseline characteristics
The study sample consisted of 3795 men, aged 20-84 year (mean, 41.4 ± 14.1) and 4443 women, aged 20-83 year (mean, 37.5 ± 12.1). Table 1 shows the baseline characteristics of the participants. There was a gender difference in all baseline variables except for TC, family history of premature CVD in male relatives, PAL and use of blood glucose lowering drugs. Women were younger and had lower eGFR, WC, wrist circumference, FPG, SBP and DBP, but had higher BMI, hip circumference and heart rate, compared with men (Table 1). A comparison of baseline characteristics between followed and non-followed subjects is shown in Table 2. Followed subjects had a significantly higher BMI (26.5 vs. 26.1 kg/m 2 ), WC (87.4 vs. 86.3 cm), hip circumference (100.2 vs. 99.6 cm) and TC (5.2 vs. 5.1 mmol/l), but lower FPG (5.2 vs. 5.3 mmol/l) and eGFR (75.5 vs. 76.1) compared to non-followed subjects, respectively. Table 3 shows baseline characteristics of the study participants with and without incident CKD. In both genders, the participants who developed incident CKD were older, had higher BMI, FPG, TG, TC, SBP, DBP, WC and wrist circumferences but lower heart rate and were more likely to be married compared with participants free of CKD. Moreover, frequency of taking lipid, blood glucose and blood pressure lowering drugs as well as aspirin was higher in incident cases of CKD compared to non CKD participants (p < 0.01 for all of these measures).

Predictors of incident CKD in men according to Cox model
Among men, age, FPG and DBP were positively associated with the risk of CKD (Table 4). In contrast, WC and TC were negatively associated with the risk of CKD. Also, an inverse association between eGFR and CKD was found in men; every 1-unit increase in eGFR was associated with a 10% reduction in the risk of CKD (HR 0.90; 95% CI 0.89-0.91). The HR for CKD among men with higher education (> 12 years) was 1.56 (1.06-2.28) compared to illiterate group. Being divorced/widower increased the risk of CKD by 204% (1.09-3.84) compared with being married. Interestingly, active men had 21% (1.03-1.42) higher risk of CKD compared to the inactive group. Current smoking was associated with a 24% (1.03-1.48) increase in the risk of CKD. The use of blood glucose lowering drugs at study entry was associated with 59% (1.08-2.35) higher risk of CKD.

Predictors of incident CKD in women according to Cox model
The associations of age, FPG and WC with incidence of CKD in women were similar to those observed for men (Table 4). Every 1-unit increase in eGFR was associated with a 9% reduction in the risk of CKD (HR 0.91; 0.91-0.92). Additionally, SBP and heart rate were significantly associated with CKD incidence. Prevalent CVD at baseline was associated with a 35% (0.99-1.82) increase in the risk of CKD.

Predictors of incident CKD in whole population according to Cox model
In the whole population, women had more than double risk (HR 2.09; 1.82-2.39) of CKD compared with men. Moreover, age, FPG, and DBP were positively associated with CKD incidence. In contrast, eGFR, WC, TC and heart rate were negatively associated with the risk of CKD. Every 1-unit increase in eGFR was associated with a 9% reduction in the risk of CKD (HR 0.91; 0.90-0.92). Prevalent CVD at baseline was associated with a 25% (1.01-1.54) increase in the risk of CKD. The HR for CKD among individuals with higher education (> 12 years) was 1.41 (1.14-1.72) compared to illiterate individuals (Table 5).

Survival tree in men
The results of the survival tree in men are presented in Fig. 3. The eGFR, age, FPG, PAL and WC were selected by the tree to group the male population. The best cutoff values for these predictors were found by the survival tree algorithm. The eGFR was found as the most important predictor for CKD with a cutoff value of 69.8 ml/min/1.73 m 2 . The group with eGFR ≤ 69.8 ml/ min/1.73 m 2 (left side of the tree) was further split by age; so that, the group with eGFR < 63.4 ml/min/1.73 m 2 and age > 50 year had the highest risk for CKD. The lowest risk was found among younger men aged ≤ 46 year and eGFR > 74 ml/min/1.73 m 2 . A moderate risk was explored among the groups with eGFR > 69.85 ml/min/1.73 m 2 and age > 46 year; survival probability among these groups depended on the SBP and FPG. So that, the group with SBP > 119 mmHg and FPG > 5.8 mmol/l had the lowest survival. Ultimately, 12 groups were identified by the survival tree algorithm. The Kaplan-Meier curves for these groups are shown in Fig. 4a. The curves of nodes 4, 5, 7, 8, 22 and 23 were lower than the whole population's curve, showing that they covered high risk groups for the CKD events among men. The symmetric remark can be made for nodes (11, 15, 16, 17, 18 and 20) which had a survival probability above the whole population's curve which covered low risk groups for the CKD events in men. Using node 15 with the lowest risk as the reference group, HRs of CKD was estimated for all other groups (nodes). Accordingly, node 5 had the highest risk for CKD incidence, compared to node 15 (HR 70.6; 34.6-144.5; p < 0.001) ( Table 6).

Survival tree in women
The survival tree for women is shown in Fig. 5. Three variables (age, eGFR and SBP) were found as the most , those with eGFR > 83.5 ml/ min/1.73 m 2 (node 12) and eGFR < 65 ml/min/1.73 m 2 (node 4) had the lowest and highest risk, respectively, for incidence of CKD. The Kaplan-Meier curves for the ten groups are shown in Fig. 4b. The survival curves of nodes 16, 4, 19, 15, and 7 were lower than the whole population's curve, and survival curves of nodes 12, 11, 6, and 9 had a survival probability above the whole population's curve, showing the high risk and low risk groups for the CKD events among women, respectively. As Table 7 shows, node 16 had the highest risk for CKD incidence, compared to node 12 as the reference group (HR 27.25; 19.88-37.34; p < 0.001).

Discussion
The results of the present study showed that more than 20 and 35% of men and of women, respectively, developed CKD during 15 years follow-up. Results of the multivariate Cox PH models showed that women had more than double risk for CKD incidence compared to men. The eGFR, age, FPG and WC were significant predictors for incident CKD in both genders and whole population. However, in men, DBP, TC, current smoking, being divorced/widower, higher educational levels (> 12 year), taking blood glucose lowering drugs and PAL were significantly associated with an increased risk of CKD. In women, it was found that SBP, heart rate and prevalent CVD were significantly associated with CKD in addition to the four common predictors. Compared to other population-based studies in the US, Europe and Japan [30], the incidence rate of CKD was higher in our study, despite the lower mean age of the population. The higher incidence of CKD in our study population could be due to life style changes, leading to a sharp rise in risk factors of chronic diseases [31]. In line with our previous study [32], we found female gender and aging as an independent factor associated with CKD, as reported by other studies [30,33].
Interestingly, Cox models showed that higher WC was associated with lower risks of CKD among both genders. Some studies have shown that a higher WC was associated with the lower risk of CKD [32]; while a number of studies reported no significant association [34]. Surprisingly, we found a negative association between TC and incident CKD in men and whole population. The association between CKD and dyslipidemia has not been demonstrated in prospective studies, and is still subject to controversy [35,36].
Hypertension is recognized as a strong predictor for progression of renal dysfunction [32]. We found a positive and significant relationship between hypertension and CKD in both genders and whole population. In line with other studies [37], we showed a positive association between FPG and incidence of CKD in both genders and whole population. In the present study, among whole population and men, individuals with high educational levels (> 12 years of education), showed increased risk for CKD compared to illiterates. In our previous study, a similar association was found in women [32]. However, inverse association has been reported in other cross sectional and prospective studies [38]. The positive association between high educational levels and incident CKD observed in our study may be attributed to the direct and indirect impact of higher education on higher income. Since high income is almost coincides with high fat, protein and salt consumption, it affects the incident CKD through increasing prevalence of some chronic disease such as hypertension, diabetes, and obesity [39]. Further investigations are needed in order to better clarify this relationships.  Smoking is known as an independent risk factor for progression to CKD [37,40]. Our study also showed that current smokers had increased risk of CKD compared to never smokers only in men.
To date, studies assessing the association between physical activity and CKD incidence have provided conflicting results [41]. Surprisingly, we found 21% higher risk of CKD in "active" men.
We found significant association between elevated heart rate and lower risks of CKD incidence among women and whole population. This association is contrary to the results of Brotman [42], who reported an association between higher resting heart rate and CKD incidence. As heart rate is affected by many interior and exterior factors, more studies are needed to explain this unexpected result.
Compared with the general population, patients with CKD have an unacceptably higher risk for premature death, primarily as a result of CVD [43]. In our study, prevalent CVD at baseline was associated with risk of developing CKD among women and whole population.

Interaction patterns identified by the survival trees
According to survival tree analysis, eGFR and age were found as the most important predictors for CKD incidence among men and women, respectively. Current evidences from cross sectional and longitudinal studies suggest that decline in GFR is a universal phenomenon which happens after age 30 in both men and women, and the rate of decline accelerates after age 50-60 years [44]. This decline appears to be a part of the normal physiologic process and is associated with structural changes in the kidney [45].
Survival tree in woman (Fig. 5) shows that not only aging is the most important predictor for CKD progression, but suggesting that the critical threshold of age for developing CKD is much lower than 30. As Fig. 5 shows, a group of women with age ≤ 29 year and eGFR between 65 and 71 ml/min/1.73 m 2 had a moderate risk of developing CKD during the follow-up period (node 6).
According to our findings, the lowest risk was found among women aged ≤ 45 year and with eGFR > 83 ml/ min/1.73 m 2 ; suggesting that the threshold of ≥ 83 ml/ min/1.73 m 2 for eGFR is a safe value for decreasing the risk of CKD among women aged ≤ 45 year. Figure 5 also shows that among women aged > 45 year, a SBP > 133 mmHg (high-range pre-hypertension and hypertension) was significantly associated with increased risk of CKD even with eGFR > 69 ml/min/1.73 m 2 . According to a recent study, the overall prevalence of prehypertension was approximately 46% among adult Iranian population [46]; therefore, the effective strategies for the prevention and treatment of prehypertension is required to be considered among Iranian women.
Overall, the survival tree in women found 10 prognostic groups, defined only by the age, eGFR and SBP; it does not mean that other variables have no predictive

Table 5 Multivariate Cox regression analysis of factors associated with CKD incidence in total population, Tehran Lipid and Glucose Study (1999-2014)
The Stepwise Cox PH regression model using AIC as model selection approach was used to identify significant predictors of incident CKD by calculating multivariable hazard ratios (HRs) with 95% CI TLGS Tehran Lipid and Glucose Study, PH proportional hazard, CI confidence interval, AIC Akaike information criteria, FPG fasting plasma glucose, TC total cholesterol, eGFR estimated glomerular filtration rate, SBP systolic blood pressure, DBP diastolic blood pressure, FHD family history of diabetes mellitus, PAL physical activity level, CVD cardiovascular disease a Doing exercise or labor less than three times a week or performing activities achieving < 600 metabolic equivalent of task (MET) ability in the CKD incidence. If the tree is growing in full depth, other variables may be involved in the prediction of CKD; however, the number of subjects in the terminal nodes will be so small for significant results. Survival tree in male population identified 12 subgroups with different patterns (Fig. 3). The highest risk was found among men with eGFR < 63 ml/min/1.73 m 2 (nodes 4 and 5).
As we previously mentioned, the Cox PH model found negative association between WC and PAL in relation to risk of CKD among male population (Table 4). These unexpected results may be explained by the interactions found by the survival tree; as the Fig. 3 shows, low PAL and WC > 100 cm was significantly associated with increased risk of CKD (node 17); while, WC < 100 cm and FPG ≤ 4.7 mmol/l improved survival rate among inactive men (node 15). These findings suggest that there is a particularly strong interaction between these factors in relation to CKD incidence among male population. Survival tree also found a group of men aged > 46 year with a SBP > 119 mmHg (prehypertension and hypertension) and FPG > 5.83 mmol/l (pre-diabetes and diabetes), who had moderate risk of CKD incidence. In line with our findings, Yamagata et al. [37] showed that hypertension and diabetes were independent predictor for developing CKD. Because of the high prevalence and incidence of pre-hypertension, hypertension, pre-diabetes and diabetes among Iranian population [47], more attention should be paid to early intervention and risk factor modification for this high risk group.
In summary, survival trees are the helpful exploratory tools that have been applied in recent years in different medical studies [48][49][50][51]. However, investigators should be aware of their drawbacks and limitations. A major issue with survival trees is that they can be built to fit the training data too perfectly (over fit); while it may not perform well on other unknown data [52]. We used the conditional inference survival tree developed by Hothorn et al. [53], which is shown to be more reliable and less likely to over fit compared to the other survival tree algorithms.
This study is subject to some limitations. First, our study was a post hoc analysis of a prospective cohort that was not able to control for some covariates related to CKD such as family history of kidney disease that is related to decline in kidney function beginning at early ages [54]. Additionally, no quantitative urinary indexes were available at the baseline or follow-up visit, because,  Fig. 3 for male population. It can be seen that the curves of nodes 4, 5, 7, 8, 22 and 23 are lower than the whole population's curve, showing that they cover high risk groups for the CKD events. The symmetric remark can be made for nodes (11, 15, 16, 17, 18 and 20) which have a survival probability above the whole population's curve which cover low risk groups for the CKD events among men. b Kaplan-Meier survival of 10 nodes of the survival tree of Fig. 5 for female population. The survival curves of nodes 16, 4, 19, 15, and 7 are lower than the whole population's curve, and survival curves of nodes 12, 11, 6, and 9 have a survival probability above the whole population's curve, showing the high risk and low risk groups for the CKD events among women, respectively Table 6 Cox analyses of patterns identified using survival tree in the male population, Tehran Lipid and Glucose Study (1999-2014) Column 1 shows the number of terminal nodes of survival tree of Fig. 3 for male population. Column 2 displays the sample size and number of events in each terminal node. Column 3 represents the patterns of each terminal node. Column 4 shows the HR for each node compared to node with high survival probability as the reference node CI Confidence intervals, HRs Hazard ratio, SBP Systolic blood pressure, eGFR Estimated glomerular filtration rate, FPG Fasting plasma glucose, PAL Physical activity level, WC Waist circumference, MET Metabolic equivalent of task  Circles represent the most significant variable based on log-rank (LR) and permutation P for splitting population to smaller groups. Three variables were entered as the most important predictors of the occurrence of CKD over a 12.4 year period. Node 1 at the top of figure shows that age is the most important variable for first split (p < 0.001). The best cutoff value for age was 45 years; accordingly, female populations were divided into two groups: left (≤ 45 year) and right (> 45 year) branches. This procedure was applied recursively until the tree grew to an optimal number of terminal nodes. Therefore, 10 groups were identified by the survival tree algorithm. Each path from the first node to a terminal node specifies a combination of predictors and their cutoff values leading to a terminal node forms an interaction pattern. Each interaction pattern specifies a subgroup of individuals with similar survival probability. For example, node 4 shows the survival probability for a group of women aged ≤ 45 year with eGFR of ≤ 64.97 mL/min/1.73 m 2 which is worse than survival of nodes 12. SBP systolic blood pressure, eGFR estimated glomerular filtration rate lost to follow-up in our study; we found statistically but not clinically important differences between the followed vs. non-followed population in some baseline variables. In total population, mean values of FPG and eGFR were lower, and TC and WC were higher in followed compared to non-followed population. Since FPG and eGFR were reported to be the most important predictors for CKD among Iranian population [32], we may underestimate the incidence of CKD in our study. To evaluate the extent to which lost to follow-up may have influenced measures of association in final models, we calculated propensity scores in both genders and in the total population. The PS is the conditional probability that an individual participate in the follow-up, given a set of covariates [55]. To obtain PS, we applied multivariable logistic regression with participation in the follow-up as the outcome and all baseline variables as the predictors. The estimated probability of participation, or the PS, derived from the multivariable logistic regression model described above was added to the final Cox PH models. In general, in both genders and total population, there were no changes in estimated HRs and CIs for predictors in the Cox PH models. We also added PS as a covariate to develop survival tree models along with all baseline covariates; interestingly, adding the PS did not alter the structure of survival tree in both genders and total population (data not shown). Therefore, lost to follow-up did not cause selection bias in our study. And last but not least, our diagnosis of CKD was based on a single estimate of GFR, which we acknowledge tends to overestimate the incidence of kidney disease [37]. As strengths, this is the first study that identified interactions between CKD risk factors in a well-characterized cohort of adults using the survival tree approach. We used data from a large, long-term, population-based cohort of men and women. In fact, in men, the statistical power of our study to detect a HR of 20, 30 and 50% for incident CKD was 72, 95 and 99%; the corresponding values in women were 94, 99 and 100%, respectively. We assessed the actual measurement of blood pressure and laboratory parameters, rather than self-reported data. In addition, we investigated a wide range of variables as potential predictors.
As, there will never be enough resources to implement every programme for all population, health policy makers prefer interventions that target high-risk groups [56]. Our study, identified not only risk factors of CKD in total population by Cox model but also identified high-risk groups by survival tree analysis, which can be used for implement specific interventions for each group according their level of risk.

Conclusions
Cox regression models is suitable for demonstrating the relation of a predictor to the outcome; while, survival trees are ideally suited for exploration of some prognostic groups of individuals by detecting certain types of interactions without the need to specify them beforehand. In our study, Cox PH models identified the main effects of the several traditional risk factors of CKD; however, survival tree analysis identified a number of interactions between those risk factors. Survival trees created 12 subgroups with different patterns of eGFR, age, FPG, physical activity level and WC in men. Among women, age, eGFR and SBP interacted with each other and identified 10 subgroups. In conclusion, these two statistical methods can be deemed to be somewhat complementary for survival data analysis. The prognostic groups could potentially be valuable for early detection and management of high risk individuals for CKD in the public health setting and at clinic. Authors' contributions AR analyzed data and wrote the manuscript. MT participated in the interpretation of results. FA and FH, designed the study protocol, and participated in the coordination and management of the study. All authors read and approved the final manuscript.