- Open Access
Application of survival tree analysis for exploration of potential interactions between predictors of incident chronic kidney disease: a 15-year follow-up study
Journal of Translational Medicine volume 15, Article number: 240 (2017)
Chronic kidney disease (CKD) is a growing public health challenges worldwide. Various studies have investigated risk factors of incident CKD; however, a very few studies examined interaction between these risk factors. In an attempt to clarify the potential interactions between risk factors of CKD, we performed survival tree analysis.
A total of 8238 participants (46.1% men) aged > 20 years without CKD at baseline [(1999–2001) and (2002–2005)], were followed until 2014. The first occurrence of CKD, defined as the estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m2, was set as the main outcome. Multivariable Cox proportional hazard (Cox PH) regression was used to identify significant independent predictors of CKD; moreover, survival tree analysis was performed to gain further insight into the potential interactions between predictors.
The crude incidence rates of CKD were 20.2 and 35.2 per 1000 person-years in men and women, respectively. The Cox PH identified the main effect of significant predictors of CKD incidence in men and women. In addition, using a limited number of predictors, survival trees identified 12 and 10 subgroups among men and women, respectively, with different survival probability. Accordingly, a group of men with eGFR > 74 ml/min/1.73 m2, age ≤ 46 years, low level of physical activity, waist circumference ≤ 100 cm and FPG ≤ 4.7 mmol/l had the lowest risk of CKD incidence; while men with eGFR ≤ 63.4 ml/min/1.73 m2, age > 50 years had the highest risk for CKD compared to men in the lowest risk group [hazard ratio (HR), 70.68 (34.57–144.52)]. Also, a group of women aged ≤ 45 years and eGFR > 83.5 ml/min/1.73 m2 had the lowest risk; while women with age > 48 years and eGFR ≤ 69 ml/min/1.73 m2 had the highest risk compared to low risk group [HR 27.25 (19.88–37.34)].
In this post hoc analysis, we found the independent predictors of CKD using Cox PH; furthermore, by applying survival tree analysis we identified several numbers of homogeneous subgroups with different risk for incidence of CKD. Our study suggests that two methods can be used simultaneously to provide new insights for intervention programs and improve clinical decision making.
Chronic kidney disease (CKD) is a growing public health challenges worldwide . CKD is a major cause of not only progression to renal failure, but also excess cardiovascular morbidity and mortality. It affects 10–15% of the adult population around the world [1, 2]. In recent years, a silent CKD epidemic has been proposed by many authors [3, 4]. The outbreak is because of the inclusion of a large proportion of the elderly population within stage 3 CKD . Also, in the developing countries, up to 40% of those identified with CKD in screening programs were often young . Because early interventions could prevent or postpone the progression to end stage renal disease (ESRD), identification of risk factors is a major step in dealing with the CKD epidemic [6, 7].
The primary method for identifying predictors of incident CKD has been use of traditional regression models such as logistic or Cox proportional hazards (Cox PH) model. However, assessment of interactions in the presence of a large number of variables leads to a complicated model that can be difficult to fit and interpret [8,9,10]. For example, if we have 20 covariates, then there are (20 × 19)/2 = 190 two-factor interaction terms for including into regression models. On the other hand, if many interactions are examined and only the strongest included in the regression model, this would contribute to biased estimation of the effects and overly optimistic performance estimates . Hence, in prediction of CKD, many studies have examined the interactions between only important risk factors such as hypertension or diabetes with other factors [12, 13].
Recursive partitioning or ‘decision trees’ are another class of nonparametric regressions which are widely used in many fields [14,15,16,17]. Survival trees are popular nonparametric alternatives to the Cox PH model, which have great flexibility and can automatically detect certain types of interactions. Moreover, they can naturally group subjects according to their length of survival based on their covariates patterns [11, 14, 18, 19]. The aims of the present study, conducted among Iranian adult population, were to investigate (1) the independent predictors of incident CKD separately among men and women using traditional statistical method and (2) the potential interactions between predictors to identify different risk groups for CKD incidence using survival tree analysis. We used data from the Tehran Lipid and Glucose Study (TLGS) for our investigation.
The TLGS is an ongoing community-based prospective study performed on a representative sample of residents of Tehran, the capital of the Islamic Republic of Iran. This city composed of 22 urban districts. Study samples were selected from the district 13. The rationales for choosing district 13 were: (1) high stability of the population residing in that district compared to other districts of Tehran, and (2) the age distribution of the population of district 13 was representative of the overall population of Tehran. At first phase of study (1999–2002), a multistage stratified cluster random sampling technique was used to select more than 15,000 people aged ≥ 3 year. The participants then were followed in four consecutive phases, i.e. 2002–2005 (phase 2), 2005–2008 (phase 3), 2008–2011 (phase 4) and the final survey in 2011–2014 (phase 5). Another 3551 residents were invited for the second phase and were followed in the next 3 phases. The details of the study have been published elsewhere . The present study was a post hoc analysis of the TLGS. A total of 12,808 adult participants aged ≥ 20 year from the first and second phases were selected as baseline population and the cohort was re-examined at next phases until the phase 5. Among this participants, we excluded subjects with prevalent CKD (n = 2061) at baseline, and those with missing data on creatinine level (n = 414). Finally, after excluding those without any follow-up data after recruitment (n = 2095), 8238 participants (3795 men and 4443 women) (80% of eligible subjects) were followed until the end of study (Fig. 1). This study was approved by the ethical committee of the Research Institute for Endocrine Sciences of Shahid Beheshti University of Medical Sciences, Tehran, Iran, and conducted in accordance with the Declaration of Helsinki. All participants signed informed consent forms.
Clinical, anthropometric and laboratory measurements
At baseline, information on demographics, education, smoking status, medical and drug history was collected using interview. Anthropometric measures including body weight and height, hip, wrist and waist circumferences (WC) were measured according to a standard protocol. Body mass index (BMI) was calculated as weight in kilogram divided by the square of height in meter. Blood pressure was measured twice and mean of two measurements was considered as the participant’s blood pressure. Blood samples were drawn after an overnight fast of ≥ 12 h to assess the following parameters: fasting plasma glucose (FPG) (measured by the enzymatic colorimetric glucose oxidase); total cholesterol (TC) and triglycerides (TGs) (were assayed using the enzymatic colorimetric method) and the Serum creatinine (cr) level which was assayed by kinetic colorimetric Jaffe using commercial kits (Pars Azmoon Inc., Tehran, Iran) by a Selectra 2 auto analyzer (Vital Scientific, Spankeren, The Netherlands) [20, 21]. The sensitivity of the assay was 0.2 mg/dl. Reference intervals according to manufacturer’s recommendation were 53–97 mmol/l (0.6–1.1 mg/dl) and 80–115 mmol/l (0.9–1.3 mg/dl) in women and men respectively. Both intra-assay and inter-assay CVs were less than 3.1% in both baseline and follow-up phases. Assay performance was monitored after every 25 tests using lyophilized serum controls in normal and abnormal ranges and all samples were assayed when internal quality control met the standard satisfactory criteria [17, 18]. In the first phase, physical activity level (PAL) was assessed by the lipid research clinic questionnaire (LRC) , which was validated for this study . LRC is one of the most common questionnaires to assess PAL in large-scale epidemiological studies . It was replaced by the Modifiable Activity Questionnaire (MAQ) from the 2nd phase for obtaining the quantitative measure of PAL. This questionnaire measures all three forms of activities including leisure time, job, and household activities in the past year .
Definition of variables and outcomes
Kidney function was estimated by the estimated glomerular filtration rate (eGFR) that was calculated using the modification of diet in renal disease (MDRD) equation . The definition of incident CKD was an eGFR of < 60 ml/min per 1.73 m2 (CKD Stage 3–5) occurring at any time during the follow-up period. Educational status was categorized to 4 levels as illiterate, 1–9 years, 10–12 years and over 12 years of schooling. Marital status was categorized as single, married, and widowed/divorced. Different categories of smoking status were defined according to WHO guidelines . A current smoker was defined as a person who smokes cigarettes or other smoking implements (water-pipes, or pipes) at least once a day or occasionally. Ex-smoker was formerly daily or occasional smoker who currently does not smoke and non-smoker defined as people who never smoked before. A family history of premature cardio-vascular diseases (CVDs) was considered as any experience of fatal or non-fatal myocardial infarction, stroke or sudden cardiac arrest in first-degree relatives, if it occurred before 55 year of age in male relatives (father, brother and son) and before 65 year of age in female relatives (mother, sister and daughter). Family history of diabetes (FHD) was defined as having type 2 diabetes in first-degree relatives. In first phase, physically active participants were identified as those who were participating in a vigorous physical activity at least 3 days per week , and in the second phase, it was defined as achieving a minimum of at least 600 MET (metabolic equivalent task)-minutes per week .
Descriptive statistics (mean, SD, proportion) were used to describe baseline characteristics. Comparison of baseline characteristics between men vs. women, between participants with and without incident CKD and between followed and non-followed participants was done using student’s t test, Chi square and Mann–Whitney test, as appropriate. Incidence density rate of CKD and respective 95% confidence interval (CI) were calculated for each gender, by dividing the number of events to person-years at risk. Time to event was defined as time of censoring or date of incidence of CKD, whichever occurred first. The event date for the incident cases of CKD was defined as midtime between the date of follow-up visit at which the CKD was diagnosed for the first time, and the most recent follow-up visit prior to the diagnosis. For censored subjects, the time was the interval between the first and the last observation dates. Study participants were censored due to death, loss to follow-up, or the end of the observation period. The Stepwise Cox PH regression model using Akaike information criteria (AIC), as model selection approach, was used to identify significant predictors of incident CKD by calculating multivariable hazard ratios (HRs) with 95% CI (after confirming the proportionality in the model). Due to significant effect modification of gender on FPG, SBP, diastolic blood pressure (DBP), eGFR and WC (all p < 0.05), we stratified our analysis by gender. Moreover, for examination of the gender effect in the multivariable model we fitted the Cox PH model on whole population. In further analysis, conditional survival tree was implemented with the same outcome and covariates as for Cox method . In step 1, among m covariate, survival tree algorithm selects the variable X i with the highest ability to separate survivors and non-survivors using p-values from permutation distributions. In step 2, in order to split X i into two disjoint sets (dichotomizing), the algorithm first sorts the values of X i in increasing order. Typically, the midpoint between each pair of adjacent values is considered as a possible cut-point. Therefore, given v values of X i , then v-1 possible splits are evaluated. For each possible cut-point for X i , the value of the log-rank statistic associated with the considered cut-point is computed; then, the cut-point associated with the smallest p-value is selected. This procedure is applied recursively until the tree grew to an optimal number of terminal nodes. A survival tree can naturally group subjects according to their survival time and based on their covariates; hence, it automatically detects complex interactions between covariates without the need to specify them beforehand. In the present study, all baseline variables were entered for constructing survival trees. The minimum criterion for node split was defined as p < 0.05 and the minimum final nodes size were defined as 100 persons. Kaplan–Meier (KM) curves were constructed for each subgroup identified by the survival tree. Also, according the number of subgroups were found by survival tree, we specified one categorical variables with k level (k is the number of terminal nodes or subgroups in survival trees) and estimated the HR of CKD events among the identified subgroups by considering the low risk group (highest survival probability) as the reference category. We used two packages of Party and Survival from the R software (http://www.r-project.org/) for the analysis. Two-tailed p-values < 0.05 were considered statistically significant.
Study participants and baseline characteristics
The study sample consisted of 3795 men, aged 20–84 year (mean, 41.4 ± 14.1) and 4443 women, aged 20–83 year (mean, 37.5 ± 12.1). Table 1 shows the baseline characteristics of the participants. There was a gender difference in all baseline variables except for TC, family history of premature CVD in male relatives, PAL and use of blood glucose lowering drugs. Women were younger and had lower eGFR, WC, wrist circumference, FPG, SBP and DBP, but had higher BMI, hip circumference and heart rate, compared with men (Table 1).
A comparison of baseline characteristics between followed and non-followed subjects is shown in Table 2. Followed subjects had a significantly higher BMI (26.5 vs. 26.1 kg/m2), WC (87.4 vs. 86.3 cm), hip circumference (100.2 vs. 99.6 cm) and TC (5.2 vs. 5.1 mmol/l), but lower FPG (5.2 vs. 5.3 mmol/l) and eGFR (75.5 vs. 76.1) compared to non-followed subjects, respectively.
Table 3 shows baseline characteristics of the study participants with and without incident CKD. In both genders, the participants who developed incident CKD were older, had higher BMI, FPG, TG, TC, SBP, DBP, WC and wrist circumferences but lower heart rate and were more likely to be married compared with participants free of CKD. Moreover, frequency of taking lipid, blood glucose and blood pressure lowering drugs as well as aspirin was higher in incident cases of CKD compared to non CKD participants (p < 0.01 for all of these measures).
Overall, 2308 new cases of CKD (men = 793 and women = 1515) were identified after a median follow-up of 12.4 [interquartile range (IQR), 9.8–13.6] years. Among incident cases of CKD, 2294 subjects developed stage 3 CKD (eGFR 30–59 ml/min/1.73 m2), 12 cases developed stage 4 CKD (eGFR 15–29 ml/min/1.73 m2), and 2 cases developed stage 5 CKD (eGFR < 15 ml/min/1.73 m2). The incidence density rates of CKD were 20.2 (95% CI 18.8–21.7) and 35.2 (33.5–37.1) per 1000 person-years in men and women, respectively. Figure 2 depicts the survival curve for men and women. Survival of women was lower than that of men during study period (Log-Rank χ2 176, p < 0.001). From the 8238 baseline population, 269 (men = 196) cases of mortality were documented during the study period; of these, 119 (men = 93) cases died from CVD.
Predictors of incident CKD in men according to Cox model
Among men, age, FPG and DBP were positively associated with the risk of CKD (Table 4). In contrast, WC and TC were negatively associated with the risk of CKD. Also, an inverse association between eGFR and CKD was found in men; every 1-unit increase in eGFR was associated with a 10% reduction in the risk of CKD (HR 0.90; 95% CI 0.89–0.91). The HR for CKD among men with higher education (> 12 years) was 1.56 (1.06–2.28) compared to illiterate group. Being divorced/widower increased the risk of CKD by 204% (1.09–3.84) compared with being married. Interestingly, active men had 21% (1.03–1.42) higher risk of CKD compared to the inactive group. Current smoking was associated with a 24% (1.03–1.48) increase in the risk of CKD. The use of blood glucose lowering drugs at study entry was associated with 59% (1.08–2.35) higher risk of CKD.
Predictors of incident CKD in women according to Cox model
The associations of age, FPG and WC with incidence of CKD in women were similar to those observed for men (Table 4). Every 1-unit increase in eGFR was associated with a 9% reduction in the risk of CKD (HR 0.91; 0.91–0.92). Additionally, SBP and heart rate were significantly associated with CKD incidence. Prevalent CVD at baseline was associated with a 35% (0.99–1.82) increase in the risk of CKD.
Predictors of incident CKD in whole population according to Cox model
In the whole population, women had more than double risk (HR 2.09; 1.82–2.39) of CKD compared with men. Moreover, age, FPG, and DBP were positively associated with CKD incidence. In contrast, eGFR, WC, TC and heart rate were negatively associated with the risk of CKD. Every 1-unit increase in eGFR was associated with a 9% reduction in the risk of CKD (HR 0.91; 0.90–0.92). Prevalent CVD at baseline was associated with a 25% (1.01–1.54) increase in the risk of CKD. The HR for CKD among individuals with higher education (> 12 years) was 1.41 (1.14–1.72) compared to illiterate individuals (Table 5).
Survival tree in men
The results of the survival tree in men are presented in Fig. 3. The eGFR, age, FPG, PAL and WC were selected by the tree to group the male population. The best cutoff values for these predictors were found by the survival tree algorithm. The eGFR was found as the most important predictor for CKD with a cutoff value of 69.8 ml/min/1.73 m2. The group with eGFR ≤ 69.8 ml/min/1.73 m2 (left side of the tree) was further split by age; so that, the group with eGFR < 63.4 ml/min/1.73 m2 and age > 50 year had the highest risk for CKD. The lowest risk was found among younger men aged ≤ 46 year and eGFR > 74 ml/min/1.73 m2. A moderate risk was explored among the groups with eGFR > 69.85 ml/min/1.73 m2 and age > 46 year; survival probability among these groups depended on the SBP and FPG. So that, the group with SBP > 119 mmHg and FPG > 5.8 mmol/l had the lowest survival. Ultimately, 12 groups were identified by the survival tree algorithm. The Kaplan–Meier curves for these groups are shown in Fig. 4a. The curves of nodes 4, 5, 7, 8, 22 and 23 were lower than the whole population’s curve, showing that they covered high risk groups for the CKD events among men. The symmetric remark can be made for nodes (11, 15, 16, 17, 18 and 20) which had a survival probability above the whole population’s curve which covered low risk groups for the CKD events in men. Using node 15 with the lowest risk as the reference group, HRs of CKD was estimated for all other groups (nodes). Accordingly, node 5 had the highest risk for CKD incidence, compared to node 15 (HR 70.6; 34.6–144.5; p < 0.001) (Table 6).
Survival tree in women
The survival tree for women is shown in Fig. 5. Three variables (age, eGFR and SBP) were found as the most significant predictors associated with CKD. Initially, the participants were split by age with the cutoff value of 45 year (p < 0.001). On the right side of the tree, the group aged > 45 year was split by eGFR with a cutoff value of 69.35 ml/min/1.73 m2; then, the group with eGFR > 69.35 ml/min/1.73 m2 was split by SBP with a cutoff value of 133 mmHg. Further inspection of the tree showed that in age > 45 year, a SBP > 133 mmHg can significantly increase the risk of CKD. Among women in younger age (< 45 year), those with eGFR > 83.5 ml/min/1.73 m2 (node 12) and eGFR < 65 ml/min/1.73 m2 (node 4) had the lowest and highest risk, respectively, for incidence of CKD. The Kaplan–Meier curves for the ten groups are shown in Fig. 4b. The survival curves of nodes 16, 4, 19, 15, and 7 were lower than the whole population’s curve, and survival curves of nodes 12, 11, 6, and 9 had a survival probability above the whole population’s curve, showing the high risk and low risk groups for the CKD events among women, respectively. As Table 7 shows, node 16 had the highest risk for CKD incidence, compared to node 12 as the reference group (HR 27.25; 19.88–37.34; p < 0.001).
The results of the present study showed that more than 20 and 35% of men and of women, respectively, developed CKD during 15 years follow-up. Results of the multivariate Cox PH models showed that women had more than double risk for CKD incidence compared to men. The eGFR, age, FPG and WC were significant predictors for incident CKD in both genders and whole population. However, in men, DBP, TC, current smoking, being divorced/widower, higher educational levels (> 12 year), taking blood glucose lowering drugs and PAL were significantly associated with an increased risk of CKD. In women, it was found that SBP, heart rate and prevalent CVD were significantly associated with CKD in addition to the four common predictors.
Compared to other population-based studies in the US, Europe and Japan , the incidence rate of CKD was higher in our study, despite the lower mean age of the population. The higher incidence of CKD in our study population could be due to life style changes, leading to a sharp rise in risk factors of chronic diseases . In line with our previous study , we found female gender and aging as an independent factor associated with CKD, as reported by other studies [30, 33].
Interestingly, Cox models showed that higher WC was associated with lower risks of CKD among both genders. Some studies have shown that a higher WC was associated with the lower risk of CKD ; while a number of studies reported no significant association . Surprisingly, we found a negative association between TC and incident CKD in men and whole population. The association between CKD and dyslipidemia has not been demonstrated in prospective studies, and is still subject to controversy [35, 36].
Hypertension is recognized as a strong predictor for progression of renal dysfunction . We found a positive and significant relationship between hypertension and CKD in both genders and whole population. In line with other studies , we showed a positive association between FPG and incidence of CKD in both genders and whole population. In the present study, among whole population and men, individuals with high educational levels (> 12 years of education), showed increased risk for CKD compared to illiterates. In our previous study, a similar association was found in women . However, inverse association has been reported in other cross sectional and prospective studies . The positive association between high educational levels and incident CKD observed in our study may be attributed to the direct and indirect impact of higher education on higher income. Since high income is almost coincides with high fat, protein and salt consumption, it affects the incident CKD through increasing prevalence of some chronic disease such as hypertension, diabetes, and obesity . Further investigations are needed in order to better clarify this relationships.
Smoking is known as an independent risk factor for progression to CKD [37, 40]. Our study also showed that current smokers had increased risk of CKD compared to never smokers only in men.
To date, studies assessing the association between physical activity and CKD incidence have provided conflicting results . Surprisingly, we found 21% higher risk of CKD in “active” men.
We found significant association between elevated heart rate and lower risks of CKD incidence among women and whole population. This association is contrary to the results of Brotman , who reported an association between higher resting heart rate and CKD incidence. As heart rate is affected by many interior and exterior factors, more studies are needed to explain this unexpected result.
Compared with the general population, patients with CKD have an unacceptably higher risk for premature death, primarily as a result of CVD . In our study, prevalent CVD at baseline was associated with risk of developing CKD among women and whole population.
Interaction patterns identified by the survival trees
According to survival tree analysis, eGFR and age were found as the most important predictors for CKD incidence among men and women, respectively. Current evidences from cross sectional and longitudinal studies suggest that decline in GFR is a universal phenomenon which happens after age 30 in both men and women, and the rate of decline accelerates after age 50–60 years . This decline appears to be a part of the normal physiologic process and is associated with structural changes in the kidney .
Survival tree in woman (Fig. 5) shows that not only aging is the most important predictor for CKD progression, but suggesting that the critical threshold of age for developing CKD is much lower than 30. As Fig. 5 shows, a group of women with age ≤ 29 year and eGFR between 65 and 71 ml/min/1.73 m2 had a moderate risk of developing CKD during the follow-up period (node 6).
According to our findings, the lowest risk was found among women aged ≤ 45 year and with eGFR > 83 ml/min/1.73 m2; suggesting that the threshold of ≥ 83 ml/min/1.73 m2 for eGFR is a safe value for decreasing the risk of CKD among women aged ≤ 45 year.
Figure 5 also shows that among women aged > 45 year, a SBP > 133 mmHg (high-range pre-hypertension and hypertension) was significantly associated with increased risk of CKD even with eGFR > 69 ml/min/1.73 m2. According to a recent study, the overall prevalence of prehypertension was approximately 46% among adult Iranian population ; therefore, the effective strategies for the prevention and treatment of prehypertension is required to be considered among Iranian women.
Overall, the survival tree in women found 10 prognostic groups, defined only by the age, eGFR and SBP; it does not mean that other variables have no predictive ability in the CKD incidence. If the tree is growing in full depth, other variables may be involved in the prediction of CKD; however, the number of subjects in the terminal nodes will be so small for significant results.
Survival tree in male population identified 12 subgroups with different patterns (Fig. 3). The highest risk was found among men with eGFR < 63 ml/min/1.73 m2 (nodes 4 and 5).
As we previously mentioned, the Cox PH model found negative association between WC and PAL in relation to risk of CKD among male population (Table 4). These unexpected results may be explained by the interactions found by the survival tree; as the Fig. 3 shows, low PAL and WC > 100 cm was significantly associated with increased risk of CKD (node 17); while, WC < 100 cm and FPG ≤ 4.7 mmol/l improved survival rate among inactive men (node 15). These findings suggest that there is a particularly strong interaction between these factors in relation to CKD incidence among male population. Survival tree also found a group of men aged > 46 year with a SBP > 119 mmHg (prehypertension and hypertension) and FPG > 5.83 mmol/l (pre-diabetes and diabetes), who had moderate risk of CKD incidence. In line with our findings, Yamagata et al.  showed that hypertension and diabetes were independent predictor for developing CKD. Because of the high prevalence and incidence of pre-hypertension, hypertension, pre-diabetes and diabetes among Iranian population , more attention should be paid to early intervention and risk factor modification for this high risk group.
In summary, survival trees are the helpful exploratory tools that have been applied in recent years in different medical studies [48,49,50,51]. However, investigators should be aware of their drawbacks and limitations. A major issue with survival trees is that they can be built to fit the training data too perfectly (over fit); while it may not perform well on other unknown data . We used the conditional inference survival tree developed by Hothorn et al. , which is shown to be more reliable and less likely to over fit compared to the other survival tree algorithms.
This study is subject to some limitations. First, our study was a post hoc analysis of a prospective cohort that was not able to control for some covariates related to CKD such as family history of kidney disease that is related to decline in kidney function beginning at early ages . Additionally, no quantitative urinary indexes were available at the baseline or follow-up visit, because, the cohort was not originally designed for kidney diseases. Second, this study has been conducted on a sample of Iranian population and further studies should be conducted to determine whether our findings could be applicable to other populations. Third, as inherent to any prospective cohort study, the level of risk factors at the baseline examination might change during the follow up, e.g. some degree of misclassification may have occurred in this study, leading to biased estimated hazard ratios towards the null. Fourth, about 20% of participants were lost to follow-up in our study; we found statistically but not clinically important differences between the followed vs. non-followed population in some baseline variables. In total population, mean values of FPG and eGFR were lower, and TC and WC were higher in followed compared to non-followed population. Since FPG and eGFR were reported to be the most important predictors for CKD among Iranian population , we may underestimate the incidence of CKD in our study. To evaluate the extent to which lost to follow-up may have influenced measures of association in final models, we calculated propensity scores in both genders and in the total population. The PS is the conditional probability that an individual participate in the follow-up, given a set of covariates . To obtain PS, we applied multivariable logistic regression with participation in the follow-up as the outcome and all baseline variables as the predictors. The estimated probability of participation, or the PS, derived from the multivariable logistic regression model described above was added to the final Cox PH models. In general, in both genders and total population, there were no changes in estimated HRs and CIs for predictors in the Cox PH models. We also added PS as a covariate to develop survival tree models along with all baseline covariates; interestingly, adding the PS did not alter the structure of survival tree in both genders and total population (data not shown). Therefore, lost to follow-up did not cause selection bias in our study. And last but not least, our diagnosis of CKD was based on a single estimate of GFR, which we acknowledge tends to overestimate the incidence of kidney disease .
As strengths, this is the first study that identified interactions between CKD risk factors in a well-characterized cohort of adults using the survival tree approach. We used data from a large, long-term, population-based cohort of men and women. In fact, in men, the statistical power of our study to detect a HR of 20, 30 and 50% for incident CKD was 72, 95 and 99%; the corresponding values in women were 94, 99 and 100%, respectively. We assessed the actual measurement of blood pressure and laboratory parameters, rather than self-reported data. In addition, we investigated a wide range of variables as potential predictors.
As, there will never be enough resources to implement every programme for all population, health policy makers prefer interventions that target high-risk groups . Our study, identified not only risk factors of CKD in total population by Cox model but also identified high-risk groups by survival tree analysis, which can be used for implement specific interventions for each group according their level of risk.
Cox regression models is suitable for demonstrating the relation of a predictor to the outcome; while, survival trees are ideally suited for exploration of some prognostic groups of individuals by detecting certain types of interactions without the need to specify them beforehand. In our study, Cox PH models identified the main effects of the several traditional risk factors of CKD; however, survival tree analysis identified a number of interactions between those risk factors. Survival trees created 12 subgroups with different patterns of eGFR, age, FPG, physical activity level and WC in men. Among women, age, eGFR and SBP interacted with each other and identified 10 subgroups. In conclusion, these two statistical methods can be deemed to be somewhat complementary for survival data analysis. The prognostic groups could potentially be valuable for early detection and management of high risk individuals for CKD in the public health setting and at clinic.
Akaike information criteria
body mass index
chronic kidney disease
- Cox PH:
Cox proportional hazards
diastolic blood pressure
estimated glomerular filtration rate
end stage renal disease
family history of diabetes
fasting plasma glucose
inter quartile range
metabolic equivalent of task
modification of diet in the renal disease study
physical activity level
systolic blood pressure
Tehran Lipid and Glucose Study
Jha V, Garcia-Garcia G, Iseki K, Li Z, Naicker S, Plattner B, et al. Chronic kidney disease: global dimension and perspectives. Lancet. 2013;382(9888):260–72.
Tsai W-C, Wu H-Y, Peng Y-S, Ko M-J, Wu M-S, Hung K-Y, et al. Risk factors for development and progression of chronic kidney disease: a systematic review and exploratory meta-analysis. Medicine. 2016;95(11):e3013.
Ramirez-Rubio O, McClean MD, Amador JJ, Brooks DR. An epidemic of chronic kidney disease in Central America: an overview. Postgrad Med J. 2013;89(1049):123–5.
Radhakrishnan J, Remuzzi G, Saran R, Williams DE, Rios-Burrows N, Powe N, et al. Taming the chronic kidney disease epidemic: a global view of surveillance efforts. Kidney Int. 2014;86(2):246–50.
Mangione F, Canton AD. The epidemic of chronic kidney disease: looking at ageing and cardiovascular disease through kidney-shaped lenses. J Intern Med. 2010;268(5):449–55.
Herget-Rosenthal S, Dehnen D, Kribben A, Quellmann T. Progressive chronic kidney disease in primary care: modifiable risk factors and predictive model. Prev Med. 2013;57(4):357–62.
McClellan WM, Ramirez SP, Jurkovitz C. Screening for chronic kidney disease: unresolved issues. J Am Soc Nephrol. 2003;14(suppl 2):81–7.
Johnson RA, Wichern DW. Applied multivariate statistical analysis. Upper Saddle River: Prentice hall; 2002.
Kleinbaum DG, Klein M. Logistic regression: a self-learning text. 3rd ed. New York: Springer; 2010.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. 2015;13(1):1.
Zhang H, Singer B. Recursive partitioning and applications. Berlin: Springer Science & Business Media; 2010.
Su S-L, Lin C, Kao S, et al. Risk factors and their interaction on chronic kidney disease: a multi-centre case control study in Taiwan. BMC Nephrol. 2015;16(16):83.
Liu W-C, Hung C-C, Chen S-C, et al. Association of hyperuricemia with renal outcomes, cardiovascular disease, and mortality. Clin J Am Soc Nephrol. 2012;7(4):541–8.
Loh WY. Fifty years of classification and regression trees. Int Stat Rev. 2014;82(3):329–48.
Ramezankhani A, Pournik O, Shahrabi J, Khalili D, Azizi F, Hadaegh F. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study. Diabetes Res Clin Pract. 2014;105(3):391–8.
Ramezankhani A, Kabir A, Pournik O, Azizi F, Hadaegh F. Classification-based data mining for identification of risk patterns associated with hypertension in Middle Eastern population: a 12-year longitudinal study. Medicine (Baltimore). 2016. https://doi.org/10.1097/MD.0000000000004143.
Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open. 2016. https://doi.org/10.1136/bmjopen-2016-013336.
Parizadeh D, Ramezankhani A, Momenan AA, Azizi F, Hadaegh F. Exploring risk patterns for incident ischemic stroke during more than a decade of follow-up: a survival tree analysis. Comput Methods Programs Biomed. 2017;147:29–36.
Ramezankhani A, Bagherzadeh-Khiabani F, Khalili D, Azizi F, Hadaegh F. A new look at risk patterns related to coronary heart disease incidence using survival tree analysis: 12 Years Longitudinal Study. Sci Rep. 2017. https://doi.org/10.1038/s41598-017-03577-0.
Azizi F, Ghanbarian A, Momenan AA, Hadaegh F, Mirmiran P, Hedayati M, et al. Prevention of non-communicable disease in a population in nutrition transition: Tehran Lipid and Glucose Study phase II. Trials. 2009;10:5.
Hosseinpanah F, Kasraei F, Nassiri AA, Azizi F. High prevalence of chronic kidney disease in Iran: a large population-based study. BMC Public Health. 2009;9:44.
Ainsworth BE, Jacobs DR Jr, Leon AS. Validity and reliability of self-reported physical activity status: the Lipid Research Clinics questionnaire. Med Sci Sports Exerc. 1993;25(1):92–8.
Mirmiran Mohammadi, Allahverdian Azizi. Estimation of energy requirements for adults: Tehran Lipid and Glucose Study. Int J Vitam Nutr Res. 2003;73(3):193–200.
Hu F. Obesity epidemiology. New York: Oxford University Press; 2008.
Momenan AA, Delshad M, Sarbazi N, Rezaei Ghaleh N, Ghanbarian A, Azizi F. Reliability and validity of the modifiable activity questionnaire (MAQ) in an Iranian urban adult population. Arch Iran Med. 2012;15(5):279–82.
Levey ASEK, Tsukamoto Y, Levin A, Coresh J, Rossert J, et al. Definition and classification of chronic kidney disease: a position statement from Kidney Disease: improving Global Outcomes (KDIGO). Kidney Int. 2005;67(6):2089–100.
World Health Organization. Guidelines for controlling and monitoring the tobacco epidemic. Geneva: World Health Organization; 1998.
Jeon CY, Lokken RP, Hu FB, Van Dam RM. Physical activity of moderate intensity and risk of type 2 diabetes a systematic review. Diabetes Care. 2007;30(3):744–52.
Hothorn T, Everitt BS. A handbook of statistical analyses using R. Boca Raton: CRC Press; 2014.
Hippisley-Cox J, Coupland C. Predicting the risk of Chronic Kidney Disease in Men and Women in England and Wales: prospective derivation and external validation of the QKidney® Scores. BMC Fam Pract. 2010;11:49.
Harati H, Hadaegh F, Saadat N, Azizi F. Population-based incidence of type 2 diabetes and its associated risk factors: results from a six-year cohort study in Iran. BMC Public Health. 2009;9:186.
Tohidi M, Hasheminia M, Mohebi R, et al. Incidence of chronic kidney disease and its risk factors, results of over 10 year follow up in an Iranian cohort. PLoS ONE. 2012. https://doi.org/10.1371/journal.pone.0045304.
Imai E, Horio M, Watanabe T, et al. Prevalence of chronic kidney disease in the Japanese general population. Clin Exp Nephrol. 2009;13(6):621–30.
Lucove J, Vupputuri S, Heiss G, North K, Russell M. Metabolic syndrome and the development of CKD in American Indians: the Strong Heart Study. Am J Kidney Dis. 2008;51(1):21–8.
Schaeffner ES, Kurth T, Curhan GC, et al. Cholesterol and the risk of renal dysfunction in apparently healthy men. J Am Soc Nephrol. 2003;14(8):2084–91.
Fox CS, Larson MG, Leip EP, Culleton B, Wilson PW, Levy D. Predictors of new-onset kidney disease in a community-based population. JAMA. 2004;291(7):844–50.
Yamagata K, Ishida K, Sairenchi T, Takahashi H, Ohba S, Shiigai T, et al. Risk factors for chronic kidney disease in a community-based population: a 10-year follow-up study. Kidney Int. 2007;71(2):159–66.
Hossain MP, Goyder EC, Rigby JE, El Nahas M. CKD and poverty: a growing global challenge. Am J Kidney Dis. 2009;53(1):166–74.
World Health Organization. Population nutrient intake goals for preventing diet-related chronic diseases. Geneva: World Health Organization; 2003.
Xia J, Wang L, Ma Z, Zhong L, Wang Y, Gao Y, et al. Cigarette smoking and chronic kidney disease in the general population: a systematic review and meta-analysis of prospective cohort studies. Nephrol Dial Transplant. 2017;32(3):475–87.
Koba S. Physical activity and chronic kidney disease. J Atheroscler Thromb. 2016;23(4):395–6.
Brotman DJ, Bash LD, Qayyum R, et al. Heart rate variability predicts ESRD and CKD-related hospitalization. J Am Soc Nephrol. 2010;21(9):1560–70.
Stenvinkel P, Carrero JJ, Axelsson J, Lindholm B, Heimbürger O, Massy Z. Emerging biomarkers for evaluating cardiovascular risk in the chronic kidney disease patient: how do new pieces fit into the uremic puzzle? Clin J Am Soc Nephrol. 2008;3(2):505–21.
Glassock RJ. Con: thresholds to define chronic kidney disease should not be age dependent. Nephrol Dial Transplant. 2014;29(4):774–9.
Winearls DO. Ageing and the glomerular filtration rate: truths and consequences. Trans Am Clin Climatol Assoc. 2009;120:419–28.
Esteghamati A, Abbasi M, Alikhani S, Gouya MM, Delavari A, Shishehbor MH, et al. Prevalence, awareness, treatment, and risk factors associated with hypertension in the Iranian population: the national survey of risk factors for noncommunicable diseases of Iran. Am J Hypertens. 2008;21(6):620–6.
Hadaegh F, Derakhshan A, Zafari N, Khalili D, Mirbolouk M, Saadat N, et al. Pre-diabetes tsunami: incidence rates and risk factors of pre-diabetes and its different phenotypes over 9 years of follow-up. Diabet Med. 2017;34(1):69–78.
Lobo CL, Nascimento EM, Abelha R, Queiroz AM, Connes P, Cardoso GP, et al. Risk factors of pulmonary hypertension in brazilian patients with sickle cell anemia. PLoS ONE. 2015. https://doi.org/10.1371/journal.pone.0137539.
Sawada S, Yamashita N, Suehisa H, Yamashita M. Risk factors for recurrence after lung cancer resection as estimated using the survival tree method. Chest. 2013;144(4):1238–44.
Ali RA, Dooley C, Comber H, Newell J, Egan LJ. Clinical features, treatment, and survival of patients with colorectal cancer with or without inflammatory bowel disease. Clin Gastroenterol Hepatol. 2011;9(7):584–9.
Vistisen D, Andersen GS, Hansen CS, Hulman A, Henriksen JE, Bech-Nielsen H, et al. Prediction of first cardiovascular disease event in type 1 diabetes: the steno T1 risk engine. Circulation. 2016;133(11):1058–66.
Zhou Y, McArdle JJ. Rationale and applications of survival tree and survival ensemble methods. Psychometrika. 2015;80(3):811–33.
Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat. 2006;15(3):651–74.
Lunyera J, Mohottige D, Von Isenburg M, et al. CKD of uncertain etiology: a systematic review. Clin J Am Soc Nephrol. 2016;11(3):379–85.
Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med. 1997;127(8 Pt 2):757–63.
Epping-Jordan JE, Galea G, Tukuitonga C, Beaglehole R. Preventing chronic diseases: taking stepwise action. Lancet. 2005;366(9497):1667–71.
AR analyzed data and wrote the manuscript. MT participated in the interpretation of results. FA and FH, designed the study protocol, and participated in the coordination and management of the study. All authors read and approved the final manuscript.
We express our appreciation to the participants of district-13 of Tehran for their enthusiastic support in this study.
The authors declare that they have no competing interests.
Availability of data and materials
All datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.
Consent for publication
Ethics approval and consent to participate
Protocol of this study was approved by the ethical committee of the Research Institute for Endocrine Sciences of Shahid Beheshti University of Medical Sciences, Tehran, Iran, and conducted in accordance with the Declaration of Helsinki. All participants signed informed consent forms.
This study was supported by Grant No. 121 from the National Research Council of the Islamic Republic of Iran.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Ramezankhani, A., Tohidi, M., Azizi, F. et al. Application of survival tree analysis for exploration of potential interactions between predictors of incident chronic kidney disease: a 15-year follow-up study. J Transl Med 15, 240 (2017). https://doi.org/10.1186/s12967-017-1346-x
- Chronic kidney disease
- Risk factors
- Survival tree analysis