- Open Access
Development and validation of new glomerular filtration rate predicting models for Chinese patients with type 2 diabetes
Journal of Translational Medicine, volume 13, Article number: 317 (2015)
Previous research has shown that the recommended glomerular filtration rate (GFR)-estimating equations perform worse in the type 2 diabetic population than in the non-diabetic population. In this study, we attempted to develop new GFR-predicting models for use in Chinese patients with type 2 diabetes.
We enrolled 519 type 2 diabetic patients, divided into a development data-set (n = 276), an internal validation data-set (n = 138) and an external validation data-set (n = 105), to establish and validate new GFR-predicting models. GFR measured by 99mTc-DTPA renal dynamic imaging and calibrated to the dual plasma sample method served as the reference standard.
Based on sex, age, serum creatinine and new predictor variables [body mass index (BMI), hemoglobin A1c (HbA1c), and urinary albumin-to-creatinine ratio (UACR)], eight new regression models and eight artificial neural network (ANN) models were developed. In the external validation group, only ANN3 was superior to the original CKD-EPI equation in both precision and accuracy (precision, 20.5 vs. 24.2 mL/min/1.73 m2, P < 0.001; 30 % accuracy, 88.6 vs. 80.0 %, P = 0.02).
ANN3 based on sex, age, serum creatinine and BMI is the optimal model for GFR estimation in Chinese patients with type 2 diabetes.
An accurate assessment of glomerular filtration rate (GFR) is crucial for correct drug dosing, as well as for detecting, managing, and predicting the prognosis of chronic kidney disease (CKD) in diabetes. However, studies have indicated that the Cockcroft–Gault (CG) equation, the Modification of Diet in Renal Disease (MDRD) equation and the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation, which are recommended for the estimation of GFR, are inaccurate in diabetic populations [5–7]. In accordance with these studies [5–7], our previous results demonstrated that the accuracy of the CKD-EPI equation could be influenced by diabetic status.
There are several explanations for the discrepancy between diabetic and non-diabetic populations. To our knowledge, the traditional equations were established by regression modeling in a sizable number of participants without consideration of physiology; thus their performance can be poor when they are applied in a significantly different population. Only a very small percentage of diabetic patients were involved in establishing the CG and MDRD equations, and only 29 % of the subjects used to develop the CKD-EPI equation were diabetic. Further, attention should be paid to the special physiology of type 2 diabetic patients. For example, in the early stage of diabetic nephropathy, owing to hyperfiltration, diabetic patients often present with high values of GFR. It is well recognized that GFR can be increased by acute hyperglycemia in type 2 diabetes with normal renal function. A study of 193 diabetic patients by Rigalleau showed that the relationship also existed in more advanced stages of renal impairment, and that a reduction in glycosylated hemoglobin (HbA1c) was related to a reduction in GFR after treatment for several months. Moreover, most individuals with type 2 diabetes are overweight or obese. Obese people tend to have a higher proportion of fat and a lower proportion of muscle mass than non-obese people, which affects GFR. In addition, several studies implied that the level of serum creatinine was remarkably lower in diabetes owing to lower muscle mass [9, 12]. Special attention should also be paid to albuminuria in diabetes. The appearance of albuminuria initiates the loss of GFR, and several reports have identified that the level of albuminuria correlates with the loss of renal function in people with diabetes and overt renal disease [14–16].
Babazono stressed that higher levels of urine albumin-to-creatinine ratio (UACR) are associated with a greater rate of decline in eGFR in Japanese diabetic patients. In the supplementary appendix of a study by Inker and colleagues, the performance of the CKD-EPI equation varied across proteinuria subgroups.
Equations currently estimate GFR on the basis of GFR determinants as well as non-GFR determinants. GFR determinants are referred to as GFR filtration markers, whereas non-GFR determinants include demographic and clinical variables which influence GFR determinants. The combination of novel filtration markers and non-GFR determinants can improve performance by reducing the bias caused by non-GFR determinants. In addition, advances in statistical techniques and the development of novel strategies can benefit GFR estimation. The artificial neural network (ANN), a common mathematical modeling method, has been employed extensively in engineering prediction, and its application in medicine and biology is also encouraging. Our previous study of 1230 patients with chronic kidney disease verified that the performance of the ANN was equal to that of traditional regression.
Taking the above into account, we aim to develop an optimal GFR-predicting model for Chinese type 2 diabetic patients by applying the traditional regression method as well as an ANN with the addition of new variables [including body mass index (BMI), HbA1c and UACR] as non-GFR determinants.
Subjects and study design
The study recruited consecutive patients with known type 2 diabetes for whom complete clinical data were available in the Third Affiliated Hospital of Sun Yat-sen University, China. Exclusion criteria were: (1) age younger than 18 years; (2) acute deterioration of kidney function (serum creatinine on the day of GFR measurement differing by more than 15 % from that on the day of admission), skeletal muscle atrophy, edema, pleural effusion or ascites, malnutrition, amputation, heart failure, or ketoacidosis; (3) treatment with dialysis at the time of the study; (4) taking cimetidine or trimethoprim, or receiving intravenous albumin or diuretics, before the measurement of GFR. Ultimately, 519 type 2 diabetic patients were enrolled. Patients treated from January 2005 through December 2012 were randomly divided into the development data-set (n = 276) and the internal validation data-set (n = 138). Data obtained from January 2013 to June 2013 were used as the external validation data-set (n = 105). Written informed consent was obtained from all subjects. The study was approved by the institutional review board at the Third Affiliated Hospital of Sun Yat-sen University.
GFR was measured by a technetium-99m diethylene-triaminepentaacetic acid (99mTc-DTPA) renal dynamic imaging method (modified Gates' method), using a Millennium MPR SPECT with the General Electric Medical System (Discovery VH, GE Healthcare, Little Chalfont, UK). The details have been described previously. The measured GFR (mGFR) was calibrated to equal the dual plasma sample 99mTc-DTPA GFR, and the calibrated GFR was referred to as the standard GFR (sGFR) in our study. Using OpenEpi software (version 2), the minimum sample size for the calibration was calculated as 36 (95 % confidence level and 80 % power). Two and four hours after the injection of 99mTc-DTPA, venous blood samples were drawn from the opposite forearm and collected in heparinized tubes. Radioactivity in the separated plasma was recorded by a multi-function well counter (ZD-6000 multi-function instrument from Zhida Technology Company, Xian, China). Serum creatinine was measured on a Hitachi 7180 autoanalyzer (Hitachi, Tokyo, Japan; reagents from Roche Diagnostics, Mannheim, Germany) using the enzymatic method, and after 2010 the serum creatinine assay was traceable to isotope dilution mass spectrometry. HbA1c was measured by high-performance liquid chromatography, and UACR was determined by immunoturbidimetric assay.
Metrics for the development of the new regression models
The new regression models were developed using both the development and internal validation data-sets. The predictor variables were age, sex, serum creatinine, BMI, HbA1c, and UACR. Age, sex, and serum creatinine were included in all new models, with BMI, HbA1c, and UACR added separately or in combination. In developing the new equations, sGFR and serum creatinine were transformed to a log scale, while BMI, HbA1c, and UACR were kept on the natural scale. Least squares linear regression was used to relate sGFR to the predictor variables. Following the method used to establish the CKD-EPI equation, a nonparametric smoothing spline was used to characterize the shape of the relationship between log sGFR and log creatinine, and the nonlinearity shown by the smoothing spline was then represented by piecewise linear splines.
Metrics for the development of the new ANN models
The new ANN models were programmed in MATLAB 2011A (The MathWorks Inc, Natick, MA, USA). A three-layer back-propagation (BP) network consisting of an input layer, a hidden layer and an output layer was established. The predictor variables described above were the input variables, with standard GFR as the output variable. Each neuron in the hidden layer used a sigmoid (S-shaped) function as its activation function, and several networks were programmed with different numbers of neurons in the hidden layer (1–11). After random initialization, all networks were trained in the development data-set using the back-propagation learning rule. Their performance in the internal validation data-set, assessed by mean square error, determined the optimal network: the smallest mean square error indicated the best performance. With thresholds and weights fixed after training, the output of the network was calculated as the weighted sum of the hidden-neuron outputs to approximate sGFR. Introducing a genetic algorithm into the BP network (GABP network) enabled optimized initialization of the weights and thresholds, improving the performance of the ANN models. In the GABP network, all weights and thresholds of one network were encoded as a chromosome and evolved from one generation to the next through mutation and crossover. The initial weights and thresholds were carried into the next generation if the network demonstrated better performance in the internal validation data-set. The best initial weights and thresholds were eventually used to initialize the network. Details of the construction of the new ANN model are presented in Additional file 1.
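The network structure described above can be illustrated with a minimal sketch of the forward pass and of model selection by mean square error. This is not the authors' MATLAB code: the function names, the random stand-in weights, the linear output neuron and the assumption that inputs are pre-scaled to roughly [0, 1] are all hypothetical, and back-propagation training and the genetic-algorithm initialization are omitted.

```python
import math
import random


def sigmoid(x):
    """S-shaped activation used by the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=0):
    """Randomly initialise weights and thresholds (hypothetical stand-in for training)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def forward(net, inputs):
    """Input layer -> sigmoid hidden layer -> weighted sum at the output neuron."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def mean_square_error(net, rows, targets):
    """Mean square error of the network output against the reference values."""
    return sum((forward(net, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def select_best(candidates, rows, targets):
    """Return the key of the candidate network with the smallest validation MSE."""
    return min(candidates, key=lambda k: mean_square_error(candidates[k], rows, targets))


# Candidate networks with 1 to 11 hidden neurons and four inputs
# (sex, age, serum creatinine, BMI), here left untrained for brevity.
candidates = {n: init_network(4, n, seed=n) for n in range(1, 12)}
```

In practice each candidate would first be trained on the development data-set, and `select_best` would then be called with the internal validation rows and their sGFR values.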
Determination of the optimal models
All the new models were compared with the new Japanese equations and the CKD-EPI equation in the external validation data set. The performance was defined by bias, precision and accuracy.
Expression of the equations
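CKD-EPI equation
eGFR = 141 × min(SC/κ, 1)^α × max(SC/κ, 1)^−1.209 × 0.993^Age × 1.018 [if female] × 1.159 [if black]
where SC is serum creatinine (mg/dL), κ is 0.7 for female and 0.9 for male, α is −0.329 for female and −0.411 for male, min indicates the minimum of SC/κ or 1, and max indicates the maximum of SC/κ or 1.
Japanese equation 1
Japanese equation 2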
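For readers who want to reproduce the comparator, the following is a minimal sketch of the 2009 creatinine-based CKD-EPI equation as published by Levey and colleagues. The function and parameter names are hypothetical. Note that the published exponents α are negative (−0.329 for women, −0.411 for men), and that serum creatinine is assumed to be given in mg/dL. The Japanese equations are not sketched because their formulas are not given in the text.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """Estimated GFR (mL/min/1.73 m2) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha      # slope below the sex-specific knot
            * max(ratio, 1.0) ** -1.209     # slope above the knot
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```

The `min`/`max` pair is what makes the relationship a two-slope spline on the log scale: only one of the two factors differs from 1 for any given creatinine value.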
Results are expressed as mean ± SD or as median. Bias was assessed as the median of the difference between sGFR and eGFR, and precision was defined as the inter-quartile range (IQR) of the difference. Accuracy was measured as the percentage of eGFRs within 30 % of the sGFR. The 95 % confidence intervals were calculated by the bootstrap method (2000 bootstraps). Quantitative variables were compared between two data-sets using the independent samples t test or the Mann–Whitney test. Within a data-set, differences and accuracy were compared by the Wilcoxon signed-rank test and the McNemar test, respectively. All analyses were conducted using SPSS (version 13.0), R (version 3.0.2, i386) and MATLAB (version 2011b).
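The three performance metrics and their bootstrap confidence intervals can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the difference is taken as sGFR minus eGFR (the text does not state the direction), quartiles use linear interpolation, the interval is a simple percentile bootstrap, and all function names are hypothetical.

```python
import random
import statistics


def performance(sgfr, egfr):
    """Bias (median difference), precision (IQR of difference) and 30 % accuracy."""
    diffs = [s - e for s, e in zip(sgfr, egfr)]  # assumed direction: sGFR - eGFR
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    within_30 = sum(abs(e - s) <= 0.30 * s for s, e in zip(sgfr, egfr))
    return {
        "bias": statistics.median(diffs),
        "precision": q3 - q1,
        "p30": 100.0 * within_30 / len(sgfr),
    }


def bootstrap_ci(sgfr, egfr, metric, n_boot=2000, seed=0):
    """Percentile 95 % confidence interval for one metric, resampling patients."""
    rng = random.Random(seed)
    n = len(sgfr)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = performance([sgfr[i] for i in idx], [egfr[i] for i in idx])
        values.append(resampled[metric])
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]
```

Patients are resampled as pairs, so each bootstrap replicate keeps every eGFR matched to its own sGFR.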
Calibration of GFR
To calibrate our measured GFR to the dual plasma sample method, 36 type 2 diabetic subjects [mean age 63.3 ± 12.3 y (range 38–82); mean mGFR 66.9 ± 30.1 ml/min/1.73 m2 (range 16.7–145.6 ml/min/1.73 m2)] were selected randomly. Dual plasma sample 99mTc-DTPA clearance was measured simultaneously with renal dynamic imaging. The 99mTc-DTPA renal dynamic imaging GFR value was calibrated to the dual plasma sample 99mTc-DTPA clearance using a linear regression equation: dual plasma sample 99mTc-DTPA-GFR (ml/min/1.73 m2) = 3.706 + 1.039 × 99mTc-DTPA renal dynamic imaging-GFR (ml/min/1.73 m2) (R2 = 0.879, P < 0.001).
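As a worked illustration of the calibration equation above, the short sketch below applies the reported intercept and slope to an imaging GFR value. The function name is hypothetical. For the mean mGFR of 66.9 ml/min/1.73 m2, the calibrated value is 3.706 + 1.039 × 66.9, or about 73.2 ml/min/1.73 m2.

```python
def calibrate_to_dual_plasma(imaging_gfr):
    """Convert renal dynamic imaging GFR to the dual plasma sample scale.

    Both input and output are in ml/min/1.73 m2; the coefficients are the
    regression estimates reported in the text.
    """
    return 3.706 + 1.039 * imaging_gfr
```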
Clinical characteristics of the development, internal validation and external validation data-sets
Mean sGFR of the development and internal validation data-sets was 80.9 ± 29.0 mL/min/1.73 m2. The mean age, mean BMI, mean sGFR, and the distribution of HbA1c of the external validation data-set were similar to those of the development and internal validation data-sets. Clinical characteristics of the diabetic patients in each data-set are provided in Table 1.
Development of new regression equations and artificial neural network in patients with diabetes
Variables included in the new regression models for estimating log GFR are log serum creatinine, and sex, age, BMI, HbA1c, and UACR on the natural scale. The relationship between log GFR and log serum creatinine was modeled as a two-slope linear spline with sex-specific knots at 0.7 mg/dL in women and 0.8 mg/dL in men. Applying multiple regression, eight new regression models were developed based on the variables above (Additional file 2: Tables S1, S2). Variables included in the artificial neural network models for estimating GFR were the same as in the regression models (Table 2; Additional file 2: Table S3).
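One way to picture the two-slope linear spline is as a pair of regression terms built from log serum creatinine relative to the sex-specific knot. The sketch below is illustrative only: it uses the knots stated in the text, but the function names, the choice of BMI as the extra covariate and the use of slopes shared across sexes are assumptions, and no fitted coefficients from the paper are reproduced.

```python
import math

# Sex-specific knots (mg/dL) as stated in the text.
KNOTS_MG_DL = {"female": 0.7, "male": 0.8}


def creatinine_spline_terms(scr_mg_dl, sex):
    """Return the (below-knot, above-knot) terms of the two-slope linear spline."""
    z = math.log(scr_mg_dl / KNOTS_MG_DL[sex])
    return min(z, 0.0), max(z, 0.0)


def design_row(scr_mg_dl, sex, age_years, bmi):
    """One row of a hypothetical design matrix for regressing log sGFR."""
    below, above = creatinine_spline_terms(scr_mg_dl, sex)
    return [1.0, below, above, 1.0 if sex == "female" else 0.0, age_years, bmi]
```

Fitting least squares to rows of this form gives one slope for creatinine values below the knot and another above it, with the two segments joined continuously at the knot.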
Determination of the optimal model
In the external validation group, the bias of the new GFR-estimating models did not differ significantly from that of the CKD-EPI equation, whereas the bias of the Japanese equations did (Japanese equation 1 vs. CKD-EPI, −20.48 vs. 6.00 mL/min/1.73 m2, P < 0.001; Japanese equation 2 vs. CKD-EPI, −30.67 vs. 6.00 mL/min/1.73 m2, P < 0.001). All models had improved precision (P < 0.001) in comparison to the CKD-EPI equation except ANN6 (P < 0.001). Only ANN3 had higher accuracy than the CKD-EPI equation (30 % accuracy, 88.6 vs. 80.0 %, P = 0.02). The Japanese equations demonstrated much lower accuracy than the CKD-EPI equation (Japanese equation 1 vs. CKD-EPI, 54.3 vs. 80.0 %, P < 0.001; Japanese equation 2 vs. CKD-EPI, 29.5 vs. 80.0 %, P < 0.001) (Table 3). To facilitate clinical use, a detailed equation (Additional file 1) was developed to calculate eGFR by ANN3. A simple Excel program was also created (Additional file 3).
The number of people with diabetes is increasing globally, with 387 million in 2014, up from 153 million in 1980. Based on a national survey carried out in 2010, the prevalence of diabetes in China was estimated to be 11.6 %, corresponding to 113.9 million Chinese adults with diabetes. In all developed and many developing countries, diabetes has become the leading cause of chronic kidney disease. It is therefore essential to accurately assess GFR in diabetic patients. Our analysis indicated that ANN3, based on sex, age, serum creatinine and BMI (topological structure 4-6-1), was the optimal model for GFR estimation in Chinese type 2 diabetic patients. In an investigation by Tsuda and colleagues, a new equation including HbA1c as a new variable was developed. However, the sample size was very small (40 cases) and the new equation had not been externally validated. Further, it showed poor performance in our study.
There are two possible reasons for the superiority of ANN3 over the other models. One possible reason is the introduction of BMI as a new variable, which has been explained before [9, 11, 12]. Additionally, Hsu found that BMI was related to the risk for end-stage renal disease, and GFR started to decrease when BMI was ≥22.0 kg/m2. Kawamoto also reported that the estimated GFR in upper normal body weight (BMI 22.0–24.9 kg/m2) and in overweight or obese subjects (BMI ≥25 kg/m2) was lower than that in lower normal body weight individuals (BMI 18.5–21.9 kg/m2). Another reason is the application of ANN. A multiple regression model is based on a large number of samples, and its predictive performance is restricted by the size and characteristics of the samples. ANN can simulate a relationship between variables accurately even in a small number of samples. Furthermore, multiple regression models fail to solve the high multi-collinearity between variables, whereas ANN is designed to address this problem and is flexible in modeling non-linear problems.
Two factors were assumed to influence the performance of the CKD-EPI equation in the external validation data-set. On the one hand, the development data-set of the CKD-EPI equation was quite different from that in our study. Diabetes accounted for only 29 % of the whole population in the CKD-EPI equation, while all subjects in this study were diagnosed with type 2 diabetes. There were also differences in the clinical characteristics of the two development and internal validation data-sets. The subjects of the development and internal validation data-sets in our study were older than those in the CKD-EPI equation (60 ± 13 vs. 47 ± 15 y). Their BMI was also lower than that in the CKD-EPI equation (25 ± 4 vs. 28 ± 6 kg/m2), and their sGFR was higher (81 ± 29 vs. 68 ± 40 mL/min/1.73 m2). On the other hand, the gold standard of our study was the 99mTc-DTPA renal dynamic imaging GFR calibrated to the dual plasma sample method, while the reference GFR of the CKD-EPI equation was urinary clearance of 125I-iothalamate. This difference in the gold standards could result in systematic bias.
Several limitations of our study should be pointed out. One is the relatively small number of samples in both the development and validation data-sets and the fact that subjects were restricted to a single center. Another limitation is our use of 99mTc-DTPA renal dynamic imaging GFR calibrated to the dual plasma sample method as the gold standard of GFR instead of inulin clearance.
A new GFR-estimating model (ANN3) based on sex, age, serum creatinine and BMI was selected as the optimal model for GFR estimation in Chinese patients with type 2 diabetes. However, ANN3 proved its superiority only in the external validation data-set (n = 105 patients), and additional external investigations are still required. With a larger sample size, the addition of new filtration markers (cystatin C, β2-microglobulin and β-trace protein), the change of the GFR gold standard to inulin clearance, and the refinement of machine learning modeling methods including ANN, the predictive performance of the new models may be further improved.
- GFR: glomerular filtration rate
- BMI: body mass index
- ANN: artificial neural network
- MDRD: modification of diet in renal disease
- CKD-EPI: Chronic Kidney Disease Epidemiology Collaboration
- CKD: chronic kidney disease
- 99mTc-DTPA: technetium-99m diethylene-triaminepentaacetic acid
- GABP network: BP network with the genetic algorithm introduced
Levey AS, Inker LA, Coresh J. GFR estimation: from physiology to public health. Am J Kidney Dis. 2014;63:820–34.
Cockcroft DW, Gault MH. Prediction of creatinine clearance from serum creatinine. Nephron. 1976;16:31–41.
Levey AS, Bosch JP, Lewis JB, Greene T, Rogers N, et al. A more accurate method to estimate glomerular filtration rate from serum creatinine: a new prediction equation. Modification of Diet in Renal Disease Study Group. Ann Intern Med. 1999;130:461–70.
Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AR, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009;150:604–12.
Nair S, Mishra V, Hayden K, Lisboa PJ, Pandya B, et al. The four-variable modification of diet in renal disease formula underestimates glomerular filtration rate in obese type 2 diabetic individuals with chronic kidney disease. Diabetologia. 2011;54:1304–7.
Silveiro SP, Araujo GN, Ferreira MN, Souza FD, Yamaguchi HM, et al. Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation pronouncedly underestimates glomerular filtration rate in type 2 diabetes. Diabetes Care. 2011;34:2353–5.
Camargo EG, Soares AA, Detanico AB, Weinert LS, Veronese FV, et al. The Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation is less accurate in patients with Type 2 diabetes when compared with healthy individuals. Diabet Med. 2011;28:90–5.
Liu X, Gan X, Chen J, Lv L, Li M, et al. A new modified CKD-epi equation for Chinese patients with type 2 diabetes. PLoS One. 2014;9:e109743.
Tsuda A, Ishimura E, Ohno Y, Ichii M, Nakatani S, et al. Poor glycemic control is a major factor in the overestimation of glomerular filtration rate in diabetic patients. Diabetes Care. 2014;37:596–603.
Rigalleau V, Lasseur C, Raffaitin C, Perlemoine C, Barthe N, et al. Glucose control influences glomerular filtration rate and its prediction in diabetic subjects. Diabetes Care. 2006;29:1491–5.
Chew-Harris JS, Florkowski CM, Elmslie JL, Livesey J, Endre ZH, et al. Lean mass modulates glomerular filtration rate in males of normal and extreme body composition. Intern Med J. 2014;44:749–56.
Hjelmesaeth J, Roislien J, Nordstrand N, Hofso D, Hager H, et al. Low serum creatinine is associated with type 2 diabetes in morbidly obese women and men: a cross-sectional study. BMC Endocr Disord. 2010;10:6.
Hovind P, Rossing P, Tarnow L, Smidt UM, Parving HH. Progression of diabetic nephropathy. Kidney Int. 2001;59:702–9.
Hoy WE, Wang Z, VanBuynder P, Baker PR, Mathews JD. The natural history of renal disease in Australian Aborigines. Part 1. Changes in albuminuria and glomerular filtration rate over time. Kidney Int. 2001;60:243–8.
Keane WF, Brenner BM, de Zeeuw D, Grunfeld JP, McGill J, et al. The risk of developing end-stage renal disease in patients with type 2 diabetes and nephropathy: the RENAAL study. Kidney Int. 2003;63:1499–507.
Breyer JA, Bain RP, Evans JK, Nahman NJ, Lewis EJ, et al. Predictors of the progression of renal insufficiency in patients with insulin-dependent diabetes and overt diabetic nephropathy. The Collaborative Study Group. Kidney Int. 1996;50:1651–8.
Babazono T, Nyumura I, Toya K, Hayashi T, Ohta M, et al. Higher levels of urinary albumin excretion within the normal range predict faster decline in glomerular filtration rate in diabetic patients. Diabetes Care. 2009;32:1518–20.
Inker LA, Schmid CH, Tighiouart H, Eckfeldt JH, Feldman HI, et al. Estimating glomerular filtration rate from serum creatinine and cystatin C. N Engl J Med. 2012;367:20–9.
Liu X, Li NS, Lv LS, Huang JH, Tang H, et al. A comparison of the performances of an artificial neural network and a regression model for GFR estimation. Am J Kidney Dis. 2013;62:1109–15.
Xun L, Cheng W, Hua T, Chenggang S, Zhujiang C, et al. Assessing glomerular filtration rate (GFR) in elderly Chinese patients with chronic kidney disease (CKD): a comparison of various predictive equations. Arch Gerontol Geriatr. 2010;51:13–20.
Dean AG, Sullivan KM, Soe MM: OpenEpi: open source epidemiologic statistics for Public Health, version 2.3, vol 2013; 2013.
Matsuo S, Imai E, Horio M, Yasuda Y, Tomita K, et al. Revised equations for estimated GFR from serum creatinine in Japan. Am J Kidney Dis. 2009;53:982–92.
International Diabetes Federation. IDF diabetes atlas update poster. 6th ed. Brussels: International Diabetes Federation; 2014.
Danaei G, Finucane MM, Lu Y, Singh GM, Cowan MJ, et al. National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2.7 million participants. Lancet. 2011;378:31–40.
Xu Y, Wang L, He J, Bi Y, Li M, et al. Prevalence and control of diabetes in Chinese adults. JAMA. 2013;310:948–59.
Jha V, Garcia-Garcia G, Iseki K, Li Z, Naicker S, et al. Chronic kidney disease: global dimension and perspectives. Lancet. 2013;382:260–72.
Hsu CY, McCulloch CE, Iribarren C, Darbinian J, Go AS. Body mass index and risk for end-stage renal disease. Ann Intern Med. 2006;144:21–8.
Kawamoto R, Kohara K, Tabara Y, Miki T, Ohtsuka N, et al. An association between body mass index and estimated glomerular filtration rate. Hypertens Res. 2008;31:1559–64.
JXC researched data, wrote manuscript. XL wrote manuscript, researched data. HT researched data. HH reviewed/edited manuscript. YNW researched data. LSL reviewed/edited manuscript. TQL reviewed/edited manuscript. All authors read and approved the final manuscript.
Thanks to the patients for their good cooperation. This study was supported by the National Natural Science Foundation of China (Grant No. 813770866, Grant No. 81370837, Grant No. 81422011 and Grant No. 81070612), the China Postdoctoral Science Foundation (Grant No. 201104335), Guangdong Science and Technology Plan (Grant No. 2011B031800084 and 2013B021800190), Natural Science Foundation of Guangdong (Grant No. 2014A030313035), the Fundamental Research Funds for the Central Universities (Grant No. 11ykpy38), the National Project of Scientific and Technical Supporting Programs Funded by Ministry of Science & Technology of China (Grant No. 2011BAI10B05).
Compliance with ethical guidelines
Competing interests: The authors declare that they have no competing interests.
Jinxia Chen, Hua Tang and Hui Huang contributed equally to this work