
Machine learning for the prediction of acute kidney injury in patients with sepsis



Acute kidney injury (AKI) is the most common and serious complication of sepsis, accompanied by high mortality and disease burden. The early prediction of AKI is critical for timely intervention and ultimately improves prognosis. This study aims to establish and validate predictive models based on novel machine learning (ML) algorithms for AKI in critically ill patients with sepsis.


Data of patients with sepsis were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Feature selection was performed using the Boruta algorithm. Seven ML algorithms, namely logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), decision tree, random forest, Extreme Gradient Boosting (XGBoost), and artificial neural network (ANN), were applied for model construction using tenfold cross-validation. The performance of these models was assessed in terms of discrimination, calibration, and clinical application. Moreover, the discrimination of the ML-based models was compared with that of the Sequential Organ Failure Assessment (SOFA) and the customized Simplified Acute Physiology Score (SAPS) II models.


A total of 3176 critically ill patients with sepsis were included for analysis, of whom 2397 (75.5%) developed AKI during hospitalization. A total of 36 variables were selected for model construction. The LR, KNN, SVM, decision tree, random forest, ANN, XGBoost, SOFA, and SAPS II models achieved areas under the receiver operating characteristic curve of 0.7365, 0.6637, 0.7353, 0.7492, 0.7787, 0.7547, 0.821, 0.6457 and 0.7015, respectively. The XGBoost model had the best predictive performance in terms of discrimination, calibration, and clinical application among all models.


The ML models can be reliable tools for predicting AKI in septic patients. The XGBoost model has the best predictive performance, which can be used to assist clinicians in identifying high-risk patients and implementing early interventions to reduce mortality.


Acute kidney injury (AKI) is a common and complex clinical complication in intensive care unit (ICU) settings [1]. In the ICU, approximately 53% of AKI is caused by sepsis and subsequently contributes to longer hospital stay, higher morbidity, and heavier financial burden to patients [2, 3]. Despite improvements in clinical treatment, the mortality of AKI remains unchanged and reaches as high as 40–44% in patients with sepsis due to multiorgan failure, microvascular dysfunction, and systemic inflammatory response syndrome [4,5,6]. However, AKI can be reversed at the early stage through timely intervention and effective treatment, thereby reducing AKI-related mortality [7]. Therefore, identifying patients with high risk of AKI is of vital importance for the management of patients with sepsis in ICU settings.

The prediction of AKI in patients with sepsis has long been an active topic in critical care medicine. Some biomarkers, such as microRNA-22-3p [8], neutrophil gelatinase-associated lipocalin [9], procalcitonin [10], urinary miR-26b [11], and soluble thrombomodulin [12], have been reported to be associated with AKI in sepsis. However, they are difficult to adopt widely in clinical settings because of their high cost and the testing technology they require. Some scoring systems, including the Acute Physiology and Chronic Health Evaluation II, the Simplified Acute Physiology Score (SAPS) II, and the Sequential Organ Failure Assessment (SOFA), have also been used in AKI prediction, but their performance is unsatisfactory because of poor specificity and sensitivity [13, 14]. In addition, some multivariate predictive models based on traditional statistical methods, such as logistic regression (LR) and the Cox proportional hazards model, have been developed for predicting the development of AKI among patients with sepsis. Fan et al. [15] applied LR to develop a prediction model for AKI in 15,726 patients with sepsis, and the model showed favorable predictive accuracy. Importantly, the relationships between variables are complex and may be linear or nonlinear, which is especially evident in ICU settings. However, LR by default handles only the linear relationship between independent and dependent variables and may oversimplify complex nonlinear relationships. Moreover, LR is prone to being affected by multicollinearity between variables, which may reduce the performance of the model. Therefore, exploring more effective and accurate prediction tools is extremely important in the management of septic patients.

Recently, machine learning (ML) has attracted the attention of clinicians and gained their recognition, owing to advances in statistical theory and computer technology. Novel ML techniques have been widely used in predictive models of various diseases and show better performance than traditional LR or Cox regression analyses [16, 17]. Several studies have applied ML algorithms to AKI prediction. For example, Chiofolo et al. [18] developed an AKI prediction model using an automatic continuous random forest algorithm in critically ill patients and achieved a favorable capability for early identification of high-risk patients. Le et al. [19] formulated a prediction system for AKI in ICU settings using convolutional neural networks (CNNs) and found that the CNN model outperformed the SOFA scoring system. Lin et al. [20] found that random forest had greater potential for predicting mortality in patients with AKI than support vector machine (SVM), artificial neural network (ANN), and SAPS II. However, evidence showing the advantage of ML algorithms in the prediction of AKI in septic patients is still lacking. In this study, we aimed to develop and validate multiple ML models to predict AKI in septic patients and to find the model with the best predictive performance.


Data source

Using the Structured Query Language, data of patients with sepsis were extracted from a single-center publicly available database called the Medical Information Mart for Intensive Care III (MIMIC-III) database [21]. The MIMIC-III database is an integrated, de-identified, comprehensive clinical dataset containing all patients admitted to the ICUs of Beth Israel Deaconess Medical Center in Boston, MA, from June 1, 2001, to October 31, 2012. The MIMIC-III includes detailed information about admitted and discharged patients, such as demographic characteristics, monitoring vital signs, laboratory and microbiological examination, imaging examination, observation and recording of intake and output, drug treatment, length of stay, survival data, and discharge or death records. To apply for access to the database, we passed the protection of human research participants examination and obtained the certificate (No. 9983480).


Patients were considered eligible if they were diagnosed with sepsis according to the International Classification of Diseases, Ninth Revision (ICD-9) codes 99591, 99592, or 78552 after their first ICU admission. The Kidney Disease: Improving Global Outcomes (KDIGO) criteria [22] were then used to determine whether AKI occurred in patients with sepsis during hospitalization. Patients who left the ICU within 48 h, were aged < 18 or > 89 years, or previously had AKI or renal failure were excluded. Patients with > 20% of individual data missing, or who were receiving renal replacement therapy (RRT) or continuous RRT at admission, were also excluded.
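The exclusion rules above can be read as a single eligibility check. The following is a minimal sketch only: the `Patient` record and its field names are hypothetical and do not correspond to MIMIC-III column names, and treating "within 48 h" as a stay shorter than 48 h is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for illustration only.
    age: float                        # years at ICU admission
    icu_hours: float                  # length of ICU stay in hours
    prior_aki_or_renal_failure: bool  # history of AKI or renal failure
    missing_fraction: float           # share of individual data missing (0 to 1)
    rrt_at_admission: bool            # RRT or continuous RRT at admission


def is_eligible(p: Patient) -> bool:
    """Return True if the patient passes every exclusion rule."""
    if p.icu_hours < 48:              # assumption: "within 48 h" means < 48 h
        return False
    if p.age < 18 or p.age > 89:
        return False
    if p.prior_aki_or_renal_failure:
        return False
    if p.missing_fraction > 0.20:
        return False
    if p.rrt_at_admission:
        return False
    return True
```

Writing the rules as one predicate makes it straightforward to count how many patients each criterion removes when building a selection flowchart.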

Data extraction

Patient data from the initial 24 h following admission were retrieved from the MIMIC-III database. The following information was used in this study: (1) demographic features, including sex, age, and ethnicity; (2) comorbidities, including congestive heart failure, hypertension, chronic pulmonary disease, diabetes, and liver disease; (3) vital signs, including heart rate, temperature, oxygen saturation (SpO2), systolic blood pressure (SysBP), and diastolic blood pressure (DiasBP); (4) laboratory parameters, including total bilirubin, anion gap, albumin, chloride, potassium, sodium, lactate, partial thromboplastin time (PTT), prothrombin time (PT), international normalized ratio (INR), creatinine, blood urea nitrogen (BUN), and glucose; (5) therapeutic and clinical management, including mechanical ventilation and vasopressor use. For variables with multiple measurements, we included the maximum and minimum values for analysis. For SOFA and SAPS II scores, we included only the initial values for analysis. Because this was an epidemiological study based on hypothesis, no attempt was made to estimate the sample size of the study. Instead, all eligible patients in the MIMIC-III database were enrolled to maximize statistical power.
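As an illustration of how repeated measurements can be reduced to maximum and minimum values over the first 24 h, here is a minimal sketch. The input layout (tuples of variable name, hours since admission, and value) and the output key names are assumptions made for this example, not the authors' extraction code.

```python
from collections import defaultdict


def summarize_first_24h(measurements):
    """Collapse repeated measurements into per-variable min and max.

    `measurements` is an iterable of (variable, hours_since_admission, value)
    tuples; this layout is hypothetical. Values recorded outside the first
    24 h, or recorded as None, are ignored.
    """
    values = defaultdict(list)
    for name, hours, value in measurements:
        if value is not None and 0 <= hours <= 24:
            values[name].append(value)

    features = {}
    for name, vals in values.items():
        features[f"{name}_min"] = min(vals)
        features[f"{name}_max"] = max(vals)
    return features


example = [
    ("creatinine", 1.0, 1.1),
    ("creatinine", 8.0, 1.4),
    ("creatinine", 20.0, 0.9),
    ("creatinine", 30.0, 2.5),  # outside the 24 h window, ignored
    ("lactate", 2.0, 3.2),
]
features = summarize_first_24h(example)
```

Here `features` holds `creatinine_min`, `creatinine_max`, `lactate_min`, and `lactate_max`, each computed only from values inside the window.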

To minimize the bias resulting from missing data, variables with more than 20% missing values were excluded from the final cohort, and the missing values of the remaining variables were imputed using the multiple imputation (MI) method. MI is a widely used method for dealing with missing values [23]. MI replaces each missing value with multiple plausible values. This procedure takes into account the uncertainty behind the missing value and produces several datasets from which parameters of interest can be estimated [24]. For example, if the coefficient for a covariate in a multivariable model is of interest, the coefficient is estimated from each dataset, resulting in multiple coefficients. These coefficients are then combined to give a valid estimate of the coefficient that reflects the uncertainty in the estimation of missing values. The coefficient variance estimated by MI is less likely to be underestimated than that estimated by single imputation [25].
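The combination step described above is commonly carried out with Rubin's rules: the pooled coefficient is the mean of the per-dataset coefficients, and the total variance adds the average within-dataset variance to an inflated between-dataset variance. The sketch below assumes that this standard pooling is what is meant; the function name and inputs are illustrative.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: coefficient estimated from each imputed dataset
    variances: squared standard error of that coefficient in each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two datasets")

    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


pooled, total = pool_rubin([0.4, 0.5, 0.6], [0.01, 0.01, 0.01])
```

The between-imputation term is what single imputation omits, which is why single imputation tends to understate the variance of a coefficient.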

Statistical analysis

Continuous variables were summarized as the median with interquartile range and were compared using the Wilcoxon rank-sum test. Categorical variables were expressed as number and percentage and were compared using the chi-square test or Fisher’s exact test.

Feature selection is an important step in model construction. The Boruta algorithm was used to identify the most important features by comparing the Z-value of each feature with that of “shadow features”. Shadow features are created by duplicating all real features and randomly shuffling the values of each copy. In each iteration, a random forest model is fitted to the real and shadow features together, and a Z-value is obtained for every attribute. A real feature is regarded as “important” if its Z-value is greater than the maximal Z-value of the shadow features in multiple independent trials [26]. After feature selection, seven ML algorithms, including LR, k-nearest neighbors (KNN), SVM, decision tree, random forest, extreme gradient boosting (XGBoost), and ANN, were employed for model construction. Tenfold cross-validation was used to build and validate the predictive models and to limit overfitting. Accordingly, the whole dataset was randomly divided into 10 folds. Nine of them were used as the training set for model development, and the remaining one was used as the validation set for model validation. The process was repeated 10 times so that each of the 10 folds served as the validation set once. Finally, the performance of each model was validated and compared in the validation set. For each algorithm, the model with the highest area under the curve (AUC) of the receiver operating characteristic (ROC) curve was selected as the optimal model. Because SOFA and SAPS II scores are commonly used tools for predicting illness severity and prognosis in critically ill patients, we also compared the predictive abilities of the ML-based predictive models with those of the conventional scoring systems.
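The tenfold splitting procedure can be sketched in a few lines. This is an illustration of the general technique rather than the authors' R code; the function name, the fixed random seed, and the absence of stratification by outcome are assumptions of this sketch.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, validation_indices) for each of k folds.

    The sample indices are shuffled once and dealt into k folds of nearly
    equal size. Each fold serves as the validation set exactly once, and
    the other k - 1 folds form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Cohort size taken from the text; each model would be fitted on `training`
# and scored on `validation` inside this loop.
splits = list(k_fold_indices(3176, k=10))
```

With 3176 patients, each validation fold contains 317 or 318 patients, and every patient appears in exactly one validation fold.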

The performance of the predictive models was assessed with respect to discrimination, calibration, and clinical utility. Discrimination was quantitatively evaluated by the AUC of the ROC curve, sensitivity, specificity, recall, accuracy, and F1 score. Calibration was visually assessed through graphical representations of the agreement between the predicted probabilities and the observed outcomes based on 1000 bootstrap resamples. Clinical application was investigated by decision curve analysis (DCA). The statistical analyses and modeling process were conducted using R version 4.0.5, and a two-sided P-value < 0.05 was regarded as statistically significant.
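The discrimination metrics listed above can be computed directly from the observed outcomes and predicted probabilities. The sketch below uses the standard definitions (recall is the same quantity as sensitivity) and computes the AUC as the probability that a randomly chosen AKI case scores higher than a randomly chosen non-case. The 0.5 classification threshold and the function names are assumptions for illustration.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity (recall), specificity, accuracy, and F1 at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}


def auc(y_true, y_score):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half. This pairwise form is quadratic in sample size,
    which is acceptable for a small illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2]
metrics = confusion_metrics(y_true, y_score)
area = auc(y_true, y_score)
```

Unlike sensitivity, specificity, accuracy, and F1, the AUC does not depend on the chosen threshold, which is why it is often used to rank models.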


Baseline characteristics

A total of 6138 patients were diagnosed with sepsis at admission according to ICD-9. Of these, 2961 patients were excluded according to the exclusion criteria (Fig. 1). Finally, a total of 3176 patients were included in our analysis, of whom 2397 (75.5%) had AKI after ICU admission.

Fig. 1

The flowchart of patient selection. MIMIC: Medical Information Mart for Intensive Care; ICU: intensive care unit

The differences in characteristics between the AKI and non-AKI groups are described in Table 1. Male patients were more likely than female patients to develop AKI during hospitalization. Compared with patients without AKI, those with AKI were older and had a higher BMI; a higher incidence of congestive heart failure, cardiac arrhythmias, hypertension, liver disease, paralysis, chronic pulmonary disease, diabetes, and coagulopathy; and a higher rate of mechanical ventilation and vasopressor use. The maximum values of anion gap, albumin, bilirubin, creatinine, chloride, glucose, lactate, potassium, PTT, INR, PT, BUN, heart rate, temperature, and SpO2 and the minimum values of anion gap, albumin, bilirubin, creatinine, chloride, lactate, potassium, PTT, INR, PT, BUN, heart rate, temperature, SpO2, sodium, SysBP, and DiasBP were significantly higher in septic patients with AKI than in those without AKI (P < 0.05). However, urine output and eGFR were lower in the AKI group than in the non-AKI group (P < 0.05).

Table 1 Baseline characteristics of the patients with sepsis

Feature selection

The result of feature screening based on the Boruta algorithm is shown in Fig. 2. In order of Z-values, the 35 variables most closely associated with AKI were age, BMI, cardiac arrhythmias, liver disease, urine output, eGFR, mechanical ventilation, vasopressor, the maximum values of anion gap, bilirubin, creatinine, chloride, lactate, platelet, potassium, PTT, INR, PT, sodium, BUN, temperature, and the minimum values of anion gap, bilirubin, creatinine, chloride, lactate, platelet, PTT, INR, PT, sodium, BUN, temperature, SysBP, and DiasBP.

Fig. 2

Feature selection based on the Boruta algorithm. The horizontal axis shows the name of each variable, and the vertical axis shows the Z-value of each variable. The box plot shows the Z-value of each variable during model calculation. The green boxes represent the 35 important variables, the yellow boxes represent tentative attributes, and the red boxes represent unimportant variables. BMI: body mass index; eGFR: estimated glomerular filtration rate; PT: prothrombin time; PTT: partial thromboplastin time; INR: international normalized ratio; BUN: blood urea nitrogen; SysBP: systolic blood pressure; DiasBP: diastolic blood pressure

Model performance comparisons

We generated seven ML models and two scoring systems to predict the development of AKI in patients with sepsis after ICU admission. Figure 3 shows the discriminative performance of the nine models in terms of ROC curves. Among the nine models, the XGBoost model (AUC = 0.817) had the best predictive performance for AKI in septic patients, followed by the random forest (AUC = 0.779), ANN (AUC = 0.755), decision tree (AUC = 0.749), LR (AUC = 0.737), SVM (AUC = 0.735), SAPS II (AUC = 0.702), KNN (AUC = 0.664), and SOFA (AUC = 0.646) models. With the LR model (AUC = 0.737) as the reference, the XGBoost, random forest, ANN, and decision tree models outperformed it in predicting AKI in septic patients. However, the discrimination of the SVM (AUC = 0.735), KNN (AUC = 0.664), SOFA (AUC = 0.646), and SAPS II (AUC = 0.702) models was inferior to that of the LR model. Table 2 presents a set of detailed performance metrics for the nine models. The XGBoost model had the best discrimination, with the highest sensitivity (0.945), accuracy (0.832), recall (0.852), and F1 score (0.895), and the third highest specificity (0.913). In Additional file 1: Figure S1, the calibration curves showed that the XGBoost model performed best among the seven ML models. According to the DCA curves (Fig. 4), the XGBoost model exhibited a greater net benefit across threshold probabilities than the other models, indicating that the XGBoost model was the optimal model with favorable clinical utility.

Fig. 3

Receiver operating characteristic curves of the nine models. LR: logistic regression; KNN: k-nearest neighbors; SVM: support vector machine; XGBoost: Extreme Gradient Boosting; ANN: artificial neural network; SOFA: sequential organ failure assessment; SAPS II: the customized simplified acute physiology score; AUC: area under the curve

Table 2 Model performance metrics
Fig. 4

Decision curve analyses of the seven models. The horizontal line represents the assumption that no patients develop AKI, and the gray oblique line represents the assumption that all patients develop AKI. LR: logistic regression; KNN: k-nearest neighbors; SVM: support vector machine; XGBoost: Extreme Gradient Boosting; ANN: artificial neural network; AKI: acute kidney injury
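The quantity plotted in a decision curve is the net benefit at each threshold probability. A minimal sketch of the standard net benefit formula follows, together with the two reference strategies (treating no patient as high risk, which always gives a net benefit of zero, and treating every patient as high risk). The function names are illustrative, and this is not the authors' R code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling patients high risk when y_prob >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as high risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


y_true = [1, 1, 1, 0]
y_prob = [0.9, 0.8, 0.3, 0.6]
model_nb = net_benefit(y_true, y_prob, threshold=0.5)
treat_all_nb = net_benefit_treat_all(y_true, threshold=0.5)
```

A model is clinically useful at a given threshold only if its net benefit exceeds both reference strategies, so a full decision curve repeats this calculation over a range of thresholds.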

Feature importance in XGBoost models

The ranking of feature importance in the XGBoost model is shown in Fig. 5. Urine output, mechanical ventilation, BMI, eGFR, minimum creatinine, maximum PTT, and minimum BUN were the most important features that contributed to AKI in critically ill patients with sepsis.

Fig. 5

Feature importance derived from the XGBoost model. BMI: body mass index; PTT: partial thromboplastin time; BUN: blood urea nitrogen; eGFR: estimated glomerular filtration rate; Max: maximum; Min: minimum; AKI: acute kidney injury


Compared with some previous reports on AKI prediction in critically ill patients using the MIMIC-III dataset [18,19,20], our research makes several novel contributions. For the first time, our study included seven commonly used ML algorithms for comprehensive analyses and compared their predictive performance with that of traditional scoring systems, including the SOFA and SAPS II scoring systems. The ML models showed good predictive accuracy in terms of discrimination and calibration, but good accuracy is not the same as usefulness in clinical practice. When the threshold probabilities of the net benefit are impractical, a model with good performance may also have limited applicability [27]. Therefore, we applied DCA curves to validate the clinical applicability of the ML models. Finally, the Boruta algorithm helped us understand the importance of the independent variables and thus carry out feature selection more effectively.

The incidence rate of sepsis is increasing in critically ill patients worldwide, which is associated with high mortality and economic burden [28]. Sepsis is a well-known risk factor for AKI, as the kidney is very sensitive to hypoperfusion and to some interventions, such as mechanical ventilation and excessive fluid resuscitation. At present, the treatment of AKI in sepsis remains reactive and nonspecific, and no preventive treatment is available. The presence of AKI markedly increases mortality in septic patients, which ranges from 38.2 to 70.2% [29, 30]. Hernando et al. [31] found that AKI occurs in 40–50% of septic patients with a 6- to 8-fold increase in mortality. Furthermore, a prospective cohort study including 401 critically ill patients revealed that the incidence of AKI was 50.1% in patients with severe sepsis, which is 7.79 times higher than that in patients without sepsis [32]. However, active treatment at the early stage of AKI can improve the survival rate [33]. Some studies have found that early renal recovery in sepsis-related AKI can not only improve the survival rate but also contribute to the later recovery of patients after discharge [34,35,36]. Unfortunately, it is difficult for clinicians to identify patients at high risk of AKI in the ICU. Therefore, developing and promoting reliable prediction models is particularly urgent for identifying these patients and providing them with timely and effective interventions to improve their prognosis.

In this study, the traditional severity scoring systems, such as SOFA and SAPS II, performed worse than the ML models, suggesting that they might not be effective tools for AKI prediction in critically ill patients with sepsis. Although the SOFA and SAPS II scoring systems can be used to assess the risk of adverse outcomes in critically ill patients, these scores largely depend on the experience of the practitioners [37]. Moreover, these scoring systems preclude the analysis of a large number of valuable variables, resulting in worse predictive performance than that of multivariate models [38]. Previous studies have revealed that, compared with ML models, the SOFA and SAPS II scoring systems have some disadvantages, such as poor prediction performance, low sensitivity and specificity, wide fluctuation range, and a cumbersome process [16].

Our results showed that the XGBoost model had a better capability than the LR model for predicting AKI in septic patients. On the one hand, the LR algorithm requires researchers to manually select independent variables, cannot detect the complex nonlinear relationship and interaction between independent variable X and response value Y, and is sensitive to the multicollinearity of independent variables, which may result in an underfit and inaccurate model. On the other hand, the XGBoost model could efficiently and flexibly deal with missing data and combine weak prediction models to establish accurate prediction models. Due to its excellent precision and performance, the XGBoost algorithm is increasingly emphasized as a competitive alternative to LR analysis in predicting clinical adverse outcomes.

Among all ML models, the XGBoost model performed best in AKI prediction, which is consistent with some previous studies. Liu et al. [39] demonstrated that the predictive performance of the XGBoost model was superior to that of three other ML models, including LR, SVM, and random forest, for predicting mortality in patients with AKI. Zhu et al. [40] found that the XGBoost model outperformed the KNN, LR, decision tree, random forest, and ANN models in the prediction of hospital mortality for mechanically ventilated patients. Moreover, a meta-analysis revealed that XGBoost was more effective than LR and other ML algorithms, including ANN, SVM, and Bayesian network, in the prediction of AKI [41].

This study is the first to apply ML algorithms for predicting the development of AKI during hospitalization in patients with sepsis. Through the XGBoost model, we identified that urine output, mechanical ventilation, BMI, eGFR, minimum creatinine, maximum PTT, and minimum BUN were most strongly associated with the development of AKI in patients with sepsis. Among these features, urine output was the most important indicator of AKI, which is in accordance with the KDIGO recommendations. In addition to urine output, some measures of renal function, such as eGFR, BUN, and serum creatinine, also played an important role in the prediction of kidney disease. These results have been confirmed in many clinical studies. Mertoglu et al. [42] found that serum creatinine and BUN have greater diagnostic value than other novel markers, including myo-inositol oxygenase and cystatin C. Laranja et al. [4] revealed that septic patients with AKI had lower urine output than patients with AKI from other causes or chronic kidney disease. Grams et al. [43] demonstrated that low eGFR was a reliable risk factor for AKI through a meta-analysis including more than 1 million participants from eight countries. Notably, mechanical ventilation was also significantly associated with AKI in septic patients. Positive-pressure mechanical ventilation (PPV) is commonly used in critically ill patients to provide oxygenation, ventilation, and airway protection support. However, PPV has long been considered to have potentially harmful effects on the kidney [30]. This may be due to the following three reasons. First, PPV may increase intrathoracic pressure and thus reduce venous return, cardiac output, and renal perfusion. Second, mechanical ventilation may induce the release of some neurohormones, affect the renin-angiotensin system, and decrease renal blood flow and eGFR. Third, mechanical ventilation at any volume or pressure might trigger an inflammatory cascade involving multiple interleukins, tumor necrosis factor-α, and Fas ligand that may contribute to AKI. Moreover, PTT is a common indicator of coagulation function. Recently, a retrospective study [44] showed that more than half of patients with septic AKI had at least one abnormal coagulation index, and coagulation dysfunction may predict poor patient outcomes. BMI is a simple and useful index of obesity based on the height and weight of patients. BMI has been widely studied in patients with sepsis and AKI. Our findings showed that the AKI group had a higher BMI than the non-AKI group. Obesity can lead to glomerular hyperperfusion and hyperfiltration, increase the hemodynamic and metabolic burden of a single glomerulus, and activate inflammation of adipocytes and oxidative stress [45], increasing the risk and progression of AKI. As these indexes can be evaluated easily at hospital admission, they can be used as convenient predictors for the development of AKI in critically ill patients with sepsis.

In this study, the in-hospital incidence of AKI in septic patients was 75.5%, which was similar to that reported in some previous studies. According to the report by Fan et al. [15], the incidence rate of AKI was 61% in 15,508 patients with sepsis. Tejera et al. [32] conducted a retrospective study in 401 critically ill patients and found that the incidence of AKI was as high as 75.3%. The pathogenesis of AKI in sepsis is complex and has not yet been clarified. Hemodynamic instability, impaired endothelial function, infiltration of inflammatory cells in the renal parenchyma, renal thrombosis, and renal tubular necrosis have been hypothesized to contribute to the development of AKI in septic patients [46]. The hyperactivation of the immune response caused by sepsis, which includes proinflammatory and anti-inflammatory stages, is particularly important in the pathogenesis. In the proinflammatory stage, humoral and cellular immunity can cause a storm of inflammatory factors, leading to the excessive secretion of inflammatory factors (such as interleukin 1 and tumor necrosis factor-α), the activation of the complement and coagulation systems, the activation of hyaluronic acid and elastase, and eventually the reduction of renal blood flow and the occurrence of AKI and septic shock [6, 47]. Subsequently, patients develop a compensatory anti-inflammatory response, which is an immunosuppressive state manifested by increased secretion of cytokines (such as interleukin 10), weakened endocytosis, reduced proliferation of lymphocytes, and increased apoptosis [29]. Thus, patients with sepsis are a high-risk group for AKI during hospitalization. Once AKI occurs, the prognosis is significantly worse, and even RRT cannot improve the prognosis.

We acknowledge some limitations of this research. First, the retrospective and observational nature of our study may lead to inevitable selection bias. Second, the MIMIC-III data came from a single center in the United States, which may limit the generalization of the prediction model to other populations. Therefore, further research with large samples and multiple centers is necessary to externally validate the models. Third, we used imputation to estimate some missing data, which may lead to deviation from the true values. However, we still believe that the constructed model can help clinicians treat ICU patients with sepsis at high risk of developing AKI in a timely manner.


In conclusion, the ML models can be reliable tools for predicting AKI in septic patients. Among all of the predictive models, the XGBoost model is the most effective model, which may assist clinicians in tailoring precise management and implementing early interventions for septic patients at risk of AKI to reduce mortality.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

AKI: Acute kidney injury
ICU: Intensive care unit
MIMIC-III: Medical Information Mart for Intensive Care III
XGBoost: Extreme Gradient Boosting
LR: Logistic regression
KNN: k-Nearest neighbors
SVM: Support vector machine
ANN: Artificial neural network
SOFA: Sequential organ failure assessment
SAPS II: The customized simplified acute physiology score
ML: Machine learning
CNN: Convolutional neural network
ICD-9: International Classification of Diseases, Ninth Revision
KDIGO: The Kidney Disease: Improving Global Outcomes
RRT: Renal replacement therapy
SpO2: Oxygen saturation
BMI: Body mass index
SysBP: Systolic blood pressure
DiasBP: Diastolic blood pressure
PTT: Partial thromboplastin time
PT: Prothrombin time
INR: International normalized ratio
BUN: Blood urea nitrogen
WBC: White blood cell
eGFR: Estimated glomerular filtration rate
AUC: Area under curve
ROC: The receiver operating characteristic curve
OR: Odds ratio
CI: Confidence interval
DCA: Decision curve analysis
PPV: Positive-pressure mechanical ventilation


  1. Büttner S, Stadler A, Mayer C, Patyna S, Betz C, Senft C, Geiger H, Jung O, Finkelmeier F. Incidence, risk factors, and outcome of acute kidney injury in neurocritical care. J Intensive Care Med. 2020;35(4):338–46.

  2. Hobson C, Ozrazgat-Baslanti T, Kuxhausen A, Thottakkara P, Efron PA, Moore FA, Moldawer LL, Segal MS, Bihorac A. Cost and mortality associated with postoperative acute kidney injury. Ann Surg. 2015;261(6):1207–14.

  3. Hoste EA, Bagshaw SM, Bellomo R, Cely CM, Colman R, Cruz DN, Edipidis K, Forni LG, Gomersall CD, Govil D, Honoré PM, Joannes-Boyau O, Joannidis M, Korhonen AM, Lavrentieva A, Mehta RL, Palevsky P, Roessler E, Ronco C, Uchino S, Vazquez JA, Vidal Andrade E, Webb S, Kellum JA. Epidemiology of acute kidney injury in critically ill patients: the multinational AKI-EPI study. Intensive Care Med. 2015;41(8):1411–23.

  4. Pinheiro KHE, Azêdo FA, Areco KCN, Laranja SMR. Risk factors and mortality in patients with sepsis, septic and non-septic acute kidney injury in ICU. J Bras Nefrol. 2019;41(4):462–71.

  5. Manrique-Caballero CL, Del Rio-Pertuz G, Gomez H. Sepsis-associated acute kidney injury. Crit Care Clin. 2021;37(2):279–301.

  6. Peerapornratana S, Manrique-Caballero CL, Gómez H, Kellum JA. Acute kidney injury from sepsis: current concepts, epidemiology, pathophysiology, prevention and treatment. Kidney Int. 2019;96(5):1083–99.

  7. Coelho S, Cabral G, Lopes JA, Jacinto A. Renal regeneration after acute kidney injury. Nephrology (Carlton). 2018;23(9):805–14.

  8. Zhang H, Che L, Wang Y, Zhou H, Gong H, Man X, Zhao Q. Deregulated microRNA-22-3p in patients with sepsis-induced acute kidney injury serves as a new biomarker to predict disease occurrence and 28-day survival outcomes. Int Urol Nephrol. 2021;53(10):2107–16.

  9. Park HS, Kim JW, Lee KR, Hong DY, Park SO, Kim SY, Kim JY, Han SK. Urinary neutrophil gelatinase-associated lipocalin as a biomarker of acute kidney injury in sepsis patients in the emergency department. Clin Chim Acta. 2019;495:552–5.

  10. Zhou X, Liu J, Ji X, Yang X, Duan M. Predictive value of inflammatory markers for acute kidney injury in sepsis patients: analysis of 753 cases in 7 years. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2018;30(4):346–50.

  11. Zhang J, Wang CJ, Tang XM, Wei YK. Urinary miR-26b as a potential biomarker for patients with sepsis-associated acute kidney injury: a Chinese population-based study. Eur Rev Med Pharmacol Sci. 2018;22(14):4604–10.

  12. Katayama S, Nunomiya S, Koyama K, Wada M, Koinuma T, Goto Y, Tonai K, Shima J. Markers of acute kidney injury in patients with sepsis: the role of soluble thrombomodulin. Crit Care. 2017;21(1):229.

  13. Wang H, Kang X, Shi Y, Bai ZH, Lv JH, Sun JL, Pei HH. SOFA score is superior to APACHE-II score in predicting the prognosis of critically ill patients with acute kidney injury undergoing continuous renal replacement therapy. Ren Fail. 2020;42(1):638–45.

  14. Hu H, Li L, Zhang Y, Sha T, Huang Q, Guo X, An S, Chen Z, Zeng Z. A prediction model for assessing prognosis in critically ill patients with sepsis-associated acute kidney injury. Shock. 2021;56(4):564–72.

  15. Fan C, Ding X, Song Y. A new prediction model for acute kidney injury in patients with sepsis. Ann Palliat Med. 2021;10(2):1772–8.

  16. Hou N, Li M, He L, Xie B, Wang L, Zhang R, Yu Y, Sun X, Pan Z, Wang K. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020;18(1):462.

  17. Du M, Haag DG, Lynch JW, Mittinty MN. Comparison of the tree-based machine learning algorithms to Cox regression in predicting the survival of oral and pharyngeal cancers: analyses based on SEER database. Cancers (Basel). 2020;12(10):2802.

  18. Chiofolo C, Chbat N, Ghosh E, Eshelman L, Kashani K. Automated continuous acute kidney injury prediction and surveillance: a random forest model. Mayo Clin Proc. 2019;94(5):783–92.

  19. Le S, Allen A, Calvert J, et al. Convolutional neural network model for intensive care unit acute kidney injury prediction. Kidney Int Rep. 2021;6(5):1289–98.

  20. Lin K, Hu Y, Kong G. Predicting in-hospital mortality of patients with acute kidney injury in the ICU using random forest model. Int J Med Inform. 2019;125:55–61.

  21. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.

  22. Khwaja A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin Pract. 2012;120(4):c179–84.

  23. Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4(2):30.

  24. Lee KJ, Simpson JA. Introduction to multiple imputation for dealing with missing data. Respirology. 2014;19(2):162–7.

  25. Sterne JA, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393.

  26. Lei J, Sun T, Jiang Y, et al. Risk identification of bronchopulmonary dysplasia in premature infants based on machine learning. Front Pediatr. 2021;9:719352.

  27. Yue S, Li S, Huang X, et al. Construction and validation of a risk prediction model for acute kidney injury in patients suffering from septic shock. Dis Markers. 2022;2022:9367873.

  28. Yang S, Su T, Huang L, Feng LH, Liao T. A novel risk-predicted nomogram for sepsis associated-acute kidney injury among critically ill patients. BMC Nephrol. 2021;22(1):173.

  29. Bellomo R, Kellum JA, Ronco C, Wald R, Martensson J, Maiden M, Bagshaw SM, Glassford NJ, Lankadeva Y, Vaara ST, Schneider A. Acute kidney injury in sepsis. Intensive Care Med. 2017;43(6):816–28.

  30. Poston JT, Koyner JL. Sepsis associated acute kidney injury. BMJ. 2019;364:k4891.

  31. Gómez H, Kellum JA. Sepsis-induced acute kidney injury. Curr Opin Crit Care. 2016;22(6):546–53.

  32. Tejera D, Varela F, Acosta D, Figueroa S, Benencio S, Verdaguer C, Bertullo M, Verga F, Cancela M. Epidemiology of acute kidney injury and chronic kidney disease in the intensive care unit. Rev Bras Ter Intensiva. 2017;29(4):444–52.

  33. Joannidis M, Druml W, Forni LG, Groeneveld ABJ, Honore PM, Hoste E, Ostermann M, Oudemans-van Straaten HM, Schetz M. Prevention of acute kidney injury and protection of renal function in the intensive care unit: update 2017: expert opinion of the working group on prevention, AKI section, European Society of Intensive Care Medicine. Intensive Care Med. 2017;43(6):730–49.

  34. Sood MM, Shafer LA, Ho J, Reslerova M, Martinka G, Keenan S, Dial S, Wood G, Rigatto C, Kumar A; Cooperative Antimicrobial Therapy in Septic Shock (CATSS) database research group. Early reversible acute kidney injury is associated with improved survival in septic shock. J Crit Care. 2014;29(5):711–7.

  35. Fiorentino M, Tohme FA, Wang S, Murugan R, Angus DC, Kellum JA. Long-term survival in patients with septic acute kidney injury is strongly influenced by renal recovery. PLoS ONE. 2018;13(6):e0198269.

  36. Kellum JA, Sileanu FE, Bihorac A, Hoste EA, Chawla LS. Recovery after acute kidney injury. Am J Respir Crit Care Med. 2017;195(6):784–91.

  37. Majdan M, Brazinova A, Rusnak M, Leitgeb J. Outcome prediction after traumatic brain injury: comparison of the performance of routinely used severity scores and multivariable prognostic models. J Neurosci Rural Pract. 2017;8(1):20–9.

  38. Wu J, Huang L, He H, Zhao Y, Niu D, Lyu J. Red cell distribution width to platelet ratio is associated with increasing in-hospital mortality in critically ill patients with acute kidney injury. Dis Markers. 2022;2022:4802702.

  39. Liu J, Wu J, Liu S, Li M, Hu K, Li K. Predicting mortality of patients with acute kidney injury in the ICU using XGBoost model. PLoS ONE. 2021;16(2):e0246306.

  40. Zhu Y, Zhang J, Wang G, Yao R, Ren C, Chen G, Jin X, Guo J, Liu S, Zheng H, Chen Y, Guo Q, Li L, Du B, Xi X, Li W, Huang H, Li Y, Yu Q. Machine learning prediction models for mechanically ventilated patients: analyses of the MIMIC-III database. Front Med. 2021;8:662340.

  41. Song X, Liu X, Liu F, Wang C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: a systematic review and meta-analysis. Int J Med Inform. 2021;151:104484.

  42. Mertoglu C, Gunay M, Gurel A, Gungor M. Myo-inositol oxygenase as a novel marker in the diagnosis of acute kidney injury. J Med Biochem. 2018;37(1):1–6.

  43. Grams ME, Sang Y, Ballew SH, et al. A meta-analysis of the association of estimated GFR, albuminuria, age, race, and sex with acute kidney injury. Am J Kidney Dis. 2015;66(4):591–601.

  44. Pan L, Mo M, Huang A, Li S, Luo Y, Li X, Wu Q, Yang Z, Liao Y. Coagulation parameters may predict clinical outcomes in patients with septic acute kidney injury. Clin Nephrol. 2021;96(5):253–62.

  45. Ju S, Lee TW, Yoo JW, Lee SJ, Cho YJ, Jeong YY, Lee JD, Kim JY, Lee GD, Kim HC. Body mass index as a predictor of acute kidney injury in critically ill patients: a retrospective single-center study. Tuberc Respir Dis (Seoul). 2018;81(4):311–8.

  46. Zhi DY, Lin J, Zhuang HZ, Dong L, Ji XJ, Guo DC, Yang XW, Liu S, Yue Z, Yu SJ, Duan ML. Acute kidney injury in critically ill patients with sepsis: clinical characteristics and outcomes. J Invest Surg. 2019;32(8):689–96.

  47. Opal SM, Ellis JL, Suri V, Freudenberg JM, Vlasuk GP, Li Y, Chahin AB, Palardy JE, Parejo N, Yamamoto M, Chahin A, Kessimian N. Pharmacological SIRT1 activation improves mortality and markedly alters transcriptional profiles that accompany experimental sepsis. Shock. 2016;45(4):411–8.


The authors would like to thank MIMIC-III for open access to their database. The opinions expressed in this study are those of the authors and do not represent the opinions of the Beth Israel Deaconess Medical Center.


This study was supported by the Guangdong Basic and Applied Basic Research Foundation (2020B01515020004, 2018A0303130269), the Guangdong Province Medical Scientific Research Fund Project (A2019537), the Competitive Project of Financial Special Funds for Science and Technology of Zhanjiang City (2018A01026), and the Guangdong Medical University Scientific Research Fund Program (GDMUM201806).

Author information




Literature search: SY, SL, YF, and JW; Study design: WT and JW; Data collection: SL, JL, XH, and XH; Data analysis: SY, JL, and YF; Model construction: XH, WT, and JW; Manuscript writing: SY, SL, XH, and JL. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Wenkai Tan or Jiayuan Wu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1. Calibration curves of the seven models. The x-axis represents the probability of AKI predicted by each model, and the y-axis represents the observed probability of AKI. LR logistic regression, KNN k-nearest neighbors, SVM support vector machine, XGBoost Extreme Gradient Boosting, ANN artificial neural network, AKI acute kidney injury.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yue, S., Li, S., Huang, X. et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med 20, 215 (2022).


  • Acute kidney injury
  • Sepsis
  • Machine learning
  • Prediction model
  • MIMIC-III database