Participants
This study drew on data from more than one million patients in the Chinese Stroke Center Alliance (CSCA), a national, hospital-based, multicenter program initiated in August 2015. The CSCA requires participating hospitals to enroll only patients who meet the following criteria: (1) older than 18 years; (2) a primary diagnosis of acute stroke or transient ischemic attack (TIA) confirmed by brain CT or MRI, including acute ischemic stroke (AIS), TIA, intracerebral hemorrhage (ICH), or subarachnoid hemorrhage (SAH); (3) presentation within seven days of symptom onset; and (4) admission to hospital either directly or through the emergency department. Patients with cerebral venous sinus thrombosis or non-cerebrovascular diseases were excluded. To ensure diagnostic accuracy and the quality of stroke care, performance metrics were monitored throughout the quality-control process, in strict accordance with the national standards and guideline recommendations prespecified or updated by the CSCA Steering Committee. Detailed information about the CSCA design and methodology can be found in previous publications [24]. This study was approved by the Central Institutional Review Board of Beijing Tiantan Hospital.
Patients with intracranial hemorrhagic stroke were selected, giving a study cohort of 83,063 patients. Of these, 61,869 (74.48%) had no pneumonia and 21,194 (25.52%) had pneumonia. The dataset contains more than 500 variables, including clinical variables recorded on admission, such as blood pressure, blood sugar, uric acid, pneumonia, the National Institutes of Health Stroke Scale (NIHSS), and the modified Rankin Scale (mRS), as well as external variables such as hospital level, education level, and family income status.
Definition and indicators of pneumonia
Pneumonia can be diagnosed from typical chest X-ray findings, clinical symptoms and signs such as cough, purulent sputum, and fever, and laboratory tests such as white blood cell count. Stroke-associated pneumonia (SAP) after ICH can be diagnosed by the treating physician on the basis of clinical and laboratory indicators of respiratory infection (fever, cough, crackles on auscultation, new purulent sputum, or positive sputum culture), together with typical chest X-ray findings, following the PISCES (Pneumonia in Stroke ConsEnsuS) recommendations [25]. Cases of pneumonia that occurred before the stroke were excluded, so that only hospital-acquired pneumonia was documented. Data on the development of SAP after ICH were collected prospectively.
Study procedure
To validate our model for predicting the likelihood of pneumonia, we used data from two multicenter cohorts: the internal prospective research cohort of the CSCA and the external independent validation cohort of CNSR II. We allocated the data from 2015 to 2018 to model development, split 8:2 into a training set and an internal validation set, and held out the 2019 data for testing. Records with missing values were then filled in during data processing. Feature selection was performed next to identify the features with the greatest influence on pneumonia. We then trained two classic models, XGBoost and logistic regression. ICH-LR2S2 was calculated from the feature weight coefficients of the logistic regression. After consulting doctors, we slightly adjusted the score intervals according to the medical risk values so that they comply with the medical consensus [16, 26]. We also examined the performance of the score on the external validation cohort. To support clinical practice, we stratified the patient population by risk and analyzed the whole cohort. The flow diagram is shown in Fig. 1.
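The temporal split described above can be sketched as follows. This is a minimal illustration, not the study's code: the `year` field name, the random shuffling, and the fixed seed are assumptions, since the text only states the 8:2 ratio and the years involved.

```python
import random

def split_by_year(records, train_years=range(2015, 2019), test_year=2019,
                  val_fraction=0.2, seed=0):
    """Split records into train / internal validation / test sets by admission year."""
    # Development pool: 2015-2018 admissions; held-out test set: 2019 admissions.
    pool = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    # Assumption: the 8:2 split inside the pool is random.
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_val = round(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val], test

# Toy data: 10 hypothetical patients per year, 2015-2019.
records = [{"id": i, "year": 2015 + i % 5} for i in range(50)]
train, val, test = split_by_year(records)
print(len(train), len(val), len(test))
```

Holding out the most recent year, rather than a random sample, tests the model on patients admitted after every training case, which is closer to how a score would be used prospectively.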
Data processing
Additional file 1: Table S3 shows the proportion of missing data for the selected variables. Missing values of a continuous variable were filled with the median of that variable in the dataset. Missing values of a binary variable were filled with 0, meaning no history of the disease (our binary variables comprise only disease history and gender, and gender has no missing values). The final dataset comprised a training set of 56,432 patients, an internal validation cohort of 14,108 patients, and a test set of 12,523 patients.
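The imputation rule above (median for continuous variables, 0 for binary variables) can be sketched in a few lines. The column names and values below are hypothetical, and missing entries are assumed to be encoded as `None`.

```python
from statistics import median

def impute(rows, continuous, binary):
    """Fill missing (None) values: median for continuous, 0 for binary variables."""
    # Medians are computed from the observed (non-missing) values only.
    medians = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in continuous
    }
    filled = []
    for r in rows:
        new = dict(r)  # copy so the input rows are left untouched
        for col in continuous:
            if new[col] is None:
                new[col] = medians[col]
        for col in binary:
            if new[col] is None:
                new[col] = 0  # 0 = no history of the disease
        filled.append(new)
    return filled

# Hypothetical toy records.
rows = [
    {"uric_acid": 300.0, "copd": 1},
    {"uric_acid": None, "copd": None},
    {"uric_acid": 340.0, "copd": 0},
    {"uric_acid": 420.0, "copd": None},
]
filled = impute(rows, continuous=["uric_acid"], binary=["copd"])
print(filled[1])
```

A common refinement, not stated in the text, is to compute the medians on the training set only and reuse them for the validation and test sets, so that no information from held-out data enters training.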
Feature selection
From the perspective of clinical practice, we focused on medical variables related to physiological characteristics and disease history. We aimed to select as few variables as possible without reducing the accuracy of pneumonia prediction. Feature selection was performed using the permutation method [27], which is suitable for tree models. In this method, the importance of a feature is measured by how much the objective score decreases when the values of that feature are randomly permuted, which breaks its relationship with the outcome. Specifically, the variable weights were calculated by applying the permutation method to XGBoost [28], a boosted tree model that can handle missing values. Ten-fold cross-validation on the training set was used to calculate the feature weights.
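The permutation idea can be illustrated with a small, self-contained sketch. It uses a toy threshold rule in place of XGBoost, accuracy in place of the study's objective score, and no cross-validation; all of these are simplifying assumptions made for illustration.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, n_repeats=20, seed=0):
    """Mean drop in score when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column keeps its values.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "model": predicts from feature 0 only; feature 1 is ignored.
model = lambda row: int(row[0] > 0.5)
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
imp = permutation_importance(model, X, y, n_features=2)
print(imp)
```

Because the toy model never reads feature 1, shuffling it cannot change any prediction and its importance is exactly zero, whereas shuffling feature 0 reduces accuracy to roughly chance level.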
We screened the features one at a time in order of their weights. A newly added feature was retained only if it increased the overall score on the internal validation cohort by at least 0.005 in cross-validation. Taking into account the features selected in previous studies [16,17,18,19,20,21] and those recommended by doctors, we further added three variables: gender, current smoking, and C-reactive protein. This left 12 variables: dysphagia, Glasgow Coma Score (GCS), age, gender, fasting blood glucose, uric acid, COPD, National Institutes of Health Stroke Scale admission score (NIHSS score), mRS, current smoking, serum creatinine, and C-reactive protein. Detailed descriptions of these variables are provided in Additional file 1: Figure S3.
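The screening rule (keep a feature only if it raises the score by at least 0.005) can be sketched as a greedy forward pass. The scoring function and the per-feature gains below are purely illustrative stand-ins, not study results; in practice `score_fn` would retrain the model and return a cross-validated score.

```python
def forward_select(ranked_features, score_fn, min_gain=0.005):
    """Add features in importance order; keep one only if it lifts the score by min_gain."""
    selected = []
    best = score_fn(selected)
    for feature in ranked_features:
        candidate = selected + [feature]
        score = score_fn(candidate)
        if score - best >= min_gain:
            selected, best = candidate, score
    return selected, best

# Hypothetical per-feature score gains (illustrative numbers only).
gains = {"dysphagia": 0.12, "gcs": 0.06, "age": 0.03, "hospital_level": 0.001}

def toy_score(features):
    """Stand-in for a cross-validated score: 0.5 plus the gain of each feature."""
    return 0.5 + sum(gains[f] for f in features)

selected, best = forward_select(list(gains), toy_score)
print(selected, round(best, 3))
```

Here `hospital_level` is rejected because its gain of 0.001 falls below the 0.005 threshold, which is how the rule keeps the variable set small.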
Baseline scores
In Additional file 1: Tables S1 and S2, we list the scoring scales proposed in recent years that can be used to predict SAP in ICH. Age is used by all of these scores, and NIHSS by all except the ACDD4 score [19]. In the following experiments, we mainly considered clinical variables that are easy to obtain. We therefore selected ICH-APS-A (hereafter referred to as ICH-APS), PASS, ISAN, and PNA as baselines. For ICH-APS we made one compromise: we used drinking history in place of excessive drinking. In addition, because three of its medical variables (hematoma volume, infratentorial location, and extension into ventricles) are not included in our data cohorts, we assigned each of them a score of 0.
ICH-LR2S2
We used logistic regression, a classic machine learning model. By calculating the medical risk score from the regression coefficients [29] and applying the prior medical consensus [16, 19, 21], we developed the ICH-LR2S2 risk score shown in Table 1. We excluded features that scored less than one point. ICH-LR2S2 therefore uses nine patient features: age, mRS, fasting blood glucose, NIHSS score, GCS, C-reactive protein, dysphagia, COPD, and current smoking. Compared with previous risk scores, ICH-LR2S2 includes two new variables: fasting blood glucose and C-reactive protein.
External validation cohort
The performance of our model was tested on an independent cohort from the China National Stroke Registry II (CNSR II) [30]. The CNSR II, a nationwide initiative launched in 2012 by the Ministry of Health of China, established a reliable national stroke database for evaluating the delivery of stroke care in clinical practice. The CNSR II cohort included patients recruited from all 219 urban hospitals that voluntarily participated in the General Administration of Stroke Registration of China from June 2012 to January 2013. The study was approved by the Central Institutional Review Board of Beijing Tiantan Hospital. Each participant provided written informed consent before participating.
Statistical analysis
Continuous variables are described by means and standard deviations (SD), and categorical variables by counts and percentages. The predictive performance of the models was measured by the area under the receiver operating characteristic curve (AUC) with a 95% confidence interval (CI). The AUCs of the models were compared using the DeLong test [31]. Student's t-test was used for continuous variables and the chi-square test for categorical variables. A two-sided p < 0.01 was considered statistically significant.
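As a rough illustration of the AUC and its 95% CI, the sketch below computes the AUC as the probability that a randomly chosen pneumonia case receives a higher score than a randomly chosen non-case, and estimates the interval with a percentile bootstrap. The bootstrap is a simple stand-in chosen for brevity; it is not the DeLong procedure used in the study, and the scores and labels are toy values.

```python
import random

def auc(scores, labels):
    """AUC: probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (a stand-in, not the DeLong method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy risk scores and outcomes (1 = pneumonia), labels assumed to be 0/1.
scores = [1, 3, 3, 5, 7, 8]
labels = [0, 0, 1, 0, 1, 1]
point = auc(scores, labels)
lo, hi = bootstrap_ci(scores, labels, n_boot=200)
print(round(point, 3), lo, hi)
```

The pairwise comparison is quadratic in the number of patients, so for a cohort of this size a rank-based formulation or a statistics library would be the practical choice.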
The risk score was derived from the weight coefficients of the logistic regression. Taking ten years of age as the unit interval, we calculated the ratio of each feature weight to the age weight coefficient to obtain the corresponding feature score and numerical interval. For a binary variable, the presence or absence of the feature was used as the scoring criterion (for gender, points are assigned to men). For a continuous variable, the clinical meaning of the feature (its medical risk range) was considered in addition to the weight coefficient from the model. The minimum unit of the score was 1 point, and features worth less than 1 point were not scored. Based on the predictive score of the model, we stratified the cohort by risk and specified the risk thresholds. We analyzed the risk groups by calculating the number of patients, the pneumonia rate, accuracy, sensitivity, specificity, PPV, and NPV (for more detailed information, refer to Additional file 1: Tables S7–S14).
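One plausible reading of this procedure is sketched below: one point corresponds to the log-odds contributed by ten years of age, each coefficient is divided by that unit and rounded, and features that round to less than one point are dropped. The rounding rule, the coefficient values, and the feature names are assumptions for illustration; the published score also adjusts intervals by clinical judgment, which is not modeled here. A second helper computes threshold-based metrics for a risk group.

```python
def coefficient_to_points(coefs, age_key="age", age_interval=10):
    """Convert logistic-regression coefficients to integer points.

    One point is defined as the log-odds of `age_interval` years of age.
    """
    unit = coefs[age_key] * age_interval
    points = {}
    for name, beta in coefs.items():
        if name == age_key:
            continue
        p = round(beta / unit)  # assumption: round to the nearest integer
        if p >= 1:              # features worth less than 1 point are not scored
            points[name] = p
    return points

def threshold_metrics(total_scores, outcomes, threshold):
    """Sensitivity, specificity, PPV and NPV when scores >= threshold are high risk."""
    tp = fp = tn = fn = 0
    for s, y in zip(total_scores, outcomes):
        if s >= threshold:
            tp += y == 1
            fp += y == 0
        else:
            fn += y == 1
            tn += y == 0
    # Note: raises ZeroDivisionError if any cell combination is empty.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical coefficients (per year for age); not the study's fitted values.
coefs = {"age": 0.02, "dysphagia": 0.85, "copd": 0.45, "gender_male": 0.05}
points = coefficient_to_points(coefs)

# Toy total scores and outcomes (1 = pneumonia).
metrics = threshold_metrics([0, 2, 4, 6, 6, 8], [0, 0, 1, 0, 1, 1], threshold=4)
print(points, metrics)
```

Expressing every feature in units of "ten years of age" keeps the score interpretable at the bedside, since each point has the same approximate effect on risk as one additional decade of age.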