Integration of machine learning to identify diagnostic genes in leukocytes for acute myocardial infarction patients
Journal of Translational Medicine volume 21, Article number: 761 (2023)
Acute myocardial infarction (AMI) has two clinical characteristics: a high rate of missed diagnosis and dysfunction of leukocytes. RNA transcription in leukocytes is closely related to the disease course of AMI patients. We hypothesized that RNA transcripts in leukocytes might provide diagnostic value for AMI. Integration machine learning (IML) was first used to explore genes that discriminate AMI. A clinical study was then performed to validate the results.
A total of four AMI microarrays (derived from the Gene Expression Omnibus) were included in the bioinformatic analysis (220 samples). Clinical validation was then completed with 20 AMI patients and 20 stable coronary artery disease (SCAD) patients. At a ratio of 5:2, GSE59867 was assigned to the training set, while GSE60993, GSE62646, and GSE48060 were assigned to the testing set. IML, proposed specifically in this research, is composed of six machine learning algorithms: support vector machine (SVM), neural network (NN), random forest (RF), gradient boosting machine (GBM), decision trees (DT), and least absolute shrinkage and selection operator (LASSO). IML had two functions in this research: filtering optimized variables and predicting the class label. Finally, the RNA of the recruited patients was analyzed to verify the results of IML.
Thirty-nine differentially expressed genes (DEGs) were identified between controls and AMI individuals in the training set. Among the thirty-nine DEGs, IML was used to build the predictive classification model and identify potential candidate genes with overall normalized weights > 1. Finally, two genes (AQP9 and SOCS3) showed diagnostic value, with an area under the curve (AUC) > 0.9 in both the training and testing sets. The clinical study verified the significance of AQP9 and SOCS3. Notably, more stenotic coronary arteries or a more severe Killip classification indicated higher levels of these two genes, especially SOCS3. These two genes correlated with two immune cell types, monocytes and neutrophils.
AQP9 and SOCS3 in leukocytes may help distinguish AMI patients from SCAD patients. AQP9 and SOCS3 are closely associated with monocytes and neutrophils, which might contribute to advancing AMI diagnosis and shed light on novel genetic markers. Multicenter, large-sample trials covering multiple clinical characteristics are still needed to confirm their clinical value.
Acute myocardial infarction (AMI), the most severe form of cardiovascular disease [1, 2], is associated with millions of deaths annually around the world [3, 4]. Generally, the diagnosis of AMI relies on clinical symptoms, electrocardiogram, and changes in serum enzyme levels. However, AMI is easily misdiagnosed because of the following three aspects: nonclassic clinical symptoms [6, 7], atypical underappreciation, and an untimely serum peak. Because of these three problems, a previous study reported that the missed diagnosis rate of AMI is higher than 0.9%. The diagnosis and treatment of AMI must be prompt; otherwise, irreversible damage may result. Therefore, exploring new markers of AMI to decrease missed diagnoses is essential and urgent.
Leukocytes play an important and varied role in the entire evolution of AMI. During the acute injury phase of AMI, leukocytes promote a severe inflammatory cascade response through the polarization of M1 macrophages . During the repair phase of AMI, M2 macrophages in leukocytes suppress inflammation and mediate the repair of injured myocardium . Furthermore, leukocyte alteration positively correlates with AMI severity and, inversely, with patient survival [12, 13].
RNAs are involved in the evolution of AMI. For example, miR-155 correlated positively with the concentration of inflammatory cytokines, such as IL-6 and TNF-α, in AMI. Neutrophil-derived S100A8/A9 amplify granulopoiesis and cardiac injury in AMI mice. Conversely, M2 macrophage-derived exosomes carry miR-1271-5p to alleviate AMI-related cardiac injury. In short, RNA in leukocytes plays different roles in the evolution of AMI, possibly depending on the leukocyte subtype. However, numerous studies have focused on integrating target interventions [12, 17] and leukocyte complications [17, 18], and few have focused on the diagnostic value of leukocyte RNA. Because leukocyte RNA is involved in the evolution of AMI, it might have diagnostic value for AMI patients, and this value might be related to the various leukocyte subtypes.
Machine learning (ML) helps humans learn patterns from complex data to predict future behavioural outcomes and trends. ML is widely utilized in variable filtering. Previous studies used a single ML algorithm or two integrated ML algorithms (e.g., support vector machine or least absolute shrinkage and selection operator) to optimize variables. Still, these approaches may have missed potential genes. Compared with a single ML algorithm, the integrated ML (IML) approach [21,22,23] we developed is more advantageous in variable screening and model building. IML helps identify potential genes mistakenly deleted by a single ML algorithm and find more meaningful variables. IML integrates the advantages of the individual ML algorithms, and its predictive classification value is better. Given its favourable filtering value in transcriptomics, IML might be used to comprehensively explore diagnostic value in AMI patients.
In summary, we aim to explore the potential diagnostic value of the leukocyte transcriptome for identifying AMI patients. Because of its good variable screening and excellent predictive value, IML was first used to mine diagnostic genes in AMI leukocytes across multiple microarrays. A single microarray might have inherent biases in capturing the entire transcriptomic landscape, so multiple microarrays were integrated after resolving batch effects to reduce bias and validate each other. Clinical validation was added to confirm the result. Because the relationship between the transcriptome and leukocyte subtypes was unclear, the correlation between immune cells and the target transcripts was subsequently analyzed. We expect to explore the functional roles of the identified genes in AMI pathophysiology and investigate their potential as therapeutic targets.
The raw data were obtained from the Gene Expression Omnibus (GEO, March 27, 2022). Because AMI patients have symptoms similar to those of stable coronary artery disease (SCAD), SCAD patients were set as the controls. Increased leukocytes in other cardiovascular diseases (e.g., stroke [24, 25] and heart failure) may influence the results, so these diseases were excluded. Based on the above, the following inclusion and exclusion criteria were set: (I) inclusion criteria: (i) diagnosed with AMI on admission; (ii) transcriptome obtained from leukocytes in blood; (iii) initial data free and accessible; and (iv) control individuals were healthy or diagnosed with SCAD; and (II) exclusion criteria: (i) other cardiovascular diseases suspected and (ii) blood taken more than one day after hospitalization.
To ensure the reliability of the data, the R package sva (version 3.46.0) was applied during data integration to minimize batch effects, using the ComBat function with parametric adjustments. Following the split ratios in previous literature (1.64:1 to 5:1) and to minimize batch effects, the datasets were distributed between the training and testing sets at a ratio of 5:2. GSE59867 formed the training set, whereas GSE60993, GSE62646, and GSE48060 formed the testing set. In brief, the training set was applied to explore candidate diagnostic genes, and the testing set was used for validation. Based on the DEGs, three functional enrichment analyses were performed: the Kyoto Encyclopedia of Genes and Genomes Gene Set Enrichment Analysis (KEGG-GSEA), Gene Ontology (GO), and Disease Ontology (DO). The GO terms covered three branches: molecular function (MF), biological process (BP), and cellular components (CC). Notably, the novel IML served two functions: developing ML classification models and exploring candidate variables. Finally, the candidate genes were verified in the testing set and the clinical study, and an immune analysis of the candidate genes was performed. CIBERSORT was used for the immune analysis, and the correlations were plotted with the corrplot R package (version 0.92). The primary code is available at https://github.com/Linzhang-BiuBiuBiu/ML-for-diagnosis-genes.
Searching for DEGs
Because the same gene may have multiple sequences, the transcriptome can contain several expression values for the same gene. For such genes, the expression values were averaged, and limma (version 3.54.0) was employed to identify the DEGs. Two thresholds were established: an absolute log fold change (|logFC|) > 0.7 (previous studies used 0.5 to 1) and a false discovery rate < 0.05, adjusted according to the Benjamini and Hochberg method.
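The two thresholds can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it only shows the Benjamini and Hochberg adjustment and the |logFC| filter applied to precomputed statistics. The gene names, logFC values, and p-values below are hypothetical, and the function names are introduced here for illustration.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each one is capped by the one ranked above it).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    logfc_cutoff: float = 0.7,
    fdr_cutoff: float = 0.05,
) -> List[str]:
    """Keep genes with |logFC| > logfc_cutoff and adjusted p < fdr_cutoff.

    `stats` maps a gene name to (logFC, raw p-value).
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    return [
        g for g, q in zip(genes, fdr)
        if abs(stats[g][0]) > logfc_cutoff and q < fdr_cutoff
    ]


# Hypothetical statistics, for illustration only.
toy_stats = {
    "GENE_A": (1.2, 0.005),
    "GENE_B": (0.3, 0.01),
    "GENE_C": (-0.9, 0.03),
    "GENE_D": (0.8, 0.5),
}
selected = select_degs(toy_stats)
print(selected)
```

In this toy input, GENE_B fails the fold-change cutoff and GENE_D fails the false discovery rate cutoff, so only GENE_A and GENE_C are kept.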
IML of six ML algorithms
Classification models of IML, composed of six ML algorithms, were processed, covering support vector machine (SVM), neural network (NN), random forest (RF), gradient boosting machine (GBM), decision trees (DT), and least absolute shrinkage and selection operator (LASSO). In brief, IML was used to identify candidate genes with the overall normalized weights. The six ML algorithms were developed to optimize parameter settings, model development in the training sets, and validation in the testing sets. For stability, all ML algorithms were tenfold cross-validated. Notably, an accuracy value was applied to evaluate the predictive classification value, and a higher accuracy value showed a better classification value of the six ML algorithms.
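As a rough illustration of tenfold cross-validation and the accuracy metric, the sketch below splits sample indices into ten folds and scores predictions. How the study assigned samples to folds is not stated, so the interleaved assignment here is an assumption, and the function names are introduced for illustration. The sample count of 157 is the size of the training set (46 controls and 111 AMI patients).

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are assigned to folds in an interleaved fashion (an assumption;
    real pipelines usually shuffle or stratify first).
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 157 samples in the training set, split into ten folds.
splits = list(k_fold_indices(157, k=10))
print([len(test) for _, test in splits])

# Hypothetical labels and predictions, for illustration only.
print(accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

Each sample appears in exactly one test fold, so every sample is used for validation once and for model fitting nine times.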
LASSO was processed with the glmnet (version 4.1-6) R package. cv.glmnet was utilized to optimize lambda. For the parameters, the range of "lambda" was set between 0 and 100 with "binomial" and "class". Based on the minimum lambda, glmnet was used to fit the LASSO with alpha and a "binomial" method in the training set.
SVM was developed with the e1071 R package (version 1.7–12). tune.svm was adopted to optimize the parameter settings with a "linear" kernel and a cost between 1 and 20. Next, based on the optimized number of support vectors in the training set, the SVM model was built.
DT was processed with rpart (version 4.1.19) and rpart.plot (version 3.1.1). Based on the "class" method and a cp value of 0.001, the rpart function was adopted for the DT model.
RF was completed with the R package randomForest (version 4.7-1.1). First, the tuneRF function was adopted to optimize the number of trees from 0 to 700 with a step size of one. RF was then developed with the number of trees that gave the minimum error rate.
NN was processed with neuralnet (version 1.44.2) with neuralnet function, five layers (an input, an output, and three hidden layers), err.fct of "sse", and the linear.
Compared with the other five ML algorithms, GBM required more steps and was more error-prone. The GBM was developed with h2o (version 220.127.116.11). First, the Java runtime environment, on which h2o runs, was installed. The memory setting in h2o.init is essential, and the model memory of GBM was set to 8G. Because h2o requires its own data type, the as.h2o function was used to convert the data format. Next, h2o.gbm tuned the parameters and developed the model with the "Bernoulli" distribution, 200 trees, a learning rate of 0.001, and a sample rate of 90%.
Furthermore, from the weights that the above six ML algorithms assigned to each DEG, the normalized overall weight of IML was calculated as follows: overall weights = abs(RF)/abs(RFmax) + abs(SVM)/abs(SVMmax) + abs(LASSO)/abs(LASSOmax) + abs(NN)/abs(NNmax) + abs(GBM)/abs(GBMmax) + abs(DT)/abs(DTmax). For instance, suppose the weights of interleukin-6 in the six ML algorithms were 30, -22, 20, -2, 320, and -8, and the maximum absolute weights in the six ML algorithms were 60, 88, 80, 8, 640, and 16. The overall weight of interleukin-6 would then be |30|/60 + |-22|/88 + |20|/80 + |-2|/8 + |320|/640 + |-8|/16 = 2.25. Genes with normalized overall weights > 1 were treated as candidates and evaluated by the area under the curve (AUC).
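The interleukin-6 example can be reproduced with a short Python sketch. The text does not say which of the six numbers belongs to which algorithm, so pairing them with the algorithm names in the order of the formula is an assumption; the function name is introduced here for illustration.

```python
from typing import Dict


def overall_weight(gene_weights: Dict[str, float], max_abs_weights: Dict[str, float]) -> float:
    """Sum of |weight| / |max weight| over all algorithms for one gene."""
    total = 0.0
    for algorithm, weight in gene_weights.items():
        denominator = abs(max_abs_weights[algorithm])
        if denominator == 0:
            # An algorithm that assigns zero weight to every gene contributes nothing.
            continue
        total += abs(weight) / denominator
    return total


# Numbers from the interleukin-6 example; the algorithm labels are assumed
# to follow the order of the formula (RF, SVM, LASSO, NN, GBM, DT).
il6_weights = {"RF": 30, "SVM": -22, "LASSO": 20, "NN": -2, "GBM": 320, "DT": -8}
max_weights = {"RF": 60, "SVM": 88, "LASSO": 80, "NN": 8, "GBM": 640, "DT": 16}

il6_overall = overall_weight(il6_weights, max_weights)
print(il6_overall)       # 2.25
print(il6_overall > 1)   # True: the gene would pass the candidate threshold
```

Because each term lies between 0 and 1, the overall weight of a gene ranges from 0 to 6, and a gene can exceed the threshold of 1 even when one algorithm gives it a weight of zero.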
The clinical trial was performed according to the Declaration of Helsinki guidelines. All AMI and SCAD patients, recruited from October 10, 2022, to December 31, 2022, provided individual written informed consent, and the Ethics Review Committee of Jinghai District Hospital approved the study. The study did not increase the cost of treatment for the patients or add any other intervention, and the blood samples used were taken from the discarded blood of the patients after their routine blood tests on the same day. Patients without a routine blood test on that day were excluded. All AMI patients underwent coronary angiography, and blood samples were collected in anticoagulant tubes on admission. Density gradient centrifugation [31, 32] was performed for leukocyte isolation (mainly monocytes and lymphocytes). In brief, 8 mL of Ficoll solution was added to 8 mL of anticoagulated blood, and the upper plasma layer was discarded after centrifugation. The white cell layer at the interface was aspirated, added to 10 mL of saline, and centrifuged; the bottom layer contained the leukocytes (mainly monocytes and lymphocytes). RNA isolated from leukocytes was reverse transcribed with reverse transcription kits (Takara, Shiga, Japan). Quantitative reverse transcription PCR was executed on an ABI7900HI (Thermo Fisher Scientific). According to previous literature, the relative content of the candidate genes was scaled to the reference gene (GAPDH), and Table 1 lists the primer sequences.
A total of 4 datasets (Table 2) (220 samples), namely, GSE59867, GSE60993, GSE62646, and GSE48060, were integrated for this study. The training set was obtained from GSE59867 (46 controls and 111 AMI patients) based on an approximate ratio of 5:2. The testing set combined the other three datasets (28 controls and 35 AMI patients), namely, GSE60993, GSE62646, and GSE48060. The following analysis is presented in Fig. 1.
Thirty-nine DEGs were identified (Additional file 1: Table S1) in a training set from 17,049 RNAs. Compared to the control group (SCAD), 28 genes were upregulated (SOCS3, HP, ECRP, AQP9, FAM20A, CES1, STAB1, NRG1.1, NRG1, DYSF, RNASE1, RNASE2, ASGR2, CYP1B1, MERTK, FCGR1A.2, MIR21, FCGR1A.1, TCN2, VSIG4, PPARG, FCGR1A, SLED1, S100A9, FMN1.1, CD163, TMEM176A, and SERPINB2) and 11 genes were downregulated (KLRC3, KLRD1, KLRA1P, DTHD1, KLRC4, MYBL1, CLC, KLRC2, KLRC4-KLRK1, SNORD20, and SNORD45B) in AMI individuals (Fig. 2).
Based on the above DEGs, 45 GSEA terms (Additional file 1: Table S2) were identified, and the top 5 are shown in Fig. 3A, B; 160 GO terms (Additional file 1: Table S3) were identified, and the top 5 are shown in Fig. 3C; and the top 15 of 57 DO terms (Additional file 1: Table S4) are shown in Fig. 3D. In GSEA-KEGG of AMI, the top 3 were Fc gamma R-mediated phagocytosis, Huntington disease, and Leishmania infection. In GO, the top 3 in BP were the stimulatory C-type lectin receptor signalling pathway, response to lectin, and cellular response to lectin. In DO terms, the top 3 were atherosclerosis, arteriosclerotic cardiovascular disease, and arteriosclerosis.
IML of six ML algorithms
Six ML algorithms (Fig. 4) and their accuracies (Table 3) were assessed. Eight genes were identified in LASSO (Fig. 4A), and the accuracy in the training and testing sets was 70.70% (Table 3). In SVM, 13 genes were filtered (Fig. 4B), and the accuracies were 88.46% and 91.84%, respectively. The error rate of RF (Fig. 4C) decreased with an increasing number of trees. The error rate was minimized at 161 trees, and the accuracies in the two sets were 98.09% and 100%. In DT (Fig. 4D), an AQP9 expression value of 9.8 could discriminate the control and AMI groups, but the accuracies were unstable (94.27% and 75.52%). In GBM (Fig. 4E), sixfold methods were established to optimize the diagnosis genes, but, as with the algorithms above, the accuracies were unstable (93.30% and 85.71%). In the NN (Fig. 4F), although three hidden layers were sufficient for discriminating the controls and AMI patients, the accuracies were 83.74% and 71.43%. From the above ML algorithms, the primary weights of the 39 DEGs were identified (Additional file 1: Table S5). Interestingly, RF had the highest and most stable accuracy among all ML algorithms. The normalized overall weights (Table 4) were calculated to filter the candidate variables. Twenty-six genes (ASGR2, SOCS3, AQP9, PPARG, RNASE1, DYSF, S100A9, FCGR1A, VSIG4, STAB1, MYBL1, KLRD1, ECRP, TCN2, FAM20A, MERTK, HP, RNASE2, DTHD1, CLC, SNORD20, CD163, NRG1, SNORD45B, CYP1B1, and KLRC2) were identified because their overall weights were > 1 (Table 4).
Based on overall normalized weights > 1, 26 candidate genes were filtered for subsequent comparison between the AMI and control groups in the training and testing sets. Among the 26 genes, 10 were excluded because they showed no difference in the testing set. Sixteen genes were significant in both sets (Fig. 5).
Diagnosis value of candidate genes
Sixteen candidate genes were included in the following ROC analysis. The AUC values of SOCS3, AQP9, and ASGR2 were greater than 0.85 in both the training and testing sets. In particular, 2 genes, SOCS3 and AQP9, were greater than 0.9 (Fig. 6). The AUC value of the two genes indicated a potential diagnostic value in AMI.
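For a single gene, the AUC can be read as the probability that a randomly chosen AMI sample has a higher expression value than a randomly chosen control. The Python sketch below computes it that way; it illustrates the metric and is not the ROC code used in the study. The expression values are hypothetical, and the function name is introduced here for illustration.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values, for illustration only.
ami_expression = [9.1, 10.2, 11.0]
control_expression = [8.0, 9.5]

auc = roc_auc(ami_expression, control_expression)
print(round(auc, 3))
```

Here five of the six AMI and control pairs are ordered correctly, so the AUC is 5/6. An AUC above 0.9, as reported for SOCS3 and AQP9, means more than nine in ten such pairs are ordered correctly.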
Immune correlation analysis was performed with the 220 samples (Fig. 7). The infiltration landscape (Fig. 7A) showed the distribution of 22 immune cell types in the control and AMI groups. Nine types of immune cells (T cells CD8, T cells CD4 naive, T cells regulatory (Tregs), NK cells resting, monocytes, macrophages M0, macrophages M2, eosinophils, and neutrophils) differed significantly in infiltration between the control and AMI groups (Additional file 1: Fig. S1). Moreover, Spearman analysis of the 22 immune cell types and the two diagnostic genes, AQP9 and SOCS3 (Fig. 7B, C), showed significant correlations with 9 immune cell types (monocytes, neutrophils, T cells CD8, NK cells resting, T cells CD4 naive, eosinophils, macrophages M2, dendritic cells activated, and B cells memory). More importantly, two immune cell types (monocytes and neutrophils) had higher correlation coefficients (Fig. 7B, C) than the other 7 immune cell types (Additional file 1: Figs. S2, S3). In particular, the correlation coefficients of monocytes (Fig. 7B, C) were highest for the two genes (0.56 for SOCS3 and 0.76 for AQP9).
Finally, 40 individuals (20 SCAD and 20 AMI patients) were recruited. The general information of these individuals is shown in Table 5. Thirty-nine clinical characteristics were summarized, of which 13 differed significantly between the SCAD and AMI patients, including WBC, NeP, MonP, Lym, GAT, D-dimer, CRP, SOCS3, AQP9, LDH, cTnT, CK-MB, and Albumin.
The relative RNA levels (Fig. 8A) of AQP9 and SOCS3 both differed significantly. SOCS3 levels differed by the number of coronary artery lesions (Fig. 8B): patients with three lesions showed significantly higher SOCS3 than those with two or one (Fig. 8B). In Fig. 8C, two AMI patients were excluded because of unclear Killip grading. Patients with Killip classification III-IV had higher SOCS3 than those with I-II (Fig. 8C). Although more stenotic coronary arteries were associated with higher levels of AQP9, the difference was less significant than for SOCS3 (Fig. 8B). In addition, AQP9 showed no significant differences between Killip classifications (Fig. 8C). Furthermore, the 9 significant clinical features were analysed with the Pearson correlation test (Additional file 1: Fig. S4). SOCS3 correlated positively with AQP9, and both genes correlated negatively with Albumin.
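Spearman analysis, as used above for genes and immune cell fractions, is the Pearson correlation of the ranks of the two variables. A minimal Python sketch follows; the input values are hypothetical, the function names are introduced for illustration, and the study itself used R.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gene expression and immune cell fractions, for illustration only.
gene_expression = [0.1, 0.4, 0.2, 0.9]
cell_fraction = [1.0, 16.0, 4.0, 81.0]
print(spearman(gene_expression, cell_fraction))
```

Because only ranks matter, any monotonic relationship gives a coefficient of 1 or -1 even when it is far from linear, which suits immune cell fractions that are not normally distributed.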
To our knowledge, our work is the first to filter AMI diagnosis genes based on the overall normalized weights of IML. Four microarrays with 220 samples were adopted for data analysis, and a further clinical study was performed to validate the results. Two genes, AQP9 and SOCS3, showed an AUC > 0.9 in both the training set and testing set (Fig. 6). Both genes showed their highest correlation coefficients with monocytes (Fig. 7). The clinical study verified the significant difference between AMI patients and SCAD controls, indicating a potential diagnostic value of AQP9 and SOCS3. Consistent with previous studies, we found that AQP9 presented diagnostic value for AMI [34, 35], and we further explored the immune correlation of AQP9. Additionally, Prof. Zhu identified SOCS3 as an immune-related gene in AMI, and we extended this finding by showing its diagnostic value. More importantly, this study is the first to reveal the correlation of AQP9 and SOCS3 RNA, especially SOCS3, with the number of stenotic coronary arteries and the Killip classification.
AQP9, a cell membrane protein, transports water down the concentration gradient. ERK1/2 can be reversed in AMI rats by silencing AQP9, which attenuates the inflammatory response and apoptosis of cardiomyocytes and improves cardiac function. This research indicated the crucial role of AQP9 in the pathogenesis of AMI. In human polymorphonuclear leukocytes, AQP9-related inflammation may result from NF-κB and F-actin polymerization. In our work, the AUC of AQP9 was > 0.9. Therefore, AQP9 might be a potential genetic marker for distinguishing AMI from SCAD.
SOCS3 is increased in AMI mice and regulates the T-cell repertoire with STAT3/SOCS3 signalling. More importantly, cardiac-specific silencing of SOCS3 triggers sustained STAT3 and decreases myocardial apoptosis. Therefore, SOCS3 is the dominant negative modulator of Th17 via STAT3. Apoptosis regulates the pathophysiological evolution of AMI. In vitro, SOCS3 can trigger the apoptosis of mammary cells, and knocking out SOCS3 regulates the expression of apoptosis in 3T3-L1 preadipocytes. The above research emphasized the immune regulation of SOCS3 and the regulation of apoptosis with STAT3. In our work, the AUC of SOCS3 was > 0.9. Therefore, SOCS3 might be an effective genetic marker for diagnosing AMI.
Additionally, the CIBERSORT algorithm showed that the proportion of neutrophils and monocytes in the AMI group was higher than in the control group. The progression of AMI is correlated with immune disorder. For example, the white blood cell count correlates highly with in-hospital mortality after AMI. Neutrophils are increased in peripheral blood, and researchers have emphasized that neutrophils-lymphocytes [48, 49] and monocytes/macrophages are easily acquired factors for the prognosis of AMI. Macrophages were dominant in infarcted myocardium, especially over the first week of AMI. However, NK cells have diminished cytotoxic function, and the targeted regulation of NK cells may play a dominant role in the cure of AMI. At the beginning of AMI, inflammation deteriorates with increased neutrophils and monocytes, and inflammation decreases over time with the reduced function of NK cells. Innate immunity is a vital regulatory factor in the inflammatory, proliferative, and maturation phases [3, 54, 55]. AMI leads to a deteriorated inflammatory process. Currently, novel therapeutic interventions targeting the immune system may regulate inflammation, which is conducive to resolving pathological conditions. In a previous clinical trial of 182 NSTEMI patients (a subtype of AMI), IL-1 blockers decreased acute inflammation. Another immune study showed that short-term blockade of S100A9 downregulates inflammation in mice with permanent coronary ischemia. However, the above immune interventions are still experimental and not in clinical use. In summary, regulating immune cells along the progression of AMI and immune intervention in AMI might be potential therapeutic approaches.
AQP9 expression was highest in human polymorphonuclear leukocytes compared with the spleen and liver, suggesting a possible correlation between AQP9 and immunity or inflammation. AQP9 regulates water flow in leukocytes, which regulates cellular morphology and motility, a change that facilitates the migration of leukocytes to inflammatory sites. Similar to our result, Hawang indicated the correlation between AQP9 and neutrophil granulocytes. Research [29, 60, 61] emphasizes the correlation between SOCS3 and neutrophils in inflammation. In our research, both genes had a higher correlation with two immune cell types, neutrophils and monocytes. The immune cell correlation indicates that targeted gene therapy of immune cells may benefit the course of AMI and suggests the potential feasibility of using AQP9 and SOCS3 as therapeutic targets or predictors of treatment response.
ML algorithms are widely applied to various cardiovascular diseases for tasks such as optimizing variables, classification, and regression. For variable filtration, numerous studies use only one or two ML algorithms (e.g., weighted gene coexpression network analysis, LASSO, and SVM). However, using only one or two ML algorithms might inadvertently delete potential genes. For example, AQP9 would be ignored if we used only DT because the weight of AQP9 was zero in DT (Table 4). Likewise, although LASSO can detect candidate genes in big data, when highly correlated features exist the LASSO regression method tends to select one of them and ignore all the others, leading to unstable results. In pigmented skin lesions, SVM and NN showed strong classification value. In preoperative prediction of postsurgical mortality, GBM was preferred over DT, RF, and SVM. Various ML algorithms may assign different weights even to the same variable (Table 4). The overall normalized weights of IML were therefore used to filter genes. Surprisingly, IML identified two potential, unreported diagnostic genes in AMI. In our study, IML had good value in both variable screening and model prediction.
Inevitably, four limitations exist in this work, although every effort was made to eliminate them. First, verification with a small sample size might carry some bias, so multicentre collaborations or larger external datasets are crucial for further verification. Although testing sets and clinical validation were used to assess the stability of the diagnostic value, single-centre validation might be biased. More confirmation from clinical trials and animal experiments is indispensable for solid verification. Second, the ML algorithms have limitations (e.g., the black box phenomenon), especially NN, which has numerous layers. Even when every computational step of a model is known, the set of operations it performs in making a prediction cannot be described in human-understandable terms. Interpretability techniques for ML models, which enhance the transparency and reliability of ML, are therefore attracting attention from developers. Third, because of the limitations of our laboratory extraction process, the leukocytes acquired for clinical validation were predominantly lymphocytes and monocytes. Finally, limited clinical features were obtained (e.g., age, ethnicity, and race). Additional clinical features could enhance the predictive accuracy of the diagnostic model and provide a more comprehensive understanding of AMI. For example, various combinations of clinical variables (e.g., sex, smoking status, and laboratory indicators) are calibrated to analyze the relationship between the target variable and the outcome.
Based on the overall normalized weights of IML, this research successfully merged four microarrays and uncovered the hidden diagnostic genes AQP9 and SOCS3 in leukocytes of AMI patients. AQP9 and SOCS3 are closely associated with monocytes and neutrophils, which might contribute to advancing AMI diagnosis and shed light on novel genetic markers, AMI pathogenesis, targeted therapies, and potential precision medicine. Although clinical validation reproduced the result, multicenter, large-sample trials covering multiple clinical characteristics are still needed to confirm its clinical value.
Availability of data and materials
The datasets presented in this study can be found online. The names of the repositories and GEO numbers can be found below: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59867; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60993; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62646; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48060.
Abbreviations
AUC: Area under the curve
AMI: Acute myocardial infarction
SCAD: Stable coronary artery disease
IML: Integration machine learning
DEGs: Differentially expressed genes
KEGG-GSEA: Kyoto Encyclopedia of Genes and Genomes-Gene Set Enrichment Analysis
SVM: Support vector machine
LASSO: Least absolute shrinkage and selection operator
OOB: Out of bag
GBM: Gradient boosting machine
RBC: Red blood cell count
WBC: White blood cell count
RDW: Red blood cell distribution width
PDW: Platelet distribution width
MCHC: Mean corpuscular haemoglobin concentration
cTnT: Cardiac troponin T
CK-MB: Creatine kinase isoenzymes
TBA: Total bile acid
GBD 2017 DALYs and HALE Collaborators. Global, regional, and national disability-adjusted life-years (DALYs) for 359 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1859–922.
We thank the Suzhou Science & Technology Development Plan.
The research was funded by the Suzhou Science & Technology Development Plan (SYSD2019222), the Zhangjiagang Science and Technology Plan Project (ZKS2135), and the Youth Science and Technology Project of the Zhangjiagang Municipal Health Commission (ZJGQNKJ202211).
Ethics approval and consent to participate
The clinical part of this study was approved by the Ethics Review Committee of Jinghai District Hospital (No. JHYYLL-2022-0307).
Consent for publication
This study has not been published before, and all authors have approved its publication.
The authors declare that they have no conflicts of interest.
Table S1. The 39 DEGs in healthy controls and AMI patients. Table S2. GSEA enrichment of 45 terms. Table S3. GO enrichment of 160 terms. Table S4. DO enrichment of 41 terms. Table S5. Primary weight of DEGs in the six classification ML algorithms. Fig. S1. Difference between the 22 immune cells. Fig. S2. Correlation analysis of SOCS3 in 7 immune cell types. Fig. S3. Correlation analysis of AQP9 in 7 immune cell types. Fig. S4. Correlation analysis of 9 clinical variables.
Cite this article
Zhang, L., Liu, Y., Wang, K. et al. Integration of machine learning to identify diagnostic genes in leukocytes for acute myocardial infarction patients. J Transl Med 21, 761 (2023). https://doi.org/10.1186/s12967-023-04573-x