Mathematical models of amino acid panel for assisting diagnosis of children acute leukemia
Journal of Translational Medicine, volume 17, Article number: 38 (2019)
Altered concentrations of amino acids have been found in the bone marrow or blood of leukemia patients. Metabolomics combined with mathematical models of biomarkers could be used to assist the diagnosis of pediatric acute leukemia (AL).
The concentrations of 17 amino acids were measured by targeted liquid chromatography–tandem mass spectrometry (LC–MS/MS) in peripheral blood collected as dried blood spots. After initial evaluation, the mathematical models were further assessed in a prospective clinical validation cohort for AL diagnosis.
The concentrations of 13 of the 17 amino acids, measured by targeted LC–MS/MS in peripheral dried blood spots, differed significantly between groups. Receiver operating characteristic analysis of the amino acid panel models gave areas under the curve for AL diagnosis of 0.848, 0.834 and 0.856 for support vector machine (SVM), random forest (RF) and eXtreme Gradient Boosting (XGBoost), respectively. In the prospective clinical validation cohort, the Kappa values were 0.697, 0.703 and 0.789 (p > 0.05), and the accuracies of the models were 84.86%, 85.20% and 89.46%, respectively.
The established mathematical model is faster, cheaper and more convenient than conventional methods, with no significant difference in diagnostic performance. The mathematical model can be clinically useful for assisting pediatric AL diagnosis.
Acute leukemia (AL) is the most common cancer in children under 15 years of age. It is divided into acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), of which ALL accounts for 60–70% and AML for 30–40%. The diagnosis of AL depends on multiple laboratory tests that combine morphological, immunological, cytogenetic and molecular (MICM) assays. The current procedure (MICM assays) uses bone marrow cells from AL patients, and the sampling is painful and inconvenient for children. The immunological tests rely on flow cytometry, while molecular tests such as reverse transcription polymerase chain reaction (RT-PCR) and high-throughput sequencing are used to detect fusion genes and key mutations of driver genes. All of these tests are instrument-dependent and require proper interpretation of the results. There is therefore increasing interest in discovering new sensitive and specific biomarkers in peripheral blood (PB) as an easier way to assist AL diagnosis.
The connection between nutrient metabolites and cancers has been reported extensively. The metabolic environment is essential for cancer cell growth, and metabolomics analysis of samples from cancer patients, including leukemia patients, enables the identification of novel specific biomarkers. Although most studies have focused on the relationship between glucose metabolism and different cancers, the occurrence and development of leukemia have been shown to be closely related to amino acid metabolism, which affects protein synthesis. For example, proline disturbs several key metabolic pathways, promoting disease progression and affecting the treatment of leukemia. In addition, other reports have shown that amino acids are related to cell proliferation, apoptosis or drug treatment in different cancers [9,10,11,12,13,14,15]. Therefore, in this study, we aimed to determine whether alterations in amino acid concentrations could be useful for the diagnosis of AL.
To measure multiple amino acids simultaneously, we used targeted liquid chromatography–tandem mass spectrometry (LC–MS/MS), which is widely used in studying the metabolism of cancer and other diseases, because of its sensitivity, repeatability and high throughput. Moreover, mathematical models of biomarkers, built on the alteration of multiple metabolites and analyzed with R programming, have been reported to help the diagnosis of breast cancer and chronic graft-versus-host disease [18, 19]. It is therefore feasible to establish a mathematical model of an amino acid panel for AL diagnosis.
For establishing mathematical models of biomarkers, eXtreme Gradient Boosting (XGBoost), developed by Chen, has been reported to achieve higher accuracy and excellent generalization ability compared with R programming [18, 19]. On GitHub, the XGBoost project had been forked more than 20,000 times. As its use has spread, XGBoost has been applied to predict positive urinary tract infections and chemical-induced respiratory toxicity [21, 22].
Here, we used targeted LC–MS/MS to measure the PB amino acid profiles of children with AL and their matched controls. The mathematical models were established and optimized using the XGBoost algorithm. We then evaluated the models in another clinical cohort to assess their sensitivity, specificity and accuracy, and to test how well they distinguish children with AL from children with non-malignant hematologic diseases who had similar clinical symptoms.
Enrolled patients and matched controls
From April 2016 to March 2018, 520 patients with newly diagnosed acute leukemia (AL) (ALL/AML = 358/162) and 592 children in the matched control group were recruited for this study. The inclusion criteria followed the AL diagnosis criteria in the 2016 edition of the World Health Organization (WHO) classification. Children with AL who were newly diagnosed and had received a normal diet (avoiding only high-protein intake) for 3 days before admission were included. Children with missing clinical information related to MICM classification were excluded. The matched controls were randomly chosen from patients with non-malignant hematologic diseases, including anemia, infectious mononucleosis or thrombocytopenia, who had received a normal diet for 3 days before admission in the same period. Healthy children were chosen randomly from those who came for a physical examination in the same period. Both the healthy children (n = 220) and the children with non-malignant hematologic diseases (n = 592) were used as controls to examine whether there were differences among AL children, healthy children and children with non-malignant hematologic diseases. The control group was slightly larger than the AL group (10–20% more) to ensure that the data characteristics of the control group matched those of the leukemia group. The experimental design of this study is shown in Fig. 1. This project was approved by the institutional ethics board of the Children’s Hospital of Chongqing Medical University (CHCMU2015031). Informed consent was obtained in writing from the legal guardians of all patients.
Briefly, the French–American–British (FAB) classification standard was used for the morphological examination. For the immunological flow cytometry tests, bone marrow (BM) cells from AL patients were incubated with specific antibodies (BD Biosciences, USA; Additional file 1: Table S1) and measured on a Canto II flow cytometer (BD Biosciences, USA). The cytogenetic features of bone marrow cells were detected with Giemsa staining and karyotyping, and the tests for fusion genes were performed according to the manufacturer’s instructions (Yuanqi Bio-Pharm., Shanghai, China). The regimen guidelines for AL patients were based on the 2016 edition of the World Health Organization (WHO) classification.
Amino acid quantitation using targeted LC–MS/MS
Seventeen amino acids were quantified using LC–MS/MS (API 3200, Applied Biosystems) according to Turgeon’s report. To ensure the quality of each dried blood spot, the standards and quality controls were spotted on filter paper at the same time as the sample was collected. The internal standards were prepared as a series of graded concentrations and spotted on filter paper (Whatman ProteinSaver 903). The standards, quality controls and samples were placed in a clean area of our laboratory for 2 h (1 h in summer) to dry and were then stored in a zip-lock bag at 4 °C until the experiment (no more than 3 h). The standards and quality control products were processed in parallel with the specimens. Briefly, metabolites from a dried blood spot were extracted with methanol. Internal standards (Cambridge Isotope Lab, USA) were added and the samples were then dried under flowing nitrogen. The samples were butylated with HCl (50 µl) in each well. After evaporation under nitrogen, the samples were reconstituted in 100 µl of 80% acetonitrile. The samples (20 µl) were injected at 2-min intervals into a flowing stream of 80% acetonitrile. A neutral loss scan (m/z 102) was used for amino acids with a mass range of m/z 140–280. For the quality control of LC–MS/MS, all internal standards and quality control products were logged to avoid use past their expiry date. The internal standards and quality control products for each amino acid were purchased from Cambridge Isotope Lab and processed in parallel with the specimens to obtain the data for drawing the Levey-Jennings curve. If a run was out of control, it was performed again. If the deviation of a run increased, it was adjusted according to the quality control deviation.
Because isotope-labelled standards were available for 17 amino acids and our targeted LC–MS/MS method could only recognize isotope signals, only these 17 amino acids (shown in Table 1) were measured in our study.
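The Levey-Jennings check mentioned above can be sketched as follows. This is a minimal illustration rather than the laboratory's actual procedure: the baseline QC values, the function names, and the use of ±2 SD as a warning limit and ±3 SD as a rejection limit are all assumptions, because the text does not state which control rule was applied.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Return (mean, sd) estimated from baseline QC measurements."""
    return mean(baseline), stdev(baseline)

def classify_qc(value, center, sd):
    """Classify one QC result against assumed Levey-Jennings limits."""
    z = (value - center) / sd
    if abs(z) > 3:
        return "reject"      # assumed rule: beyond 3 SD, repeat the run
    if abs(z) > 2:
        return "warning"     # assumed rule: between 2 and 3 SD
    return "in control"

# Hypothetical baseline QC concentrations (umol/L) for one amino acid.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
center, sd = control_limits(baseline)

for qc in (100.5, 105.0, 108.0):
    print(qc, classify_qc(qc, center, sd))
```

With these illustrative values the baseline mean is 100 and the SD is 2, so a QC result of 108 would trigger a repeat run under the assumed rule.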
Mathematical models establishment and feature selection
The mathematical models of the profile of 17 amino acids in dried blood spots from AL patients and matched controls were established with support vector machine (SVM), random forest (RF) and XGBoost. Because feature selection is critical for a model’s efficiency and performance, we performed it on the training set only. The concentrations of all amino acids were normalized by zero-mean normalization. Considering the sample size and to avoid an overly complex model, any amino acid whose Pearson correlation coefficient with the group label was higher than 0.2 was chosen as a feature in the model. If collinearity was observed among amino acids, only the one with the best Pearson correlation was kept as a feature.
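The feature selection described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the concentrations are hypothetical, the absolute value of the correlation is used, and the collinearity cutoff `r_collinear=0.8` is an assumed value that the text does not specify.

```python
from math import sqrt
from statistics import mean, pstdev

def zscore(values):
    """Zero-mean normalization: subtract the mean, divide by the SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(columns, labels, r_min=0.2, r_collinear=0.8):
    """Keep features with |r| above r_min against the label, then keep
    only the strongest member of any group correlated above r_collinear."""
    scaled = {name: zscore(vals) for name, vals in columns.items()}
    strength = {n: abs(pearson(v, labels)) for n, v in scaled.items()}
    ranked = sorted((n for n in scaled if strength[n] > r_min),
                    key=strength.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(scaled[name], scaled[k])) < r_collinear
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical training-set concentrations; label 1 = AL, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "glycine": [5, 6, 5.5, 6.5, 3, 2.5, 3.5, 3],
    "valine": [10, 12, 11, 13, 6, 5, 7, 7],   # nearly collinear with glycine
    "alanine": [4, 5, 4, 5, 4, 5, 4, 5],      # uncorrelated with the label
}
print(select_features(columns, labels))
```

In this toy data the alanine column fails the 0.2 threshold, and of the two collinear columns only the one with the stronger label correlation survives.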
To establish the best model, three classification algorithms (SVM, RF and XGBoost) were used and evaluated [20, 27, 28]. The classifiers were trained and evaluated by tenfold cross-validation. The final performance of each model was the average performance across folds. The model was chosen based on a comprehensive consideration of sensitivity, specificity, accuracy and volatility across the cross-validation folds.
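The tenfold evaluation loop can be sketched as below. To keep the example dependency-free, a nearest-centroid rule stands in for the classifier; it is not SVM, RF or XGBoost, and the two-feature data are synthetic. The standard deviation of the fold accuracies is used here as one possible reading of "volatility", which is an assumption.

```python
import random
from statistics import mean, pstdev

def kfold_indices(n, k=10, seed=0):
    """Shuffle range(n) and cut it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier: per-class mean of each feature."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Assign x to the class whose centroid is nearest (squared distance)."""
    return min(cents,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def cross_validate(X, y, k=10, seed=0):
    """Return (mean accuracy, SD of accuracy) across k held-out folds."""
    scores = []
    for fold in kfold_indices(len(X), k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(cents, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return mean(scores), pstdev(scores)

# Synthetic, well-separated two-class data (not study data).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
     + [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

acc_mean, acc_sd = cross_validate(X, y)
print(f"accuracy {acc_mean:.3f} +/- {acc_sd:.3f}")
```

Each sample is held out exactly once, so the averaged score reflects performance on data the stand-in classifier did not see during fitting.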
Model development and validation
All clinical information and the altered concentrations of the amino acid panel determined by LC–MS/MS were analyzed using Python (sklearn) and SPSS. For model development, the patients of Group A (Fig. 1) were used to establish the models. The patients were randomly divided into training (80% of samples) and validation (20% of samples) sets. The models were trained on the training sets and subsequently used to predict whether a child had leukemia in the validation sets. The prediction accuracy was used to evaluate the models by tenfold cross-validation. To guard against over-fitting, learning_curve was first used to evaluate whether each algorithm was over-fitting at the statistical level.
The models were then used to predict the patients of Group B (Fig. 1), and the accuracy of each model on Group B was used to evaluate whether it was over-fitting. This assessment included 280 children with AL and 308 children with non-malignant hematologic diseases. The stability of the final model, defined as “the ratio of the accuracy of Out-Sample Test to that of In-Sample Test”, was used to assess its performance.
Analysis and statistics
The concentrations of amino acids in different groups were analyzed by one-way ANOVA. The efficacy of the models was further evaluated by McNemar’s test and ROC analysis. SPSS version 13.0 and Python version 3.6 were used, and the packages employed included “sklearn”, “seaborn”, “pandas”, “numpy” and “matplotlib”.
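McNemar's test compares two paired classifications using only the discordant cases. The sketch below uses the chi-square form with continuity correction, which is an assumption since the text does not say which variant was run in SPSS. The discordant counts are also an assumption: they are back-calculated from the Out-Sample sensitivities and specificities reported later for 280 AL children and 308 controls, not read from the study tables.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar's chi-square test with continuity correction.

    b: cases positive by the reference diagnosis but negative by the model
    c: cases negative by the reference diagnosis but positive by the model
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(stat / 2))  # chi-square survival function, df = 1
    return stat, p

# Back-calculated discordant pairs (false negatives, false positives).
discordant = {"SVM": (43, 46), "RF": (49, 38), "XGBoost": (28, 34)}

for name, (b, c) in discordant.items():
    stat, p = mcnemar(b, c)
    print(f"{name}: chi2 = {stat:.3f}, p = {p:.3f}")
```

Under these assumed counts every p value is above 0.05, in line with the reported absence of a significant difference between each model and the conventional diagnosis.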
Patients and clinical characteristics
A total of 1332 children were enrolled in this study, including 520 newly diagnosed AL patients (ALL/AML = 358/162), 592 children in the matched control group and 220 healthy children. The experimental design and the characteristics of the children are given in Fig. 1 and Additional file 1: Table S2. The initial 240 AL children and 284 children with non-malignant hematologic diseases were assigned to Group A, and the 220 healthy children were chosen in the same period. After model establishment, another 280 AL children and 308 children with non-malignant hematologic diseases were chosen and assigned to Group B. There were no significant differences in gender ratio or age between the AL group and the matched controls, nor in WBC count or the percentage of blast cells in peripheral blood (BIPB) in the AL group. All related data were collected for each patient and control and evaluated with the same procedure.
Feature selection and model selection
The concentrations of 17 amino acids in the serum from the 240 newly diagnosed AL patients (ALL/AML = 174/66), 284 matched control children and 220 healthy children were measured by targeted LC–MS/MS (Table 1). The levels of 13 amino acids (aspartic acid, glutamic acid, methionine, phenylalanine, tyrosine, leucine, tryptophan, valine, citrulline, glycine, ornithine, glutamine and serine) were statistically different among the AL children, controls and healthy children, whereas the other four amino acids (alanine, arginine, histidine and threonine) showed no statistical differences and were not included in the mathematical model.
Eight amino acids (aspartic acid, glutamic acid, phenylalanine, tryptophan, glycine, valine, citrulline and ornithine) were chosen for inclusion in the model for clinical diagnosis, as each had a Pearson correlation coefficient higher than 0.2 (Fig. 2) and each has been related to cell proliferation, apoptosis or drug treatment in different cancers [9,10,11,12,13,14,15].
The data for the eight amino acids were used to develop models based on the three classification algorithms (SVM, RF and XGBoost). The accuracy, sensitivity, specificity and area under the curve (AUC) of the three algorithms are shown in Table 2. Although XGBoost had the best sensitivity, accuracy and AUC, and its specificity was better than that of RF, not every indicator of XGBoost was better than those of both SVM and RF. All three algorithms therefore needed to be optimized and evaluated further.
Parameter optimization in models for AL diagnosis and validation
To establish a better model for AL diagnosis, we focused on optimizing several key parameters. For SVM, the parameters included C, kernel, degree, gamma, coef0 and max_iter; for RF, n_estimators, max_depth, min_samples_split, min_samples_leaf and max_leaf_nodes; for XGBoost, learning_rate, n_estimators, max_depth, gamma, subsample, colsample_bytree and nthread. The optimized parameters were confirmed by tenfold cross-validation on the training and validation data sets. The final models were also verified with ROC and AUC by cross-validation (Table 3). The mean AUC was 0.848 (95% CI 0.819 to 0.877) for SVM, 0.834 (95% CI 0.811 to 0.857) for RF and 0.856 (95% CI 0.809 to 0.923) for XGBoost.
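The AUC used to verify the models can be read as the probability that a randomly chosen AL case receives a higher model score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and how the reported 95% confidence intervals were derived is not stated in the text, so it is not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: label 1 = AL, 0 = control.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```

This pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred children; rank-based implementations give the same value more efficiently.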
Evaluation of amino acid panels for AL diagnosis
Before assessing the accuracy of the models, we checked each of them for over-fitting with learning_curve (Fig. 3). The difference in error between the testing samples and the training samples converged in each model as the number of samples increased, which means that none of the models we built was over-fitting at the statistical level.
To further assess the accuracy of the models, they were evaluated according to the reported protocol. We further validated the models on Group B, which included 280 newly diagnosed AL patients (ALL/AML = 184/96) and 308 children in the matched control group (Table 1). There was no significant difference between the conventional methods and each model in AL diagnosis (Table 4, p > 0.05). The sensitivity, specificity, accuracy and AUC of the models are shown in Table 4. For the Out-Sample Test, the sensitivities of SVM, RF and XGBoost were 84.64%, 82.50% and 90.00%, the specificities were 85.06%, 87.66% and 88.96%, the accuracies were 84.86%, 85.20% and 89.46%, and the AUCs were 0.797, 0.803 and 0.830, respectively. Compared with the accuracies of these models for the In-Sample Test (Table 3), the Out-Sample Test accuracies of SVM, RF and XGBoost all fell within the 95% confidence interval. This is further evidence that none of our models was over-fitting. The sensitivity, specificity and accuracy of XGBoost were the best among the three models (Table 4). The generalization ability of each model, defined in our study as “the accuracy of Out-Sample Test/the mean accuracy of In-Sample Test”, was 0.945 (84.86%/89.84%), 0.945 (85.20%/90.12%) and 0.979 (89.46%/91.35%) for SVM, RF and XGBoost, respectively. The XGBoost model therefore also had the best generalization ability.
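The reported Out-Sample figures can be cross-checked with a short calculation. The 2×2 counts below are not taken from Table 4; they are back-calculated (an assumption) from the reported sensitivities and specificities for 280 AL children and 308 controls, for example 252 of 280 for a sensitivity of 90.00%. The in-sample accuracies are the values quoted in the generalization ratios above.

```python
def diagnostics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Cohen's kappa from a 2x2 table."""
    n = tp + fn + tn + fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (acc - pe) / (1 - pe)
    return sens, spec, acc, kappa

# Back-calculated counts (tp, fn, tn, fp); assumed, not from Table 4.
counts = {
    "SVM": (237, 43, 262, 46),
    "RF": (231, 49, 270, 38),
    "XGBoost": (252, 28, 274, 34),
}
# Mean In-Sample accuracies quoted in the text.
in_sample_acc = {"SVM": 0.8984, "RF": 0.9012, "XGBoost": 0.9135}

for name, table in counts.items():
    sens, spec, acc, kappa = diagnostics(*table)
    ratio = acc / in_sample_acc[name]  # generalization ability as defined above
    print(f"{name}: sens {sens:.2%}, spec {spec:.2%}, acc {acc:.2%}, "
          f"kappa {kappa:.3f}, generalization {ratio:.3f}")
```

With these counts the accuracies, the Kappa values (0.697, 0.703 and 0.789) and the generalization ratios (0.945, 0.945 and 0.979) all agree with the reported figures, which suggests the published numbers are internally consistent.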
Next, we compared the true positive and true negative prediction performance of the XGBoost model with that of morphological tests (Table 5). The performance of XGBoost was much better than that of morphological tests alone. Furthermore, combining morphological tests with the XGBoost model to diagnose AL in clinical application would greatly reduce the false negative ratio of morphological tests and improve the diagnostic efficacy of the XGBoost model.
The classical diagnosis of AL is usually based on the MICM information from patients’ bone marrow, and the relationship between amino acid profile and AL diagnosis has not been established previously. Here, we developed a new strategy to diagnose AL by measuring the concentrations of PB amino acids with LC–MS/MS followed by data mining. All the models for AL diagnosis were verified by tenfold cross-validation and used to assist AL diagnosis.
As reported by others, SVM maps the input data into a high-dimensional feature space through kernel functions and constructs an optimal separating hyperplane in this space, but it can require more computation time. RF is considered more accurate and robust than single decision trees; its most important advantages are that it can handle a large number of features without overfitting and can estimate the importance of the features. XGBoost is a newer implementation of the gradient tree boosting technique that has been tested on a series of datasets, achieving high accuracy while requiring much less computation time than deep neural nets. We therefore chose these three algorithms as candidates. Because the XGBoost algorithm uses a second-order Taylor expansion, it can give more accurate predictions than the ordinary gradient tree boosting algorithm, and it has a better convergence effect than SVM and RF. In our study, none of the three models was overfitting, and the generalization ability of each (the Out-Sample accuracy was more than 94% of the In-Sample accuracy) supports further clinical application. According to our data, there were no significant differences in accuracy and AUC among the three models after parameter optimization during training, but the sensitivity, specificity and accuracy of XGBoost were better than those of SVM and RF (Table 4). In the Out-Sample Test, XGBoost had the best generalization ability, which is the most important characteristic of a model. Above all, we recommend XGBoost as the auxiliary diagnostic model at present. Our next step will be to combine these three models, and possibly others, to establish an artificial neural network for the diagnosis of AL.
According to Table 4, the sensitivity and specificity of XGBoost were both at least 88.96% compared with the traditional protocol for AL diagnosis, and there was no statistically significant difference between them (p > 0.05). The new model we established does not aim to replace the conventional methods. Its most important contribution is that it could help doctors distinguish acute leukemia patients from patients with other hematological diseases that present a similar phenotype, more easily and quickly, so that they can determine a treatment plan in time rather than waiting days to make a decision. It would help doctors in the department of hematology to screen suspected patients, especially outpatients. With an accuracy of 89.46% (Table 4), our model is good enough to serve doctors in the department of hematology as an auxiliary diagnostic method.
Our new model has three advantages over conventional assays. First, regarding time, the conventional laboratory assays to diagnose AL include morphological tests, karyotyping, flow cytometry and molecular detection, and usually need at least 3 days, whereas our new strategy based on LC–MS/MS and a mathematical model took only 4–6 h to complete the analysis. Second, regarding expense, different kinds of antibodies and professional assay kits are needed for flow cytometry and molecular detection (for the prices of antibodies and kits, refer to BD Biosciences and Yuanqi Bio-Pharm), and it costs approximately $250 per child to complete the assays in China, whereas the main expense of our new strategy is approximately $20 per child in China. Third, regarding sample collection and operation, conventional laboratory assays require bone marrow for karyotyping, flow cytometry and molecular detection, and karyotyping involves a lot of manual operation. Our model requires only a PB sample, which is much easier to collect and less painful, especially for children, and its main assay, LC–MS/MS, is an automated technique requiring little manual operation. Based on the above, our strategy is faster, cheaper and more convenient than the conventional strategy (Table 6). As the combination results in Table 5 show, combining the XGBoost model with morphological tests would give better predictive power. This is further evidence that our model is related to AL, although the exact mechanism linking the amino acid profile and AL has not been clarified.
We also tried to establish models to predict the prognosis of AL patients, but the results were unsatisfactory for the following reasons. First, the prognosis of AL patients is determined not only by risk classification but also by compliance with medical treatment, and our model could not take therapeutic status into account. Second, the prognosis of AL has improved to a long-term survival rate of 89%. Our results showed no significant difference because few ALL patients died during our observation period.
We also attempted to establish a mathematical model of the amino acid profile to separate ALL from AML. However, we were not able to evaluate the actual performance of this model. There were two main reasons why our model could not distinguish ALL from AML. First, the AML samples were dispersed because of the high heterogeneity of AML, resulting in few samples (< 25) in each AML subtype (Additional file 1: Table S2). Second, there was a high abandonment rate among AML patients, which left less clinical information. For these reasons, the AML sample size was not sufficient for establishing a model. Moreover, we investigated whether amino acid concentrations differed among karyotype or fusion gene groups in ALL, but there was no significant difference among them (Additional file 1: Tables S3 and S4), so we did not build models with the SVM, RF or XGBoost algorithms to analyze them.
New biomarkers based on small-molecule metabolites are an active area of diagnostic research for different cancers. For example, a biomarker panel including phenylacetic acid, l-fucose, caprylic acid, acetic acid, propionic acid and glycine achieved good performance, with a sensitivity of 80% and a specificity of 100%, for predicting small cell lung cancer. A diagnosis panel containing circulating tumor cell number and lactate dehydrogenase level was found to be a surrogate for survival at the individual-patient level in metastatic castration-resistant prostate cancer. A series of metabolites present in the disease state, including d-mannose, palmitic acid and stearic acid, were identified as candidate biomarkers for B-ALL diagnosis, but no prediction model was used. To the best of our knowledge, no previous report has focused on an amino acid panel for the diagnosis of leukemia. Our study is the first attempt to establish a model linking the amino acid profile and childhood acute leukemia.
This study mainly focused on the amino acid profile to establish mathematical models for AL diagnosis. However, the underlying mechanism of amino acid metabolism in AL needs further investigation. According to the WHO guidelines for the diagnosis and genotyping of leukemia (2016 edition) and previous reports, the molecular variation of patients is very important for predicting the prognosis of AL. More information on AL patients needs to be obtained by next-generation sequencing, including whole genome sequencing, transcriptome sequencing and RNA sequencing [36, 37], to create new cross-omics models that integrate genomics and metabolomics and provide information on the enzymes in the pathways related to leukemia.
In addition, combining a metabolomics approach with data mining to establish prediction models has been demonstrated as a potentially useful strategy for diagnosis or prognosis in different diseases [38, 39]. Although we demonstrated the diagnosis of leukemia in this study using the same approach, the models will be more accurate and reliable if they are refined in the future with a larger sample size, especially in a multi-center study.
In summary, based on the PB amino acid profile, we developed a mathematical model to diagnose pediatric AL. There was no significant difference in diagnostic performance between our new model and the traditional protocol. The model is also faster, cheaper and more convenient than conventional methods. It could benefit clinical practice in the diagnosis and treatment of pediatric AL.
ALL: acute lymphoblastic leukemia
AML: acute myelocytic leukemia
MICM: morphological, immunological, cytogenetic and molecular
BIPB: blast cells in peripheral blood
WBC: white blood cell
ROC: receiver operating characteristic
LC–MS: liquid chromatography mass spectrometry
DBS: dried blood spot
WHO: World Health Organization
CCLG: China Children’s Leukemia Group
XGBoost: eXtreme Gradient Boosting
SVM: support vector machine
AUC: area under curve
Jemal A, Siegel R, Xu J, Ward E. Cancer statistics, 2010. CA Cancer J Clin. 2010;60:277–300.
Arber DA, Orazi A, Hasserjian R, Thiele J, Borowitz MJ, Le Beau MM, et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood. 2016;127:2391–405.
de Godoy NS, Andrino ML, de Souza RM, Gakiya E, Amato VS, Lindoso JÂ, et al. Could kDNA-PCR in peripheral blood replace the examination of bone marrow for the diagnosis of visceral leishmaniasis? J Parasitol Res. 2016. https://doi.org/10.1155/2016/1084353.
Wang YH, Israelsen WJ, Lee D, Yu VW, Jeanson NT, Clish CB, et al. Cell-state-specific metabolic dependency in hematopoiesis and leukemogenesis. Cell. 2014;158:1309–23.
Brown DG, Rao S, Weir TL, O’Malia J, Bazan M, Brown RJ, et al. Metabolomics and metabolic pathway networks from human colorectal cancers, adjacent mucosa, and stool. Cancer Metab. 2016;4:11.
Spratlin JL, Serkova NJ, Eckhardt SG. Clinical applications of metabolomics in oncology: a review. Clin Cancer Res. 2009;15:431–40.
Dunn WB, Lin W, Broadhurst D, Begley P, Brown M, Zelena E, et al. Molecular phenotyping of a UK population: defining the human serum metabolome. Metabolomics. 2015;11:9–26.
Loayza-Puch F, Rooijers K, Buil LC, Zijlstra J, Oude Vrielink JF, Lopes R, et al. Tumour-specific proline vulnerability uncovered by differential ribosome codon reading. Nature. 2016;530:490–4.
Kumar K, Kaur J, Walia S, Pathak T, Aggarwal D. l-Asparaginase: an effective agent in the treatment of acute lymphoblastic leukemia. Leuk Lymphoma. 2014;55:256–62.
Gu Y, Chen T, Fu S, Sun X, Wang L, Wang J, et al. Perioperative dynamics and significance of amino acid profiles in patients with cancer. J Transl Med. 2015;13:35.
Wiggins T, Kumar S, Markar SR, Antonowicz S, Hanna GB. Tyrosine, phenylalanine, and tryptophan in gastroesophageal malignancy: a systematic review. Cancer Epidemiol Biomarkers Prev. 2015;24:32–8.
Jain M, Nilsson R, Sharma S, Madhusudhan N, Kitami T, Souza AL, et al. Metabolite profiling identifies a key role for glycine in rapid cancer cell proliferation. Science. 2012;336:1040–4.
Song G, Shi L, Guo Y, Yu L, Wang L, Zhang X, et al. A novel PAD4/SOX4/PU.1 signaling pathway is involved in the committed differentiation of acute promyelocytic leukemia cells into granulocytic cells. Oncotarget. 2016;7:3144–57.
Gao M, Huang ZL, Tao K, Xiao Q, Wang X, Cao WX, et al. Depression of oncogenecity by dephosphorylating and degrading BCR-ABL. Oncotarget. 2017;8:3304–14.
Kwak EY, Shim WS, Chang JE, Chong S, Kim DD, Chung SJ, et al. Enhanced intracellular accumulation of a non-nucleoside anti-cancer agent via increased uptake of its valine ester prodrug through amino acid transporters. Xenobiotica. 2012;42:603–13.
Poulogiannis G. Deconstructing the metabolic networks of oncogenic signaling using targeted liquid chromatography-tandem mass spectrometry (LC–MS/MS). Methods Mol Biol. 2017;1636:405–14.
Hedman CJ, Wiebe DA, Dey S, Plath J, Kemnitz JW, Ziegler TE. Development of a sensitive LC/MS/MS method for vitamin D metabolites: 1,25Dihydroxyvitamin D2&3 measurement using a novel derivatization agent. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;953–954:62–7.
Yee J, Sadar MD, Sin DD, Kuzyk M, Xing L, Kondra J, et al. Connective tissue-activating peptide III: a novel blood biomarker for early lung cancer detection. J Clin Oncol. 2009;27:2787–92.
Yu J, Storer BE, Kushekhar K, Abu Zaid M, Zhang Q, Gafken PR, et al. Biomarker panel for chronic graft-versus-host disease. J Clin Oncol. 2016;34:2583–90.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016. pp. 785–94.
Taylor RA, Moore CL, Cheung KH, Brandt C. Predicting urinary tract infections in the emergency department with machine learning. PLoS ONE. 2018;13:e0194085.
Zhang L, Ai H, Chen W, Yin Z, Hu H, Zhu J, et al. CarcinoPred-EL: novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods. Sci Rep. 2017;7:2118.
Campo E, Swerdlow SH, Harris NL, Pileri S, Stein H, Jaffe ES. The 2008 WHO classification of lymphoid neoplasms and beyond: evolving concepts and practical applications. Blood. 2011;117:5019–32.
Pleyer L, Burgstaller S, Stauder R, Girschikofsky M, Sill H, Schlick K, et al. Azacitidine front-line in 339 patients with myelodysplastic syndromes and acute myeloid leukaemia: comparison of French–American–British and World Health Organization classifications. J Hematol Oncol. 2016;9:39.
Turgeon C, Magera MJ, Allard P, Tortorelli S, Gavrilov D, Oglesbee D, et al. Combined newborn screening for succinylacetone, amino acids, and acylcarnitines in dried blood spots. Clin Chem. 2008;54:657–64.
Eckels J, Nathe C, Nelson EK, Shoemaker SG, Nostrand EV, Yates NL, et al. Quality control, analysis and secure sharing of Luminex® immunoassay data using the open source LabKey Server platform. BMC Bioinform. 2013;14:145.
Hajiloo M, Rabiee HR, Anooshahpour M. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays. BMC Bioinform. 2013;14(Suppl 13):S4.
Lin Z, Vicente Gonçalves CM, Dai L, Lu HM, Huang JH, Ji H, et al. Exploring metabolic syndrome serum profiling based on gas chromatography mass spectrometry and random forest models. Anal Chim Acta. 2014;827:22–7.
Mattocks CJ, Morris MA, Matthijs G, Swinnen E, Corveleyn A, Dequeker E, et al. A standardized framework for the validation and verification of clinical molecular genetic tests. Eur J Hum Genet. 2010;18:1276–88.
Hulleman E, Kazemier KM, Holleman A, VanderWeele DJ, Rudin CM, Broekhuis MJ, et al. Inhibition of glycolysis modulates prednisolone resistance in acute lymphoblastic leukemia cells. Blood. 2009;113:2014–21.
Li S, Garrett-Bakelman FE, Chung SS, Sanders MA, Hricik T, Rapaport F, et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat Med. 2016;22:792–9.
O’Shea K, Cameron SJ, Lewis KE, Lu C, Mur LA. Metabolomic-based biomarker discovery for non-invasive lung cancer screening: a case study. Biochim Biophys Acta. 2016;1860(11 Pt B):2682–7.
Scher HI, Heller G, Molina A, Attard G, Danila DC, Jia X, et al. Circulating tumor cell biomarker panel as an individual-level surrogate for survival in metastatic castration-resistant prostate cancer. J Clin Oncol. 2015;33:1348–55.
Musharraf SG, Siddiqui AJ, Shamsi T, Naz A. SERUM metabolomics of acute lymphoblastic leukaemia and myeloid leukaemia for probing biomarker molecules. Hematol Oncol. 2017;35:769–77.
Haferlach T, Kohlmann A, Wieczorek L, Basso G, Kronnie GT, Béné MC, et al. Clinical utility of microarray-based gene expression profiling in the diagnosis and subclassification of leukemia: report from the International Microarray Innovations in Leukemia Study Group. J Clin Oncol. 2010;28:2529–37.
Lindqvist CM, Nordlund J, Ekman D, Johansson A, Moghadam BT, Raine A, et al. The mutational landscape in pediatric acute lymphoblastic leukemia deciphered by whole genome sequencing. Hum Mutat. 2015;36:118–28.
Suzuki K, Okuno Y, Kawashima N, Muramatsu H, Okuno T, Wang X, et al. MEF2D-BCL9 fusion gene is associated with high-risk acute B-cell precursor lymphoblastic leukemia in adolescents. J Clin Oncol. 2016;34:3451–9.
Carter TC, Rein D, Padberg I, Peter E, Rennefahrt U, David DE, et al. Validation of a metabolite panel for early diagnosis of type 2 diabetes. Metabolism. 2016;65:1399–408.
Bro R, Kamstrup-Nielsen MH, Engelsen SB, Savorani F, Rasmussen MA, Hansen L, et al. Forecasting individual breast cancer risk using plasma metabolomics and biocontours. Metabolomics. 2015;11:1376–80.
LZou obtained funding for this research. LZhou and LZou developed the study concept and design. ZL, XH, and TL collected and analyzed the data. ZL and RWB completed the statistical analysis. ZL, TZ, LZhou and LZou drafted the manuscript. SL, PZ, KW, LZhang, HL, JY and LC provided administrative, technical or material support. All authors read and approved the final manuscript.
We thank all patients and guardians involved in the study.
The authors declare that they have no competing interests.
Availability of data and materials
Data sharing is applicable to this article.
Consent for publication
Ethics approval and consent to participate
The project was approved by the institutional ethics board of the Children’s Hospital of Chongqing Medical University (CHCMU2015031). Informed consent was obtained from the legal guardians of all patients.
This work was partially supported by the National Natural Science Foundation of China (81373444, 81570142) and the Ministry of Science and Technology of the People’s Republic of China (2016YFA0101300).