The appropriate division of data in training of computers to predict physicians’ decision on blood transfusions: a reply to Dr. Sander de Bruyne

Yao, Yuanyuan; Cifuentes, Jenny; Zheng, Bin; Yan, Min

doi:10.1186/s12967-021-02929-9

Letter to the Editor
Open access
Published: 27 June 2021

Journal of Translational Medicine volume 19, Article number: 275 (2021)

The Original Article was published on 28 April 2021

Dr. Francesco Marincola

Editor-in-Chief

Journal of Translational Medicine

We have read the letter to the editor written by Dr. Sander de Bruyne about our paper entitled “Computer algorithm can match physicians’ decisions about blood transfusions” [1]. In that study, as noted in the letter [2], we used a multilayer perceptron neural network to predict the appropriateness of intra-operative blood transfusions. In that preliminary report, the deep learning algorithm yielded a promising accuracy of 96.8% on a dataset of 4946 patients. Following the World Health Organization (WHO) guidelines, expert anesthesiologists classified 3604 of these cases as appropriate and 1342 as inappropriate.
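As a reading aid, the case counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in this letter; the majority-class share it prints is a derived number for context, not a result reported in either paper, and the variable names are illustrative.

```python
# Case counts as quoted in the letter.
appropriate = 3604
inappropriate = 1342
reported_total = 4946

# The two expert-assigned classes should add up to the dataset size.
total = appropriate + inappropriate
assert total == reported_total

# Share of the larger class: the accuracy a classifier would reach by
# always answering "appropriate", without learning anything.
majority_share = appropriate / total

print(total, f"{majority_share:.1%}")  # prints: 4946 72.9%
```

The reported accuracy of 96.8% can be read against this share, since the two classes are not equally frequent.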

In his letter, Dr. de Bruyne pointed to a well-established practice for preventing overfitting and for evaluating machine learning strategies accurately: separating the data into training and validation/test sets. The danger of not dividing the dataset is that the model may learn an overly specific function that performs well on the training data but generalizes poorly to data it has not seen. In light of this concern, Dr. de Bruyne suggested that the absence of a data split was a problem in our study. From reading the Python scripts, it seemed to him that the model had been trained and validated on the same data entries. Importantly, however, this was not the case. As can be seen in the supplementary material, the files used for training and testing have different names. The data were divided and processed before the neural network was implemented, and the resulting subsets were saved in separate files. The study published in the Journal of Translational Medicine did not describe the data division further, because the work was an exploratory analysis of clinical data written for a wider healthcare audience; it focused on implementing a general machine learning based classifier to demonstrate how such an algorithm could help physicians make decisions.
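The workflow described above, in which the data are divided once and the subsets are saved to separate files before any model is built, can be sketched as follows. This is a minimal illustration using only the Python standard library and is not the code from the supplementary material: the function names, the file names `train.csv` and `test.csv`, the column names and the synthetic rows are all assumptions made for the example.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_and_save(rows, header, out_dir, test_fraction=0.3, seed=0):
    """Shuffle rows once, then write disjoint training and test CSV files."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    parts = {"test.csv": shuffled[:n_test], "train.csv": shuffled[n_test:]}

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, part in parts.items():
        path = out_dir / name
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(part)
        paths[name] = path
    return paths


def read_ids(path):
    """Return the set of case identifiers (first column) stored in a CSV file."""
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader)                   # skip the header row
        return {row[0] for row in reader}


# Usage with synthetic placeholder rows (not the study data).
with tempfile.TemporaryDirectory() as tmp:
    rows = [[f"case-{i}", i % 2] for i in range(10)]
    paths = split_and_save(rows, ["case_id", "label"], tmp)
    train_ids = read_ids(paths["train.csv"])
    test_ids = read_ids(paths["test.csv"])

# No case may appear in both files.
print(len(train_ids), len(test_ids), train_ids.isdisjoint(test_ids))
```

Checking that the two identifier sets are disjoint is a simple way for a reader of the scripts to confirm that differently named files really hold different data entries.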

We have provided an in-depth analysis of the machine learning strategies and their implementation details in another paper [3]. In that second paper, the data were split into training and validation sets comprising 70% and 30% of the total, similar to our first paper [1, 3]. The second paper was aimed at a computer science audience and included an analysis of the optimal hyper-parameters for different classifiers (Random Forest, Support Vector Machine, Multilayer Perceptron Neural Network and Decision Tree Classifier) [3]. In addition, training and cross-validation scores were reported and analyzed during hyper-parameter selection to avoid overfitting or bias in the evaluation results [3]. We encourage readers to consult that study if details of the deep learning implementation are required.
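The general procedure described here, a 70%/30% split followed by a comparison of training and cross-validation scores while choosing a hyper-parameter, can be illustrated with a small dependency-free sketch. It is not the implementation from [3]: a one-dimensional k-nearest-neighbour classifier stands in for the classifiers named above, the data are synthetic, and the fold count, the candidate values of k and all function names are assumptions chosen for brevity.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, data, k):
    """Fraction of `data` labelled correctly by a k-NN model built on `train`."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


def k_fold_scores(train, k, n_folds=5):
    """Mean training and validation accuracy over n_folds splits of `train`."""
    train_scores, val_scores = [], []
    for fold in range(n_folds):
        val = train[fold::n_folds]
        fit = [p for i, p in enumerate(train) if i % n_folds != fold]
        train_scores.append(accuracy(fit, fit, k))
        val_scores.append(accuracy(fit, val, k))
    return sum(train_scores) / n_folds, sum(val_scores) / n_folds


# Synthetic, noisy two-class data (placeholder for real clinical features).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 1 if x + rng.gauss(0, 0.15) > 0.5 else 0))

# 70% for training and hyper-parameter selection, 30% held out.
split = len(data) * 7 // 10
train, test = data[:split], data[split:]

# Compare training and cross-validation scores for each candidate k.
results = {k: k_fold_scores(train, k) for k in (1, 5, 15)}
for k, (train_score, val_score) in results.items():
    print(f"k={k:2d}  train={train_score:.3f}  cv={val_score:.3f}")

# Choose k by cross-validation score only, then touch the test set once.
best_k = max(results, key=lambda k: results[k][1])
test_accuracy = accuracy(train, test, best_k)
print(f"best k={best_k}  held-out accuracy={test_accuracy:.3f}")
```

With k = 1 the training score is perfect by construction, because every point is its own nearest neighbour, so a gap between that score and the cross-validation score is the kind of overfitting signal that such a comparison is meant to expose.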

Availability of data and materials

Not applicable.

References

1. Yao Y, Cifuentes J, Zheng B, Yan M. Computer algorithm can match physicians’ decisions about blood transfusions. J Transl Med. 2019;17:340.
2. De Bruyne S. Comment on “Computer algorithm can match physicians’ decisions about blood transfusions”. J Transl Med. 2021;19:175.
3. Cifuentes J, Yao Y, Yan M, Zheng B. Blood transfusion prediction using restricted Boltzmann machines. Comput Methods Biomech Biomed Eng. 2020;23:510–7.

Acknowledgements

Not applicable.

Funding

No funding was received.

Author information

Authors and Affiliations

Department of Anesthesiology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310009, China
Yuanyuan Yao & Min Yan
Program of Electrical Engineering, Universidad De La Salle, Bogotá, Colombia
Jenny Cifuentes
Department of Surgery, University of Alberta, Edmonton, Canada
Bin Zheng

Contributions

All authors have seen the manuscript and approved its submission to the journal. MY, YY and BZ designed the study and wrote the first article and this letter; JC helped develop a custom-designed computer algorithm. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Min Yan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yao, Y., Cifuentes, J., Zheng, B. et al. The appropriate division of data in training of computers to predict physicians’ decision on blood transfusions: a reply to Dr. Sander de Bruyne. J Transl Med 19, 275 (2021). https://doi.org/10.1186/s12967-021-02929-9

Received: 26 May 2021
Accepted: 05 June 2021
Published: 27 June 2021
DOI: https://doi.org/10.1186/s12967-021-02929-9
