  • Letter to the Editor
  • Open access

Exploring the performance of ChatGPT on acute pancreatitis-related questions

Letter to the editor:

Acute pancreatitis (AP) is a serious gastrointestinal disease with an annual incidence of approximately 34 cases per 100,000 individuals, and its overall burden remains high as the population ages [1]. There is a notable trend among the public to seek AP-related information to improve their awareness of the disease.

Large language models (LLMs), a form of artificial intelligence (AI), can provide up-to-date and useful information. The Chat Generative Pre-trained Transformer (ChatGPT, https://openai.com), developed by OpenAI and launched on November 30, 2022, stands out in this field, and various studies have explored its utility in answering medical questions. This study aimed to evaluate and compare the capabilities of ChatGPT-3.5 and ChatGPT-4.0 in answering test questions about AP, using both subjective and objective metrics.

Methods

As shown in Table S1, we conducted our study using 18 subjective test questions derived from the Atlanta classification consensus for AP and the American Gastroenterological Association (AGA) guidelines (strength of recommendation: strong) [2,3,4]. Additionally, we selected the 73 most frequently tested objective questions from the Chinese professional physician test database and categorized them into four subfields (Table S2). These questions were submitted to ChatGPT in two separate sessions, on February 1, 2024, and February 8, 2024. Two independent reviewers evaluated the subjective answers using a 5-point Likert scale, and any discordance was resolved by a third author. A flowchart of the overall study design is presented in Figure S1. Response accuracy was analyzed using the Chi-squared and Mann–Whitney U tests, with P < 0.05 indicating statistical significance.
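For readers who wish to replicate this kind of two-session evaluation, the sketch below illustrates one way the query step could be scripted. It is a minimal sketch only: the study used the ChatGPT web interface, whereas this sketch assumes the OpenAI chat completions API, and the model identifiers and question file are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of the query step. The model names below are assumed
# API proxies for ChatGPT-3.5/4.0; "ap_questions.txt" is a hypothetical
# file with one test question per line.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]


def ask(model: str, question: str) -> str:
    """Submit one question in a fresh, context-free chat and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


with open("ap_questions.txt", encoding="utf-8") as f:
    questions = [line.strip() for line in f if line.strip()]

# One answer per model per question; rerun on a later date for session 2.
answers = {m: [ask(m, q) for q in questions] for m in MODELS}
```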

Results

As shown in Table 1, ChatGPT-3.5 correctly answered 80% of subjective questions, while ChatGPT-4.0 achieved an accuracy rate of 94%. For objective questions, ChatGPT-4.0 outperformed ChatGPT-3.5, with an accuracy rate of 78.1% versus 68.5% (P = 0.01) (Figure S2A). Across all questions tested, the concordance rates for ChatGPT-3.5 and ChatGPT-4.0 were 80.8% and 83.6%, respectively (Figure S2B), and the mean number of words per response was 218.5 for ChatGPT-3.5 and 246.0 for ChatGPT-4.0 (Table 1). Notably, correct answers showed higher concordance rates than incorrect ones for both versions (95.7% and 91.1% vs. 55.6% and 58.8%) (Table 2). Both models demonstrated high accuracy, particularly in the etiology category.

Table 1 Quality indicators (scientific adequacy) for answers from ChatGPT versions 3.5 and 4.0
Table 2 Performance of ChatGPT-3.5, ChatGPT-4.0 and medical college examinees on acute pancreatitis test questions, overall and by subfield
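As a worked illustration of the two headline metrics above, the snippet below computes accuracy against an answer key and between-session concordance. The toy answers and key are invented for illustration and do not reproduce the study's data.

```python
# Toy illustration of the accuracy and concordance metrics reported above.
# The answers and key below are invented; they are not the study's data.

def accuracy(answers: list[str], key: list[str]) -> float:
    """Fraction of questions answered correctly."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)

def concordance(session1: list[str], session2: list[str]) -> float:
    """Fraction of questions answered identically in both sessions."""
    return sum(a == b for a, b in zip(session1, session2)) / len(session1)

session1 = ["A", "C", "B", "D", "A"]
session2 = ["A", "C", "D", "D", "A"]
key      = ["A", "C", "B", "D", "B"]

print(f"accuracy (session 1): {accuracy(session1, key):.1%}")        # 80.0%
print(f"concordance:          {concordance(session1, session2):.1%}")  # 80.0%
```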

Discussion

Our findings indicate that ChatGPT-4.0 outperformed ChatGPT-3.5 in answering both subjective and objective test questions related to AP, demonstrating superior overall accuracy. The accuracy of both ChatGPT-3.5 and the examinees on clinical-feature questions was generally low, suggesting that the clinical features of AP are complex, often involving numerous complications, which makes identifying the single best answer challenging.

When addressing subjective questions, ChatGPT tends to provide a range of answers, mixing relevant with irrelevant information, which makes it difficult for healthcare professionals and patients to discern the most accurate one. This tendency may explain why accuracy on objective multiple-choice questions was lower than on subjective ones. ChatGPT-4.0, however, showed improvement, providing more precise, concise, and focused answers.

Although ChatGPT answered most subjective questions correctly, the reference answers were derived from earlier guideline evidence. A significant limitation of artificial intelligence is its inability to update information in real time. Recent randomized controlled trials in AP have produced evidence that calls existing management strategies into question, such as the use of antibiotics, fluid resuscitation, the handling of infected necrosis, and the early application of ERCP [5]. It is imperative to re-evaluate current management guidelines to ensure they reflect the latest evidence.

This study has several limitations. First, although we conducted two separate evaluations, the results may have been influenced by the timing of the ChatGPT assessments. Second, we did not incorporate patient perspectives, which are crucial because patients are the ultimate recipients of AP-related information. Third, the examinees were medical students, and we lacked data from practicing doctors.

In conclusion, ChatGPT-4.0 exhibited superior performance compared to ChatGPT-3.5. However, both versions tended to provide broad, generalized answers across various topics rather than optimal solutions. ChatGPT therefore excels at addressing subjective questions and offering a wide range of options, but it is not yet suitable for providing optimal management strategies and cannot adjust treatment plans to the latest evidence; enhancements in training are required.

Availability of data and materials

Not applicable.

References

  1. Petrov MS, Yadav D. Global epidemiology and holistic prevention of pancreatitis. Nat Rev Gastroenterol Hepatol. 2019;16(3):175–84.

  2. Banks PA, et al. Classification of acute pancreatitis–2012: revision of the Atlanta classification and definitions by international consensus. Gut. 2013;62(1):102–11.

  3. Baron TH, et al. American Gastroenterological Association clinical practice update: management of pancreatic necrosis. Gastroenterology. 2020;158(1):67–75.e1.

  4. Crockett SD, et al. American Gastroenterological Association Institute guideline on initial management of acute pancreatitis. Gastroenterology. 2018;154(4):1096–101.

  5. de Madaria E, Buxbaum JL. Advances in the management of acute pancreatitis. Nat Rev Gastroenterol Hepatol. 2023;20(11):691–2.

Acknowledgements

Not applicable.

Funding

This study was supported by the National Natural Science Foundation of China (Nos. 82000531, 82360118 and 82170580); the Project for Academic and Technical Leaders of Major Disciplines in Jiangxi Province (No. 20212BCJL23065); and the Key Research and Development Program of Jiangxi Province (No. 20212BBG73018).

Author information

Contributions

Ren-Chun Du, Xing Liu and Yong-Kang Lai performed the statistical analysis and wrote the manuscript. Yu-Xin Hu, Hao Deng and Hui-Qiao Zhou collected the data. Yin Zhu and Yi Hu designed the study. Nong-Hua Lu, Yin Zhu and Yi Hu revised the manuscript. All authors contributed to the article and approved the final manuscript.

Corresponding authors

Correspondence to Yin Zhu or Yi Hu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1: Figure S1: Flowchart of overall study design


Supplementary Material 2: Figure S2: Comparison of accuracy of ChatGPT-4.0, ChatGPT-3.5 and examinees on acute pancreatitis test objective questions (A); Comparison of concordance of ChatGPT-4.0, ChatGPT-3.5 on acute pancreatitis test objective questions (B)

Supplementary Material 3.

Supplementary Material 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Du, RC., Liu, X., Lai, YK. et al. Exploring the performance of ChatGPT on acute pancreatitis-related questions. J Transl Med 22, 527 (2024). https://doi.org/10.1186/s12967-024-05302-8
