  • Letter to the Editor
  • Open access

Exploring the performance of ChatGPT on acute pancreatitis-related questions

Letter to the editor:

Acute pancreatitis (AP) is a serious gastrointestinal disease with an annual incidence of approximately 34 cases per 100,000 individuals, and its overall burden remains high as the population ages [1]. There is a notable trend among the public to seek AP-related information to improve their awareness of the disease.

Large language models (LLMs), a form of artificial intelligence (AI), can provide up-to-date and useful information. The Chat Generative Pre-trained Transformer (ChatGPT, https://openai.com), developed by OpenAI and launched on November 30, 2022, stands out in this field, and various studies have explored its utility in answering medical questions. This study aimed to evaluate and compare the capabilities of ChatGPT-3.5 and ChatGPT-4.0 in answering test questions about AP, using both subjective and objective metrics.

Methods

As shown in Table S1, we conducted our study using 18 subjective test questions derived from the Atlanta classification consensus for AP and the American Gastroenterological Association (AGA) guidelines (strength of recommendation: strong) [2,3,4]. Additionally, we selected the 73 most frequently tested objective questions from the Chinese professional physician test database and categorized them into four subfields (Table S2). These questions were submitted to ChatGPT in two separate sessions, on February 1, 2024, and February 8, 2024. Two independent reviewers evaluated the subjective answers using a 5-point Likert scale, and any discordance was resolved by a third author. A flowchart of the overall study design is presented in Figure S1. Response accuracy was analyzed using the Chi-squared and Mann–Whitney U tests, with P < 0.05 indicating statistical significance.
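For readers who wish to replicate this kind of two-session evaluation, the sketch below illustrates one way the query step could be scripted. It is a minimal sketch only: the study used the ChatGPT web interface, whereas this sketch assumes the OpenAI chat completions API, and the model identifiers and question file are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of the query step. The model names below are assumed
# API proxies for ChatGPT-3.5/4.0; "ap_questions.txt" is a hypothetical
# file with one test question per line.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]


def ask(model: str, question: str) -> str:
    """Submit one question in a fresh, context-free chat and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


with open("ap_questions.txt", encoding="utf-8") as f:
    questions = [line.strip() for line in f if line.strip()]

# One answer per model per question; rerun on a later date for session 2.
answers = {m: [ask(m, q) for q in questions] for m in MODELS}
```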

Results

As shown in Table 1, ChatGPT-3.5 correctly answered 80% of subjective questions, while ChatGPT-4.0 achieved an accuracy rate of 94%. For objective questions, ChatGPT-4.0 outperformed ChatGPT-3.5, with an accuracy rate of 78.1% versus 68.5% (P = 0.01) (Figure S2A). Across all questions tested, the concordance rates for ChatGPT-3.5 and ChatGPT-4.0 were 80.8% and 83.6%, respectively (Figure S2B), and the mean number of words per response was 218.5 for ChatGPT-3.5 and 246.0 for ChatGPT-4.0 (Table 1). Notably, correct answers showed higher concordance rates than incorrect ones for both versions (95.7% and 91.1% vs. 55.6% and 58.8%) (Table 2). Both models demonstrated high accuracy, particularly in the etiology category.

Table 1 Quality indicators (scientific adequacy) for answers from ChatGPT versions 3.5 and 4.0
Table 2 Performance of ChatGPT-3.5, ChatGPT-4.0 and medical college examinees on acute pancreatitis test questions, overall and by subfield
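As a worked illustration of the two headline metrics above, the snippet below computes accuracy against an answer key and between-session concordance. The toy answers and key are invented for illustration and do not reproduce the study's data.

```python
# Toy illustration of the accuracy and concordance metrics reported above.
# The answers and key below are invented; they are not the study's data.

def accuracy(answers: list[str], key: list[str]) -> float:
    """Fraction of questions answered correctly."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)

def concordance(session1: list[str], session2: list[str]) -> float:
    """Fraction of questions answered identically in both sessions."""
    return sum(a == b for a, b in zip(session1, session2)) / len(session1)

session1 = ["A", "C", "B", "D", "A"]
session2 = ["A", "C", "D", "D", "A"]
key      = ["A", "C", "B", "D", "B"]

print(f"accuracy (session 1): {accuracy(session1, key):.1%}")        # 80.0%
print(f"concordance:          {concordance(session1, session2):.1%}")  # 80.0%
```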

Discussion

Our findings indicate that ChatGPT-4.0 outperformed ChatGPT-3.5 in answering both subjective and objective test questions related to AP, demonstrating superior overall accuracy. The accuracy of both ChatGPT-3.5 and the examinees on clinical-feature questions was generally low, suggesting that the clinical features of AP are complex, often involving numerous complications, which makes identifying the single best answer challenging.

When addressing subjective questions, ChatGPT tends to provide a range of answers, mixing relevant with irrelevant information, which makes it difficult for healthcare professionals and patients to discern the most accurate one. This tendency may explain why accuracy on objective multiple-choice questions was lower than on subjective ones. ChatGPT-4.0, however, showed improvement, providing more precise, concise, and focused answers.

Although ChatGPT answered most subjective questions correctly, the reference answers were derived from earlier guideline evidence. A significant limitation of artificial intelligence is its inability to update information in real time. Recent randomized controlled trials in AP have produced evidence that calls existing management strategies into question, such as the use of antibiotics, fluid resuscitation, the handling of infected necrosis, and the early application of ERCP [5]. It is imperative to re-evaluate current management guidelines to ensure they reflect the latest evidence.

This study has several limitations. First, although we conducted two separate evaluations, the results may have been influenced by the timing of the ChatGPT assessments. Second, we did not incorporate patient perspectives, which are crucial because patients are the ultimate recipients of AP-related information. Third, the examinees were medical students, and we lacked data from practicing doctors.

In conclusion, ChatGPT-4.0 exhibited superior performance compared to ChatGPT-3.5. However, both versions tended to provide broad, generalized answers across various topics rather than optimal solutions. ChatGPT therefore excels at addressing subjective questions and offering a wide range of options, but it is not yet suitable for providing optimal management strategies and cannot adjust treatment plans to the latest evidence; enhancements in training are required.

Availability of data and materials

Not applicable.

References

  1. Petrov MS, Yadav D. Global epidemiology and holistic prevention of pancreatitis. Nat Rev Gastroenterol Hepatol. 2019;16(3):175–84.

  2. Banks PA, et al. Classification of acute pancreatitis–2012: revision of the Atlanta classification and definitions by international consensus. Gut. 2013;62(1):102–11.

  3. Baron TH, et al. American Gastroenterological Association clinical practice update: management of pancreatic necrosis. Gastroenterology. 2020;158(1):67–75.e1.

  4. Crockett SD, et al. American Gastroenterological Association Institute guideline on initial management of acute pancreatitis. Gastroenterology. 2018;154(4):1096–101.

  5. de Madaria E, Buxbaum JL. Advances in the management of acute pancreatitis. Nat Rev Gastroenterol Hepatol. 2023;20(11):691–2.

Acknowledgements

Not applicable.

Funding

This study was supported by the National Natural Science Foundation of China (Nos. 82000531, 82360118 and 82170580); the Project for Academic and Technical Leaders of Major Disciplines in Jiangxi Province (No. 20212BCJL23065); and the Key Research and Development Program of Jiangxi Province (No. 20212BBG73018).

Author information

Contributions

Ren-Chun Du, Xing Liu and Yong-Kang Lai performed the statistical analysis and wrote the manuscript. Yu-Xin Hu, Hao Deng and Hui-Qiao Zhou collected the data. Yin Zhu and Yi Hu designed the study. Nong-Hua Lu, Yin Zhu and Yi Hu revised the manuscript. All authors contributed to the article and approved the final manuscript.

Corresponding authors

Correspondence to Yin Zhu or Yi Hu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1: Figure S1: Flowchart of overall study design


Supplementary Material 2: Figure S2: Comparison of accuracy of ChatGPT-4.0, ChatGPT-3.5 and examinees on acute pancreatitis test objective questions (A); Comparison of concordance of ChatGPT-4.0, ChatGPT-3.5 on acute pancreatitis test objective questions (B)

Supplementary Material 3.

Supplementary Material 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Du, RC., Liu, X., Lai, YK. et al. Exploring the performance of ChatGPT on acute pancreatitis-related questions. J Transl Med 22, 527 (2024). https://doi.org/10.1186/s12967-024-05302-8
