Applications of large language models in radiology: a systematic review.



Abstract

Introduction: Modern large language models could support a number of routine tasks in radiological diagnostics: generating structured reports, extracting information from radiological reports, and making diagnoses. Realizing this potential requires assessing the diagnostic effectiveness of large language models and the reproducibility of their results.

Objective: To analyze the worldwide literature on the application of large language models in radiological diagnostics, evaluate the diagnostic effectiveness and accuracy of these models in addressing existing tasks, and identify potential problems that may hinder the implementation of large language models in radiological practice.

Materials and methods: A search for relevant studies was conducted in the PubMed and RSCI databases, as well as in reference lists (2023-2025). The quality of the selected studies was assessed with the QUADAS-CAD questionnaire.

Results: Nine studies were included. The most common tasks were diagnosis based on radiological reports (3 studies) and detection of clinically significant findings in reports (2 studies). GPT-4 (5 studies) and BERT (3 studies) were the most frequently used large language models; GPT-3.5, Llama 2, Med42, GPT-4V, and Gemini Pro were also used. GPT-4 demonstrated high diagnostic effectiveness and accuracy in diagnosing brain tumors (accuracy 73.0%), in diagnosing myocarditis (83.0%), and in decision-making regarding invasive procedures for acute coronary syndrome (86.0%). The diagnostic effectiveness and accuracy of GPT-4 were lower in diagnosing nervous system pathologies of various origins (50.0%) and musculoskeletal disorders (43.0%). The BERT model showed high diagnostic effectiveness and accuracy in detecting pulmonary nodules (99.0%) and signs of intracranial hemorrhage (sensitivity 97.0%, specificity 90.0%), and in classifying reports (accuracy 84.3%).
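
The sensitivity, specificity, and accuracy values reported above follow the standard confusion-matrix definitions. The Python sketch below is only an illustration of those definitions. The class name `ConfusionCounts`, the function names, and the counts are all hypothetical and are not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts obtained by comparing model output with the reference standard."""
    tp: int  # finding present, model reports it
    fp: int  # finding absent, model reports it
    tn: int  # finding absent, model does not report it
    fn: int  # finding present, model misses it


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-positive cases the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-negative cases the model correctly rules out."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases the model classifies correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts, invented for illustration only (assumption).
example = ConfusionCounts(tp=97, fp=10, tn=90, fn=3)

print(f"sensitivity = {sensitivity(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
print(f"accuracy    = {accuracy(example):.3f}")
```

Reporting all three values together, as Table 2 does, matters because a single accuracy figure does not show whether errors are mostly missed findings or mostly false alarms.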

Most of the studies (88.9%) carry a risk of systematic error (bias). The main reasons are small and imbalanced samples, overlap between training and test datasets, and insufficiently rigorous preparation and description of reference standards.
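
One way to see why imbalanced samples are a concern is a minimal Python sketch with an invented test set. The labels, the "never positive" model, and the function name `accuracy_and_sensitivity` are assumptions made for illustration and do not describe any included study.

```python
def accuracy_and_sensitivity(labels, predictions):
    """Return (accuracy, sensitivity) for binary labels, where 1 = finding present."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    # Predictions for the cases in which the finding is actually present.
    preds_on_positives = [pred for truth, pred in pairs if truth == 1]
    acc = correct / len(pairs)
    sens = sum(preds_on_positives) / len(preds_on_positives)
    return acc, sens


# Hypothetical imbalanced test set: 5 reports with the finding, 95 without.
labels = [1] * 5 + [0] * 95
# A trivial "model" that never reports the finding.
predictions = [0] * 100

acc, sens = accuracy_and_sensitivity(labels, predictions)
print(f"accuracy = {acc:.2f}, sensitivity = {sens:.2f}")
```

In this invented case the trivial model reaches 95% accuracy while detecting none of the positive reports, so an accuracy figure from an imbalanced sample can overstate diagnostic value.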

Discussion: The diagnostic effectiveness of large language models varies between studies. Implementing large language models in clinical practice requires standardizing and improving the quality of methods in AI research.

 

Full text

Funding source. The publication of this work was supported by the Moscow Government Grant "Research on the application of large language models in the field of healthcare based on artificial intelligence technologies" in accordance with the Moscow Government Resolution of April 1, 2025 No. 656-PP.

Competing interests. The authors declare that they have no competing interests.

Author contribution. All authors confirm that their authorship meets the international ICMJE criteria (all authors made a significant contribution to the development of the concept, the research, and the preparation of the article, and read and approved the final version before publication). The largest contributions are distributed as follows: Yu.A. Vasiliev, A.V. Vladzymyrskyy, O.V. Omelyanskaya – development of the research concept, approval of the final version of the manuscript; R.V. Reshetnikov, O.G. Nanova, K.M. Arzamasov, M.R. Kodenko, R.A. Erizhokov – literature review, data analysis, writing the text of the manuscript.

TABLES

Table 1. List of the included studies and their basic characteristics

Table 2. Diagnostic parameters of large language models and medical workers: sensitivity, specificity, and accuracy

 

FIGURES

Fig. 1. Systematic literature search flowchart

Fig. 2. Risk of bias estimation by QUADAS-CAD

 

SUPPLEMENTARY

Table S1. List of the included studies and their basic characteristics (continuation of table 1)

Table S2. QUADAS-CAD domain questions; key questions are shown in italics.

 

 


About the authors

Yuriy Vasilev

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: npcmr@zdrav.mos.ru
ORCID iD: 0000-0002-5283-5961
SPIN code: 4458-5608

MD, Dr. Sci. (Medicine)

Russia, Moscow

Roman Reshetnikov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: r.reshetnikov@npcmr.ru
ORCID iD: 0000-0002-9661-0254

Cand. Sci. (Physics and Mathematics), Department Head of Medical Research, Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department, Moscow, 127051, Russian Federation

Russia

Olga Nanova

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow, Russian Federation
Petrovka Street, 24, Building 1, 127051 Moscow, Russia

Corresponding author.
Email: nanova@mail.ru
ORCID iD: 0000-0001-8886-3684
SPIN code: 6135-4872

Leading Researcher

24/1 Petrovka street, 127051 Moscow, Russia

Anton Vladzymyrskyy

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: vladzimirskijAV@zdrav.mos.ru
ORCID iD: 0000-0002-2990-7736
SPIN code: 3602-7120

MD, Dr. Sci. (Medicine)

Russia, Moscow

Kirill Arzamasov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: ArzamasovKM@zdrav.mos.ru
ORCID iD: 0000-0001-7786-0349
SPIN code: 3160-8062

MD, Cand. Sci. (Medicine)

Russia, Moscow

Olga Omelyanskaya

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: OmelyanskayaOV@zdrav.mos.ru
ORCID iD: 0000-0002-0245-4431
SPIN code: 8948-6152
Russia, Moscow

Maria Kodenko

Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: m.r.kodenko@yandex.ru
ORCID iD: 0000-0002-0166-3768
SPIN code: 5789-0319
Russia

Rustam Erizhokov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: ErizhokovRA@zdrav.mos.ru
ORCID iD: 0009-0007-3636-2889
SPIN code: 2274-6428

Junior Research Fellow, Head of Department

24/1 Petrovka street, 127051 Moscow, Russia

Anastasia Pamova

State Budget-Funded Health Care Institution of the City of Moscow "Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department"

Email: PamovaAP@zdrav.mos.ru
ORCID iD: 0000-0002-0041-3281
SPIN code: 5146-4355
24/1 Petrovka street, 127051 Moscow, Russia

© Eco-Vector,

This article is available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The mass media outlet is registered by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 79539 of November 9, 2020.