Multimodal large language models for use in diabetic retinopathy screening

S. Saeed Mohammadi; Sahana Aggarwal; Kavina Aggarwal; Grant Wiarda; Kayla Nguyen; Emmanuel A. Sarmiento; Quan Nguyen; Manjot K. Gill

doi:10.35119/aivo.v2i1.157

Vol. 2 No. 1 (2026), Original Articles

Published 2026-03-11

S. Saeed Mohammadi

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

https://orcid.org/0000-0002-5996-730X

Sahana Aggarwal

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

Kavina Aggarwal

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

Grant Wiarda

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

Kayla Nguyen

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

https://orcid.org/0009-0002-1005-5219

Emmanuel A. Sarmiento

Quan Nguyen

Spencer Center for Vision Research, Byers Eye Institute, Stanford University School of Medicine, Palo Alto, CA, USA

Manjot K. Gill

Department of Ophthalmology, Northwestern University, Feinberg School of Medicine, Chicago, IL, USA

How to Cite

Mohammadi SS, Aggarwal S, Aggarwal K, Wiarda G, Nguyen K, Sarmiento EA, Nguyen Q, Gill MK. Multimodal large language models for use in diabetic retinopathy screening. AIVO [Internet]. 2026 Mar. 11 [cited 2026 Jul. 25];2(1). Available from: https://www.aivojournal.com/index.php/AIVO/article/view/157

Copyright notice

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Keywords

artificial intelligence; ChatGPT; diabetic retinopathy; Gemini; multimodal large language model

Abstract

Purpose: To evaluate the performance of ChatGPT-4o and Gemini 2.5 Pro in detecting more-than-mild diabetic retinopathy (mtmDR) from fundus photography (FP) and diabetic macular edema (DME) from optical coherence tomography (OCT) using publicly available datasets.

Methods: A custom GPT (powered by ChatGPT-4o) was created and instructed to follow the LumineticsCore™ (IDx-DR) screening criteria for mtmDR, defined as an ETDRS level ≥ 35 and/or clinically significant diabetic macular edema (CSDME). Gemini 2.5 Pro was evaluated with the same criteria. Performance on FPs was assessed using 2 publicly available datasets: MESSIDOR-2 (n = 106; 66 mtmDR, 40 non-mtmDR) and EyePACS (n = 99; 56 mtmDR, 43 non-mtmDR). To assess detection of DME, a separate OCT dataset (n = 48; 24 DME, 24 normal) was used to evaluate identification of intraretinal and/or subretinal fluid. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for detecting mtmDR on FP and DME on OCT were calculated for each multimodal large language model (LLM).

Results: On MESSIDOR-2 (n = 106), ChatGPT-4o achieved a sensitivity of 90.77%, specificity of 97.50%, PPV of 98.33%, and NPV of 86.67% for mtmDR detection. Gemini 2.5 Pro achieved a sensitivity of 80.30%, specificity of 97.50%, PPV of 98.15%, and NPV of 75.00%. On EyePACS (n = 99), ChatGPT-4o demonstrated a sensitivity of 94.64%, specificity of 86.05%, PPV of 89.83%, and NPV of 92.50%, while Gemini 2.5 Pro achieved a sensitivity of 89.29%, specificity of 88.37%, PPV of 90.91%, and NPV of 86.36%. For OCT-based DME detection (n = 48), ChatGPT-4o achieved a sensitivity of 95.83%, specificity of 100%, and PPV of 100%, while Gemini 2.5 Pro achieved a sensitivity of 95.83%, specificity of 95.65%, PPV of 95.83%, and NPV of 95.65%.

Conclusion: ChatGPT-4o and Gemini 2.5 Pro demonstrated high performance in detecting mtmDR and DME across multiple publicly available datasets. These findings support the potential of multimodal LLMs as cost-effective and accessible tools for diabetic retinopathy screening. Further validation in larger, more diverse real-world datasets is warranted.
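
The four metrics named in the Methods all follow from a 2x2 confusion matrix, and a minimal Python sketch may make the arithmetic concrete. The names `ConfusionCounts` and `screening_metrics` are hypothetical and are not taken from the study. The example counts are also an assumption: the abstract reports only percentages, so the counts below were back-calculated to be consistent with the EyePACS class sizes (56 mtmDR, 43 non-mtmDR) and the percentages reported for ChatGPT-4o.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a 2x2 confusion matrix for a binary screening test."""
    tp: int  # disease present, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative
    fp: int  # disease absent, test positive


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as percentages.

    Assumes every denominator is non-zero, i.e. the sample contains
    diseased and healthy cases and the test gave both kinds of result.
    """
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),  # TP / all diseased
        "specificity": 100 * c.tn / (c.tn + c.fp),  # TN / all healthy
        "ppv": 100 * c.tp / (c.tp + c.fp),          # TP / all test-positive
        "npv": 100 * c.tn / (c.tn + c.fn),          # TN / all test-negative
    }


# Assumed counts (not reported in the abstract), chosen to match the
# EyePACS class sizes: 53 + 3 = 56 mtmDR and 37 + 6 = 43 non-mtmDR.
example = ConfusionCounts(tp=53, fn=3, tn=37, fp=6)
rounded = {name: round(value, 2) for name, value in screening_metrics(example).items()}
print(rounded)
```

With these assumed counts the rounded values are 94.64, 86.05, 89.83, and 92.5, which agree with the EyePACS figures reported for ChatGPT-4o. PPV and NPV depend on the proportion of mtmDR cases in the sample, so values obtained on these enriched datasets would not carry over directly to a screening population with lower prevalence.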
