Abstract
Purpose: To evaluate the performance of ChatGPT-4o and Gemini 2.5 Pro in detecting more-than-mild diabetic retinopathy (mtmDR) from fundus photography (FP) and diabetic macular edema (DME) from optical coherence tomography (OCT) using publicly available datasets.
Methods: A custom GPT (powered by ChatGPT-4o) was created and instructed to follow the LumineticsCore™ (IDx-DR) screening criteria for mtmDR, defined as an ETDRS level ≥ 35 and/or clinically significant diabetic macular edema (CSDME). Gemini 2.5 Pro was evaluated with the same criteria. Performance on FPs was assessed using 2 publicly available datasets: MESSIDOR-2 (n = 106; 66 mtmDR, 40 non-mtmDR) and EyePACS (n = 99; 56 mtmDR, 43 non-mtmDR). To assess detection of DME, a separate OCT dataset (n = 48; 24 DME, 24 normal) was used to evaluate identification of intraretinal and/or subretinal fluid. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for detecting mtmDR on FP and DME on OCT were calculated for each multimodal large language model (LLM).
Results: On MESSIDOR-2 (n = 106), ChatGPT-4o achieved a sensitivity of 90.77%, specificity of 97.50%, PPV of 98.33%, and NPV of 86.67% for mtmDR detection. Gemini 2.5 Pro achieved a sensitivity of 80.30%, specificity of 97.50%, PPV of 98.15%, and NPV of 75.00%. On EyePACS (n = 99), ChatGPT-4o demonstrated a sensitivity of 94.64%, specificity of 86.05%, PPV of 89.83%, and NPV of 92.50%, while Gemini 2.5 Pro achieved a sensitivity of 89.29%, specificity of 88.37%, PPV of 90.91%, and NPV of 86.36%. For OCT-based DME detection (n = 48), ChatGPT-4o achieved a sensitivity of 95.83%, specificity of 100%, and PPV of 100%, while Gemini 2.5 Pro achieved a sensitivity of 95.83%, specificity of 95.65%, PPV of 95.83%, and NPV of 95.65%.
Conclusion: ChatGPT-4o and Gemini 2.5 Pro demonstrated high performance in detecting mtmDR and DME across multiple publicly available datasets. These findings support the potential of multimodal LLMs as cost-effective and accessible tools for diabetic retinopathy screening. Further validation in larger, more diverse real-world datasets is warranted.
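As an illustration of how the four metrics named in Methods relate to one another, the following minimal Python sketch computes them from confusion-matrix counts. The names `ConfusionCounts` and `screening_metrics` are hypothetical, and the counts in the example are not reported in the abstract: they are back-calculated, as an assumption, from the Gemini 2.5 Pro percentages on MESSIDOR-2 and the stated class sizes (66 mtmDR, 40 non-mtmDR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary screening test (hypothetical helper type)."""
    tp: int  # diseased eyes flagged as diseased
    fn: int  # diseased eyes missed
    tn: int  # healthy eyes correctly cleared
    fp: int  # healthy eyes flagged as diseased


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as percentages."""
    def pct(numerator: int, denominator: int) -> float:
        # Guard against empty denominators (e.g. no positive calls at all).
        if denominator == 0:
            return float("nan")
        return round(100.0 * numerator / denominator, 2)

    return {
        "sensitivity": pct(c.tp, c.tp + c.fn),
        "specificity": pct(c.tn, c.tn + c.fp),
        "ppv": pct(c.tp, c.tp + c.fp),
        "npv": pct(c.tn, c.tn + c.fn),
    }


# Assumed counts, inferred from the reported percentages rather than
# taken from the study: 53 of 66 mtmDR and 39 of 40 non-mtmDR images
# classified correctly.
gemini_messidor2 = ConfusionCounts(tp=53, fn=13, tn=39, fp=1)
metrics = screening_metrics(gemini_messidor2)
print(metrics)
```

Under these assumed counts the sketch reproduces the reported values for that model and dataset (sensitivity 80.30%, specificity 97.50%, PPV 98.15%, NPV 75.00%). PPV and NPV depend on the share of diseased images in the sample, so values obtained on these enriched datasets would not carry over directly to a screening population with lower prevalence.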
