https://www.aivojournal.com/index.php/AIVO/issue/feed Artificial Intelligence in Vision and Ophthalmology 2026-03-12T06:38:39+00:00 Kugler Publications info@kuglerpublications.com Open Journal Systems <p>Artificial Intelligence in Vision and Ophthalmology (AIVO) provides a forum for interdisciplinary approaches integrating techniques from artificial intelligence, mathematics, computer science, engineering and experimental and clinical sciences to address open problems in ophthalmology.</p> <p>AIVO uses the Continuous Article Publication (CAP) model. Articles are published as soon as they are ready.</p> <p>Read more about AIVO's <a title="AIVO Focus &amp; Scope" href="https://www.aivojournal.com/index.php/AIVO/about/#focusAndScope" target="_blank" rel="noopener">focus and scope</a>.<br /><a href="https://www.aivojournal.com/index.php/AIVO/issue/archive">See all issues here</a></p> <p style="text-align: center;"> </p> https://www.aivojournal.com/index.php/AIVO/article/view/157 Multimodal large language models for use in diabetic retinopathy screening 2026-03-12T06:38:39+00:00 S. Saeed Mohammadi s.saeed.mohammadi@gmail.com Sahana Aggarwal saggarwa@lsoc.org Kavina Aggarwal kaggarwa@lsoc.org Grant Wiarda grant.wiarda@northwestern.edu Kayla Nguyen nghgkayla@gmail.com Emmanuel A. Sarmiento angelo.sarmiento@icloud.com Quan Nguyen ndquan@stanford.edu Manjot K. Gill mgill@nm.org <p><em><strong>Purpose:</strong></em> To evaluate the performance of ChatGPT-4o and Gemini 2.5 Pro in detecting more-than-mild diabetic retinopathy (mtmDR) from fundus photography (FP) and diabetic macular edema (DME) from optical coherence tomography (OCT) using publicly available datasets.</p> <p><em><strong>Methods:</strong> </em>A custom GPT (powered by ChatGPT-4o) was created and instructed to follow the LumineticsCore™ (IDx-DR) screening criteria for mtmDR, defined as an ETDRS level ≥ 35 and/or clinically significant diabetic macular edema (CSDME).
Gemini 2.5 Pro was evaluated with the same criteria. Performance on FPs was assessed using two publicly available datasets: MESSIDOR-2 (<em>n</em> = 106; 66 mtmDR, 40 non-mtmDR) and EyePACS (<em>n</em> = 99; 56 mtmDR, 43 non-mtmDR). To assess detection of DME, a separate OCT dataset (<em>n</em> = 48; 24 DME, 24 normal) was used to evaluate identification of intraretinal and/or subretinal fluid. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for detecting mtmDR on FP and DME on OCT were calculated for each multimodal large language model (LLM).</p> <p><em><strong>Results:</strong></em> On MESSIDOR-2 (<em>n</em> = 106), ChatGPT-4o achieved a sensitivity of 90.77%, specificity of 97.50%, PPV of 98.33%, and NPV of 86.67% for mtmDR detection. Gemini 2.5 Pro achieved a sensitivity of 80.30%, specificity of 97.50%, PPV of 98.15%, and NPV of 75.00%. On EyePACS (<em>n</em> = 99), ChatGPT-4o demonstrated a sensitivity of 94.64%, specificity of 86.05%, PPV of 89.83%, and NPV of 92.50%, while Gemini 2.5 Pro achieved a sensitivity of 89.29%, specificity of 88.37%, PPV of 90.91%, and NPV of 86.36%. For OCT-based DME detection (<em>n</em> = 48), ChatGPT-4o achieved a sensitivity of 95.83%, specificity of 100%, and PPV of 100%, while Gemini 2.5 Pro achieved a sensitivity of 95.83%, specificity of 95.65%, PPV of 95.83%, and NPV of 95.65%.</p> <p><em><strong>Conclusion:</strong> </em>ChatGPT-4o and Gemini 2.5 Pro demonstrated high performance in detecting mtmDR and DME across multiple publicly available datasets. These findings support the potential of multimodal LLMs as cost-effective and accessible tools for diabetic retinopathy screening. Further validation in larger, more diverse real-world datasets is warranted.</p> 2026-03-11T00:00:00+00:00 Copyright (c) 2026 S. Saeed Mohammadi, Sahana Aggarwal, Kavina Aggarwal, Grant Wiarda, Kayla Nguyen, Emmanuel A. Sarmiento, Quan Nguyen, Manjot K. Gill