Bosniak classification of renal cysts using large language models: a comparative study

Publisher

Springer Heidelberg

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Background: The Bosniak classification system is widely used to assess malignancy risk in renal cystic lesions, yet inter-observer variability poses significant challenges. Large language models (LLMs) may offer a standardized approach to classification when provided with textual descriptions, such as those found in radiology reports. Objective: This study evaluated the performance of five LLMs, namely GPT-4 (ChatGPT), Gemini, Copilot, Perplexity, and NotebookLM, in classifying renal cysts based on synthetic textual descriptions mimicking CT report content. Methods: A synthetic dataset of 100 diagnostic scenarios (20 cases per Bosniak category) was constructed using established radiological criteria. Each LLM was evaluated using zero-shot and few-shot prompting strategies, while NotebookLM employed retrieval-augmented generation (RAG). Performance metrics included accuracy, sensitivity, and specificity. Statistical significance was assessed using McNemar's and chi-squared tests. Results: GPT-4 achieved the highest accuracy (87% zero-shot, 99% few-shot), followed by Copilot (81-86%), Gemini (55-69%), and Perplexity (43-69%). NotebookLM, tested only under RAG conditions, reached 87% accuracy. Few-shot learning significantly improved performance (p < 0.05). Classification of Bosniak IIF lesions remained challenging across models. Conclusion: When provided with well-structured textual descriptions, LLMs can accurately classify renal cysts. Few-shot prompting significantly enhances performance. However, persistent difficulties in classifying borderline lesions such as Bosniak IIF highlight the need for further refinement and real-world validation.
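The metrics named in the abstract (accuracy, per-category sensitivity and specificity, and McNemar's test for paired zero-shot versus few-shot predictions) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the labels below are made-up toy data rather than study results, and the exact binomial form of McNemar's test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb

def accuracy(truth, pred):
    """Fraction of cases where the predicted category equals the reference."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def sensitivity_specificity(truth, pred, category):
    """One-vs-rest sensitivity and specificity for a single Bosniak category."""
    pairs = list(zip(truth, pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    tn = sum(t != category and p != category for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value on the discordant pairs (assumed variant)."""
    # b: strategy A correct, B wrong; c: strategy A wrong, B correct
    b = sum(a == t and x != t for t, a, x in zip(truth, pred_a, pred_b))
    c = sum(a != t and x == t for t, a, x in zip(truth, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)

# Toy labels for illustration only (NOT data from the study).
truth = ["I", "II", "IIF", "III", "IV"] * 2
zero_shot = ["I", "II", "II", "III", "IV", "I", "II", "III", "III", "IV"]
few_shot = list(truth)

print(accuracy(truth, zero_shot))
print(sensitivity_specificity(truth, zero_shot, "IIF"))
print(mcnemar_exact(truth, zero_shot, few_shot))
```

In this toy example both zero-shot errors fall on IIF cases, mirroring the kind of borderline misclassification the abstract describes; with only two discordant pairs the test cannot reach significance, which is why paired comparisons need a reasonably large case set.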

Keywords

Bosniak classification, Few-shot learning, Large language models, Renal cysts, Synthetic data

Source

Die Radiologie

Citation

Hacibey, I., & Kaba, E. (2025). Bosniak classification of renal cysts using large language models: a comparative study. Bosniak-Klassifikation von Nierenzysten unter Verwendung von Large-Language-Modellen: Vergleichsstudie. Radiologie (Heidelberg, Germany), 10.1007/s00117-025-01499-x. Advance online publication. https://doi.org/10.1007/s00117-025-01499-x
