Bosniak classification of renal cysts using large language models: a comparative study
| dc.contributor.author | Hacıbey, İbrahim | |
| dc.contributor.author | Kaba, Esat | |
| dc.date.accessioned | 2025-11-17T07:41:18Z | |
| dc.date.issued | 2025 | |
| dc.department | RTEÜ, Faculty of Medicine, Department of Internal Medical Sciences | |
| dc.description.abstract | Background: The Bosniak classification system is widely used to assess malignancy risk in renal cystic lesions, yet inter-observer variability poses significant challenges. Large language models (LLMs) may offer a standardized approach to classification when provided with textual descriptions, such as those found in radiology reports. Objective: This study evaluated the performance of five LLMs, namely GPT-4 (ChatGPT), Gemini, Copilot, Perplexity, and NotebookLM, in classifying renal cysts based on synthetic textual descriptions mimicking CT report content. Methods: A synthetic dataset of 100 diagnostic scenarios (20 cases per Bosniak category) was constructed using established radiological criteria. Each LLM was evaluated using zero-shot and few-shot prompting strategies, while NotebookLM employed retrieval-augmented generation (RAG). Performance metrics included accuracy, sensitivity, and specificity. Statistical significance was assessed using McNemar's and chi-squared tests. Results: GPT-4 achieved the highest accuracy (87% zero-shot, 99% few-shot), followed by Copilot (81-86%), Gemini (55-69%), and Perplexity (43-69%). NotebookLM, tested only under RAG conditions, reached 87% accuracy. Few-shot learning significantly improved performance (p < 0.05). Classification of Bosniak IIF lesions remained challenging across models. Conclusion: When provided with well-structured textual descriptions, LLMs can accurately classify renal cysts. Few-shot prompting significantly enhances performance. However, persistent difficulties in classifying borderline lesions such as Bosniak IIF highlight the need for further refinement and real-world validation. | |
| dc.identifier.citation | Hacibey, I., & Kaba, E. (2025). Bosniak classification of renal cysts using large language models: a comparative study. Bosniak-Klassifikation von Nierenzysten unter Verwendung von Large-Language-Modellen: Vergleichsstudie. Radiologie (Heidelberg, Germany), 10.1007/s00117-025-01499-x. Advance online publication. https://doi.org/10.1007/s00117-025-01499-x | |
| dc.identifier.doi | 10.1007/s00117-025-01499-x | |
| dc.identifier.issn | 2731-7048 | |
| dc.identifier.issn | 2731-7056 | |
| dc.identifier.pmid | 40851045 | |
| dc.identifier.uri | https://doi.org/10.1007/s00117-025-01499-x | |
| dc.identifier.uri | https://hdl.handle.net/11436/11487 | |
| dc.identifier.wos | WOS:001556457600001 | |
| dc.identifier.wosquality | Q4 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.indekslendigikaynak | PubMed | |
| dc.institutionauthor | Kaba, Esat | |
| dc.language.iso | en | |
| dc.publisher | Springer Heidelberg | |
| dc.relation.ispartof | Die Radiologie | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Bosniak classification | |
| dc.subject | Few-shot learning | |
| dc.subject | Large language models | |
| dc.subject | Renal cysts | |
| dc.subject | Synthetic data | |
| dc.title | Bosniak classification of renal cysts using large language models: a comparative study | |
| dc.type | Article |
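
The abstract names accuracy, sensitivity, specificity, and McNemar's test as the evaluation measures but the record does not describe how they were computed. The following is a minimal sketch of one common way to do so for a five-category task, assuming one-vs-rest sensitivity and specificity and an exact (binomial) McNemar test on paired predictions. The function names and the ten toy labels are hypothetical illustrations; they are not the study's data or code.

```python
from math import comb

def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted category equals the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity and specificity for one category against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity}

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical toy labels (two cases per Bosniak category), for illustration only.
y_true    = ["I", "I", "II", "II",  "IIF", "IIF", "III", "III", "IV", "IV"]
zero_shot = ["I", "I", "II", "IIF", "II",  "III", "III", "III", "IV", "IV"]
few_shot  = ["I", "I", "II", "II",  "IIF", "II",  "III", "III", "IV", "IV"]

print("zero-shot accuracy:", accuracy(y_true, zero_shot))
print("few-shot accuracy:", accuracy(y_true, few_shot))
print("IIF (zero-shot):", one_vs_rest_metrics(y_true, zero_shot, "IIF"))
print("McNemar p:", mcnemar_exact(y_true, zero_shot, few_shot))
```

With only ten toy cases the test has almost no power; the sketch is meant to show the mechanics of comparing paired zero-shot and few-shot predictions, not to reproduce the reported results.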











