Integrative machine learning model for overall survival prediction in breast cancer using clinical and transcriptomic data

dc.contributor.author: Kıvrak, Mehmet
dc.contributor.author: Nalkıran, Hatice Sevim
dc.contributor.author: Kesen, Oğuzhan
dc.contributor.author: Nalkıran, İhsan
dc.date.accessioned: 2025-12-09T13:35:06Z
dc.date.issued: 2025
dc.department: RTEÜ, Faculty of Medicine, Department of Basic Medical Sciences
dc.department: RTEÜ, Faculty of Medicine, Department of Internal Medical Sciences
dc.description.abstract: Breast cancer is the most common malignancy in women, and the Luminal A subtype is generally associated with favorable survival. However, age and menopausal status may influence tumor biology and prognosis. To improve prediction beyond conventional models, we analyzed transcriptomic and clinical data from the METABRIC cohort. Patients with Luminal A breast cancer were stratified into premenopausal, postmenopausal-nongeriatric, and geriatric (≥70 years) groups. Differentially expressed genes (DEGs) were identified, and Boruta feature selection retained 27 clinical and genomic variables. Random Forest, Logistic Regression, and Multilayer Perceptron models and an ensemble XGBoost model were trained with stratified 5-fold cross-validation, using SMOTE to correct class imbalance. Principal component analysis showed distinct clustering across age groups, and DEG analysis revealed 41 genes associated with age and survival. Key predictors included clinical variables (age, tumor size, Nottingham Prognostic Index [NPI], radiotherapy) and molecular markers (ATM, HERC2, AKT2, FOXO3, CYP3A43). XGBoost outperformed the other algorithms (accuracy 98%, sensitivity 98%, specificity 97%, F1-score 0.99, AUC 0.86). These findings indicate that age-related transcriptomic changes affect survival in Luminal A breast cancer and that an integrative ML approach combining clinical and molecular variables provides superior prognostic accuracy, supporting its potential for clinical application.
dc.identifier.citation: Kivrak, M., Sevim Nalkiran, H., Kesen, O., & Nalkiran, I. (2025). Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data. Biology, 14(11), 1539. https://doi.org/10.3390/biology14111539
dc.identifier.doi: 10.3390/biology14111539
dc.identifier.issn: 2079-7737
dc.identifier.issue: 11
dc.identifier.startpage: 1539
dc.identifier.uri: https://doi.org/10.3390/biology14111539
dc.identifier.uri: https://hdl.handle.net/11436/11670
dc.identifier.volume: 14
dc.identifier.wos: WOS:001624100100001
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science
dc.institutionauthor: Kıvrak, Mehmet
dc.institutionauthor: Nalkıran, Hatice Sevim
dc.institutionauthor: Kesen, Oğuzhan
dc.institutionauthor: Nalkıran, İhsan
dc.language.iso: en
dc.publisher: Multidisciplinary Digital Publishing Institute (MDPI)
dc.relation.ispartof: Biology-Basel
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: luminal A breast cancer
dc.subject: machine learning
dc.subject: gene expression
dc.subject: age
dc.subject: survival prediction
dc.subject: XGBoost
dc.title: Integrative machine learning model for overall survival prediction in breast cancer using clinical and transcriptomic data
dc.type: Article
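
The abstract does not describe implementation details, so the following is only a minimal, standard-library sketch of the validation scheme it names, not the authors' code. It shows stratified 5-fold cross-validation in which SMOTE-style oversampling is applied to the training folds only, and how accuracy, sensitivity, specificity, and F1 follow from the confusion matrix. All function names are hypothetical, a nearest-centroid rule stands in for Random Forest or XGBoost, binary labels are assumed, and the data are synthetic rather than METABRIC.

```python
import random
from math import dist


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve each class's proportion."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # round-robin spreads each class evenly
    return folds


def smote(minority, n_new, k_neighbors=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = sorted((p for p in minority if p is not x), key=lambda p: dist(x, p))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def fit_centroids(X, y):
    """Hypothetical placeholder classifier (nearest centroid), standing in for XGBoost."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in X]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1 from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def cross_validate(X, y, k=5, seed=0):
    """Stratified k-fold CV; oversampling uses the training folds only."""
    folds = stratified_folds(y, k, seed)
    y_all, p_all = [], []
    for f, test_idx in enumerate(folds):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(y)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        counts = {c: y_tr.count(c) for c in set(y_tr)}
        minority = min(counts, key=counts.get)
        n_new = max(counts.values()) - counts[minority]
        if n_new:
            # Synthetic rows come from training data only, so the held-out fold
            # never influences oversampling (avoids optimistic leakage).
            rows = [x for x, t in zip(X_tr, y_tr) if t == minority]
            new = smote(rows, n_new, seed=seed + f)
            X_tr += new
            y_tr += [minority] * len(new)
        model = fit_centroids(X_tr, y_tr)
        y_all += [y[i] for i in test_idx]
        p_all += predict(model, [X[i] for i in test_idx])
    return binary_metrics(y_all, p_all)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data (not METABRIC): 80 class-0 vs 20 class-1 samples, 2 features.
    X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
         + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)])
    y = [0] * 80 + [1] * 20
    print(cross_validate(X, y, k=5))
```

Fitting the oversampler inside each training fold is a common way to keep synthetic samples from leaking information about the held-out fold. For real analyses, scikit-learn (`StratifiedKFold`) and imbalanced-learn (`SMOTE` inside an `imblearn.pipeline.Pipeline`) provide tested implementations of the same ideas.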

Files

Original bundle

Name: kıvrak-2025.pdf
Size: 3.42 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission