Machine learning in tumor classification in breast cancer

Lima, Ana Sofia; Coutinho, Carolina; Machado, Raquel; Oliveira, Alexandra Alves; Faria, Brígida Mónica

http://hdl.handle.net/10400.22/30955

Abstract(s)

Breast cancer is the primary cause of mortality among women worldwide (1). The disease exhibits discernible patterns, which presents an opportunity to apply machine learning (ML), an approach that has produced effective results in screening and diagnosis. Four ML algorithms were tested (Decision Tree, Deep Learning (DL), k-Nearest Neighbors (k-NN) and Naïve Bayes) to construct a predictive model that allows the early classification of a breast tumor as benign or malignant, avoiding the need to proceed to a more invasive technique. The ML models were constructed in RapidMiner Studio and applied to a database of 201 individuals with breast cancer, described by attributes such as age, tumor size and presence of invasive nodes (2). The models were evaluated by their accuracy, true positive rate (TPR), true negative rate (TNR), ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve). During a first exploratory phase, four clusters were detected: one with smaller tumor sizes, younger patients and a benign diagnosis; one with older patients, bigger tumor sizes and a malignant diagnosis; and two more with the opposite characteristics. These characteristics were later found to be important factors in the construction of the Decision Tree. In terms of accuracy, the best model was Naïve Bayes (91.04%), followed by the Decision Tree (90.55%), DL (90.02%) and k-NN (86.32%). The differences in performance between every pair of models were statistically significant (p < 0.05), except between the DL and Decision Tree models. Naïve Bayes presented the highest TPR (98.21%), while DL presented the highest TNR (83.15%). The Decision Tree model presented the highest AUC (0.976), followed by Naïve Bayes (0.961). The Decision Tree model best achieved our goal: it had the highest AUC, which reflects the strongest overall ability to separate benign from malignant tumors across classification thresholds, surpassing Naïve Bayes while maintaining a similar accuracy and TNR.
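The evaluation measures named in the abstract (accuracy, TPR, TNR and AUC) can be illustrated with a minimal Python sketch. This is not the authors' workflow, which was built in RapidMiner Studio. The function names, the toy labels and scores, the 0.5 threshold, and the choice of malignant as the positive class (1) are all assumptions made only for illustration; none of the numbers come from the study's data.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn); assumes 1 = malignant (positive), 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, true positive rate (sensitivity) and true negative rate (specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": tp / (tp + fn),
        "tnr": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): four malignant and four benign cases.
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
TOY_SCORES = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # predicted probability of malignancy
TOY_PREDICTIONS = [1 if s >= 0.5 else 0 for s in TOY_SCORES]  # assumed 0.5 threshold

if __name__ == "__main__":
    print(rates(TOY_LABELS, TOY_PREDICTIONS))
    print(auc(TOY_LABELS, TOY_SCORES))
```

Accuracy, TPR and TNR depend on the single threshold used to turn scores into class labels, whereas AUC summarizes how well the scores rank positive cases above negative ones over all thresholds. This is why one model can lead on accuracy while another leads on AUC.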

Keywords

Machine learning; Predictive models; Breast cancer

Citation

Lima, A. S., Coutinho, C., Machado, R., Oliveira, A. A., & Faria, B. M. (2024). Machine learning in tumor classification in breast cancer. Proceedings of the 1st Symposium on Biostatistics and Bioinformatics Applied to Health, 19–20. https://recipp.ipp.pt/entities/publication/a634fd4f-6053-47fa-8145-4f876572cba7

Publisher

ESS | P. PORTO Edições

Collections

ESS - BBB - Posters apresentados em eventos científicos

CC License

Without CC licence
