Towards feature selection for digital mammogram classification (2021)

Abstract

Breast cancer is the most common type of cancer among women, with a large number of cases reported each year, many of them diagnosed at an advanced stage. In this paper, our aim is to create the basis for a support system that helps detect breast cancer at an early stage. After defining the region of interest (ROI) and segmenting the resulting image with the k-means algorithm, Gray-Level Run-Length Matrix (GLRLM) features are extracted from both the ROI and the segmented image in four directions (horizontal, vertical, first diagonal and second diagonal). The input data consist of the GLRLM features of the ROI and of its segmentation for different combinations of directions. To reduce their dimensionality (removing redundant information and keeping only the most essential features), two methods are used: Principal Component Analysis (PCA) and genetic algorithm (GA) feature selection. For classification, two methods are compared: Decision Trees (DT) and Random Forest (RF). The Mammographic Image Analysis Society (MIAS) dataset was used to train and test the classifiers. The best performance is obtained for GLRLM features calculated in the 45° and 90° directions, using PCA for dimensionality reduction and RF for classification, with 100% training accuracy and 70% test accuracy.
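To make the GLRLM step more concrete, here is a minimal sketch of computing run-length matrices in the four directions and turning them into a feature vector. It is not the authors' implementation. The abstract does not say which statistics were derived from the matrices, so this assumes Galloway's five classic run-length features (SRE, LRE, GLN, RLN, RP). It also assumes an image already quantized to integer gray levels 0..levels-1 and a mapping of 0°/45°/90°/135° to horizontal, anti-diagonal, vertical and main-diagonal scans. All function names are hypothetical.

```python
# Minimal GLRLM sketch. Assumptions: integer gray levels 0..levels-1,
# Galloway's five run-length statistics, hypothetical function names.


def image_lines(img, direction):
    """Return the pixel sequences scanned for one direction, in degrees."""
    rows, cols = len(img), len(img[0])
    if direction == 0:  # horizontal: one line per row
        return [list(row) for row in img]
    if direction == 90:  # vertical: one line per column
        return [[img[r][c] for r in range(rows)] for c in range(cols)]
    if direction == 45:  # anti-diagonals: pixels with constant r + c
        return [[img[r][s - r] for r in range(rows) if 0 <= s - r < cols]
                for s in range(rows + cols - 1)]
    if direction == 135:  # main diagonals: pixels with constant c - r
        return [[img[r][r + d] for r in range(rows) if 0 <= r + d < cols]
                for d in range(-(rows - 1), cols)]
    raise ValueError(f"unsupported direction: {direction}")


def glrlm(img, direction, levels):
    """P[i][j - 1] = number of runs of gray level i with length j."""
    max_run = max(len(img), len(img[0]))
    p = [[0] * max_run for _ in range(levels)]
    for line in image_lines(img, direction):
        value, length = line[0], 1
        for pixel in line[1:]:
            if pixel == value:
                length += 1  # the current run continues
            else:
                p[value][length - 1] += 1  # close the finished run
                value, length = pixel, 1
        p[value][length - 1] += 1  # close the last run on the line
    return p


def glrlm_features(p, n_pixels):
    """Galloway's run-length statistics for one matrix."""
    n_runs = sum(sum(row) for row in p)
    sre = sum(c / (j + 1) ** 2 for row in p for j, c in enumerate(row)) / n_runs
    lre = sum(c * (j + 1) ** 2 for row in p for j, c in enumerate(row)) / n_runs
    gln = sum(sum(row) ** 2 for row in p) / n_runs        # sum over gray levels
    rln = sum(sum(col) ** 2 for col in zip(*p)) / n_runs  # sum over run lengths
    rp = n_runs / n_pixels                                # run percentage
    return {"SRE": sre, "LRE": lre, "GLN": gln, "RLN": rln, "RP": rp}


def feature_vector(img, levels, directions=(45, 90)):
    """Concatenate the five statistics for each requested direction."""
    n_pixels = len(img) * len(img[0])
    vec = []
    for d in directions:
        feats = glrlm_features(glrlm(img, d, levels), n_pixels)
        vec.extend(feats[k] for k in ("SRE", "LRE", "GLN", "RLN", "RP"))
    return vec
```

Under these assumptions, the best-performing configuration reported above (directions 45° and 90°) would correspond to calling `feature_vector(roi, levels, directions=(45, 90))`. The k-means segmentation would be processed the same way, with its cluster labels serving as gray levels, and both vectors would be concatenated before PCA or GA selection.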

Citation

Bajcsi A, Andreica A-M, Chira C, Towards feature selection for digital mammogram classification, Procedia Computer Science, Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 25th International Conference KES2021, volume 192, pages 632-641
