Studies in Informatics and Control
Vol. 12, No. 1, 2003

Classification and Feature Selection of Breast Cancer Data based on Decision Tree Algorithm

Aboul Ella Hassanien

Abstract

Medical information systems have received a lot of research attention in the past. As a result of advances in hardware and software technologies, the nature of medical information systems has changed from performing only record-keeping functions to more decision-making-oriented functions. Large collections of medical data are a valuable resource from which potentially new and useful knowledge can be discovered through data mining. Data mining is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques aimed at gaining insight into the relationships and patterns hidden in the data. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we introduce an efficient symbolic machine learning algorithm to identify the important breast cancer attributes needed for interpretation. The proposed technique is based on an inductive decision tree learning algorithm that has low complexity with high transparency and accuracy. In addition, among all features, we use only the subset of features that leads to the best performance. The proposed technique is evaluated on real data comprising 699 samples, which are used to build the decision tree. The evaluation shows that the rate of correct classification of new cases is high.

Keywords

Machine learning, decision tree, data mining, feature selection and classification