Genetic Algorithm and Tabu Search for Feature Selection
Sabra EL FERCHICHI, Kaouther LAABIDI
LACS, ENIT Tunis
LAGIS, USTL Lille
Abstract: In this paper we propose a wrapper approach to select features involving the Support Vector Machines (SVM) combined with a metaheuristic optimization algorithm: Tabu Search and Genetic Algorithms. Feature selection is efficient in searching the most descriptive features which would contribute in increasing the performance of the inductive algorithm by reducing dimensionality and processing time. The process we propose is based on the use of the rate of misclassification as an evaluating criterion. First, we used the tabu algorithm to guide the search of the optimal set of features; then a genetic algorithm is implemented to reach the same goal. This procedure is applied on data from regulation of urban transport network systems. A comparison between the performances of each search engine (TS and GAs) used is then presented.
Keywords: Features Selection, Support Vector Machines, Tabu Search, Genetic algorithm and urban transport regulation.
Sabra El Ferchichi received the engineering degree in Electrical Engineering and the M.S degree in Control and Signal Processing from the National School Engineering of Tunis in 2007 and 2008. Where, she is currently pursuing her Ph.D. degree in Department of Electrical Engineering at National School Engineering of Tunis. Her research interests include Feature extraction and selection, learning machines, metaheuristic algorithms.
Kaouther Laabidi was born in Tunis, Tunisia. She received the Master degree from the “Ecole Suprieur de Sciences et de Technologie de Tunis”, in 1995 and the Ph.D degree in Genie Electique from the “Ecole Nationale d’Ingnieurs de Tunis”, in 2005. She is currently preparing the ability degree in the laboratory ACS “Analyse et Commande des Systmes”.His research is related to the Ide ntification and control of complex systems.
Salah Zidi was born in Gafsa, Tunisia, in 1980. In 2003, he received the B.E. degree in electrical engineering from the National School of Engineers of Sfax, Tunisia. He received the D.E.A. degree in new technologies of dedicated computing systems from the National School of Engineers of Sfax, Tunisia, in 2004. He received the Ph.D. degrees in automatic control from the University of Lille, Lille, France, in 2007. He is currently a teacher and researcher at the University of Lille. Her research interests include transport, reconfiguration, scheduling, ant colony algorithms, and decision-support systems.
CITE THIS PAPER AS:
Sabra EL FERCHICHI, Kaouther LAABIDI, Salah ZIDI, Genetic Algorithm and Tabu Search for Feature Selection, Studies in Informatics and Control, ISSN 1220-1766, vol. 18 (2), pp. 181-187, 2009.
Dealing with high dimensional data reveals of computational time’s problem. Thus, feature dimensionality reduction is of a considerable importance to treat this problem and to improve the classifier’s performance.
To address this problem, Feature Selection (FS) tries to find an optimal subset of features from the original set which contains the most pertinent ones taking into account regards to the classification task . In this context, two main directions exist; they are based on the evaluation subset’ criterion: the first is the filter approach; the second is the wrapper approach . Filter method’ principle is based on sorting variables according to their score, computed with the use of different criteria. The features with low scores are then eliminated. Wrapper methods rely on the classification’ accuracy to determine the most optimal subset of features: the search for the optimal subset makes use of the classification algorithm itself to evaluate candidate subsets . Other method implements the same concept as the wrapper is called embedded methods. It involves knowledge about the structure of the classifier on the selection process, there is no more separation between classification and selection task .
The wrapper approach requires a search space, operators, search engine, and an evaluation function . The search space in our case is composed of a large set of candidate subsets of features, which makes the evaluation of all possible subsets quite impractical. As a result, the exhaustive search has been replaced by heuristic search strategies in an attempt to avoid this high-priced complexity. This solution is computationally feasible and represents a trade-off balance between solution quality and processing time as argued in face recognition  and image annotation field . within this work, we propose a feature selection algorithm involving an SVM-driven tabu search and genetic algorithm. Support Vector Machines ‘SVM’ has been applied to a variety of areas and it has shown a great performance and success in areas such as Biomedical data [7, 9], speech recognition . The classification accuracy is used as an evaluation function to access feature’ relevance.
The paper is organised as follows. In section 2, we briefly introduce the Tabu Search and the Genetic Algorithm. Support Vector Machines and our wrapper approach to feature selection are presented in the section 3. Section 4 is reserved to simulation’ results on data set concerning the regulation of urban transport networks, comparing SVM-driven Tabu Search and SVM-driven GAs. Finally, a conclusion to our work is developed.
Ameliorating classification accuracy is the main concern of learning machines. The feature selection algorithm aims at resolving this issue by searching for an optimal subset of features from the original one. Selecting the most pertinent feature would reduce the processing time and reduce the dimensionality of data.
We have proposed in this work a wrapper approach based on Support Vector machines as an inductive algorithm. Since the search space is very large, non conventional approaches would present an acceptable solution to avoid this prohibitive complexity. We have implemented in a first time SVM driven by tabu search, then SVM driven by genetic algorithm.
As our principle concern is the classifier’s accuracy, reducing dimensionality of data is one way to get to our target. In that progress, as Genetic Algorithms performs previous solution in an iterative way, it gets to the optimal solution much faster than the tabu search, albeit that TS gain on dimensionality (it could attend the dimension 17). Tabu Search has also the priority to get to a diversity of solutions for the same rate of misclassification.
- Badea, Liviu, Simultaneous Feature selection and Clustering for Gene Expression Data using Nonnegative Matrix Factorization with offset, Studies in Informatics and Control, Vol. 16, No. 4, 2007.
- Kohavi, Ron and George. H. John, Wrapper for feature subset selection, ELSIEVER Artificial Intelligence, Vol. 97, No. 2-1, 1997, pp.273-324.
- Guyon, Isabelle and André Elisseeff, An Introduction to Variable and Feature selection, Journal of Machine Learning Research, Vol.3, Special Issue, March 2003, pp. 1157-1182.
- Lal, T. N., O. Chapelle, J. Weston and A. Elisseeff, Embedded methods, Feature Extraction: Foundations and Applications, 2006, Berlin, Germany, pp.137-165.
- Hamidreza Rashidy Kanan, Karim Faez, An improved feature selection method based on Ant Colony Optimization (ACO) evaluated on face recognition system, ELSIEVER Applied Mathematics and Computation, Vol. 205, No. 2, 2008, pp.716-725.
- Jianjiang Lu, Tianzhong Zhao, Yafei Zhang, Feature selection based-on genetic algorithm for image annotation, ELSIEVER Knowledge-Based Systems, Vol.21, No. 8, 2008, pp. 887-891.
- Luyin Zhao, Lilla Boroczky, K.P. Lee, False positive reduction for lung nodule CAD using support vector machines and genetic algorithms, ELSIEVER International Congress Series, Vol.1281, May 2005, pp. 1109-1114.
- SALOMON, Jesper, Support Vector Machines for Phomene classification, Master of Science School of Artificial Intelligence Division of Informatics University of Edinburgh, 2001.
- Zhou, Nina and Lipo Wang, A Novel Vector Machine with class-dependent Features for Biomedical Data, IEEE International Conference on Systems, Man, and Cybernetics, Vol.2, October 8-11, 2006,Taipei,Taiwan, pp.1666-1670.
- Dréo, J. A. Pétrowski, P. Siarry, E. Taillard, Métaheuristiques pour l’optimisation difficile, Editions EYROLLES, 2003.
- Ayas, Jean et Marc André Viau, la recherche Tabou, cours Polytechnique Montréal, Novembre 2004.
- Vapnik, V.N., An Overview of Statistical Learning Theory, IEEE transactions on Neural Networks, Vol.10, No.5, September 1999, pp. 988-999.
- Hsu, Chih-Wei and Chih Jen lin, A Comparison of Methods for Multiclass Support Vector machines, IEEE Transactions on Neural Networks, Vol.13, No.2, March 2002, pp. 415-425.