An Integrated Feature Selection and Classification Scheme

Yi Peng
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, China

Gang Kou
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, China

Daji Ergu
Southwest University for Nationalities
Chengdu, 610200, China

Wenshuai Wu
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, China

Yong Shi
CAS Research Center on Fictitious Economy and Data Sciences
Beijing 100080, China

Abstract: Irrelevant and redundant features may not only degrade the performance of classifiers but also slow the prediction process. A second problem in prediction is that a large number of classification models are available, and choosing a satisfactory classifier among them is an important yet understudied task. This paper proposes an integrated scheme for feature selection and classifier evaluation in the context of prediction. It combines traditional feature selection techniques with multi-criteria decision making (MCDM) methods in an attempt to increase the accuracy of classification models and to identify appropriate classifiers for different types of data sets.

Keywords: Multi-criteria decision making (MCDM); feature selection; classification.

CITE THIS PAPER AS:
Yi PENG, Gang KOU, Daji ERGU, Wenshuai WU, Yong SHI, An Integrated Feature Selection and Classification Scheme, Studies in Informatics and Control, ISSN 1220-1766, vol. 21 (3), pp. 241-248, 2012. https://doi.org/10.24846/v21i3y201202

1. Introduction

Classification is one of the most important tasks in data mining [1] and is a commonly used approach in prediction. Reported results on the performance of classification models diverge considerably [2-4], so choosing a satisfactory classifier is an important yet understudied task.

Data sets often contain irrelevant and redundant attributes. Including all available attributes in the model-building process can degrade the performance of classifiers. In addition, high dimensionality can slow down the prediction process. Thus feature subset selection, which aims at selecting the most relevant and representative attributes to increase accuracy rates, is an essential step in the process of prediction.
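To make the idea of feature subset selection concrete, the following is a minimal sketch of a filter-style selector. It is an illustration only, not the method used in this paper: the relevance measure (absolute Pearson correlation with the class label), the redundancy threshold, and the function names `pearson` and `select_features` are all assumptions introduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, k, redundancy_threshold=0.95):
    """Greedy filter (hypothetical helper): pick up to k feature indices.

    Features are visited in order of decreasing relevance, measured as
    |correlation with the label|. A feature is skipped as redundant when its
    |correlation| with an already selected feature reaches the threshold.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(n_features), key=lambda j: relevance[j], reverse=True)

    selected = []
    for j in order:
        if len(selected) == k:
            break
        redundant = any(
            abs(pearson(columns[j], columns[s])) >= redundancy_threshold
            for s in selected
        )
        if not redundant:
            selected.append(j)
    return selected


if __name__ == "__main__":
    # Made-up toy data: feature 1 duplicates feature 0, feature 2 is constant.
    rows = [
        [1, 2, 5, 1],
        [2, 4, 5, 3],
        [3, 6, 5, 2],
        [7, 14, 5, 3],
        [8, 16, 5, 2],
        [9, 18, 5, 4],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_features(rows, labels, k=2))
```

On the toy data, the selector keeps only one of the two duplicated features and ignores the constant one, which mirrors the two failure modes named above: redundancy and irrelevance.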

This paper focuses on two issues in prediction: feature subset selection and classification algorithm evaluation. It proposes a research scheme that integrates traditional feature selection methods and multi-criteria decision making (MCDM) methods to improve the accuracy and reliability of prediction models.
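As a rough illustration of how an MCDM method can rank classifiers on several performance criteria at once, the sketch below implements TOPSIS (closeness to an ideal solution). TOPSIS is chosen here only as an example; this section does not state which MCDM methods the scheme uses. The function name `topsis`, the criteria, the weights, and all numbers in the demo are assumptions made up for this sketch.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative; higher is better.

    matrix[i][j] : value of alternative i (e.g. a classifier) on criterion j
    weights[j]   : relative importance of criterion j
    benefit[j]   : True if larger values are better, False if smaller are better
    """
    n_crit = len(weights)

    # Vector-normalize each criterion column, then apply the weights.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_crit)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)]
        for row in matrix
    ]

    # Ideal and anti-ideal points, taken criterion by criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((v - p) ** 2 for v, p in zip(row, ideal)))
        d_neg = math.sqrt(sum((v - q) ** 2 for v, q in zip(row, anti)))
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)
    return scores


if __name__ == "__main__":
    # Made-up values for three hypothetical classifiers.
    # Criteria: accuracy (benefit), AUC (benefit), training time in seconds (cost).
    names = ["classifier_A", "classifier_B", "classifier_C"]
    performance = [
        [0.90, 0.90, 1.0],
        [0.70, 0.70, 5.0],
        [0.80, 0.80, 3.0],
    ]
    scores = topsis(performance, weights=[0.4, 0.4, 0.2],
                    benefit=[True, True, False])
    for name, score in sorted(zip(names, scores), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")
```

In an integrated scheme of the kind described above, the rows of `performance` would come from evaluating each classifier on the data set after feature selection, so the ranking reflects several measures rather than accuracy alone.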

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the research methodology, including the research design, feature selection methods, and MCDM methods. Section 4 summarizes the paper.

References:

1. PENG, Y., G. KOU, Y. SHI, Z. CHEN, A Descriptive Framework for the Field of Data Mining and Knowledge Discovery, International Journal of Information Technology and Decision Making, vol. 7, no. 4, 2008, pp. 639-682.
2. MYRTVEIT, I., E. STENSRUD, M. SHEPPERD, Reliability and Validity in Comparative Studies of Software Prediction Models, IEEE Transactions on Software Engineering, vol. 31, no. 5, 2005, pp. 380-391.
3. PENG, Y., G. KOU, G. WANG, H. WANG, F. KO, Empirical Evaluation of Classifiers for Software Risk Management, International Journal of Information Technology and Decision Making, vol. 8, no. 4, 2009, pp. 749-768.
4. PENG, Y., G. WANG, H. WANG, User Preferences based Software Defect Detection Algorithms Selection using MCDM, Information Sciences, doi:10.1016/j.ins.2010.04.019, 2010.
5. WITTEN, I. H., E. FRANK, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
6. RICE, J., The Algorithm Selection Problem, Advances in Computers, vol. 15, 1976, pp. 65-118.
7. SMITH-MILES, K. A., Cross-Disciplinary Perspectives on Meta-learning for Algorithm Selection, ACM Computing Surveys, vol. 41, no. 1, December 2008.
8. NAKHAEIZADEH, G., A. SCHNABL, Development of Multi-Criteria Metrics for Evaluation of Data Mining Algorithms, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD’97), Newport Beach, California, August 14-17, 1997, pp. 37-42.
9. ROKACH, L., Ensemble-based Classifiers, Artificial Intelligence Review, vol. 33, no. 1-2, 2010, pp. 1-39.
10. ERGU, D., G. KOU, Y. PENG, Y. SHI, A Simple Method to Improve the Consistency Ratio of the Pair-wise Comparison Matrix in ANP, European Journal of Operational Research, vol. 213, no. 1, 2011, pp. 246-259.
11. ERGU, D., G. KOU, Y. SHI, Y. SHI, Analytic Network Process in Risk Assessment and Decision Analysis, Computers & Operations Research, doi:10.1016/j.cor.2011.03.005, 2011.
12. KOU, G., Y. SHI, S. Y. WANG, Multiple Criteria Decision Making and Decision Support Systems – Guest Editor’s Introduction, Decision Support Systems, vol. 51, no. 2, 2011, pp. 247-249, doi:10.1016/j.dss.2010.11.027.
13. PENG, Y., G. KOU, G. WANG, Y. SHI, FAMCDM: A Fusion Approach of MCDM Methods to Rank Multiclass Classification Algorithms, Omega, vol. 39, no. 6, 2011, pp. 677-689.
14. CHARNES, A., W. W. COOPER, E. RHODES, Measuring the Efficiency of Decision Making Units, European Journal of Operational Research, vol. 2, no. 6, 1978, pp. 429-444.
15. COOPER, W. W., L. M. SEIFORD, J. ZHU, Data Envelopment Analysis: History, Models and Interpretations, Chapter 1, pp. 1-39, in Cooper, W. W., Seiford, L. M., Zhu, J. (eds.), Handbook on Data Envelopment Analysis, Kluwer Academic Publisher, Boston, 2004.
16. ROY, B., Classement et choix en presence de points de vue multiples (la methode ELECTRE), R.I.R.O, vol. 8, 1968, pp. 57-75.
17. BRANS, J. P., B. MARESCHAL, PROMETHEE Methods, in Multiple Criteria Decision Analysis: State of the Art Surveys, J. Figueira, V. Mousseau and B. Roy (eds.), Springer, New York, 2005, pp. 163-195.
18. HWANG, C. L., K. YOON, Multiple Attribute Decision Making Methods and Applications, Springer, Berlin Heidelberg, 1981.
19. OPRICOVIC, S., G. H. TZENG, Compromise Solution by MCDM Methods: A Comparative Analysis of VIKOR and TOPSIS, European Journal of Operational Research, vol. 156, 2004, pp. 445-455.
20. OPRICOVIC, S., Multicriteria Optimization of Civil Engineering Systems, Faculty of Civil Engineering, Belgrade, 1998.
21. KOU, G., C. LOU, Multiple Factor Hierarchical Clustering Algorithm for Large Scale Web Page and Search Engine Clickstream Data, Annals of Operations Research, doi:10.1007/s10479-010-0704-3, 2010.
22. KOU, G., Y. LU, Y. PENG, Y. SHI, Evaluation of Classification Algorithms using MCDM and Rank Correlation, International Journal of Information Technology & Decision Making, vol. 11, no. 1, 2012, pp. 197-225.
23. PENG, Y., G. KOU, G. WANG, W. WU, Y. SHI, Ensemble of Software Defect Predictors: An AHP-based Evaluation Method, International Journal of Information Technology & Decision Making, vol. 10, no. 1, 2011, pp. 187-206.
24. PENG, Y., G. WANG, G. KOU, Y. SHI, An Empirical Study of Classification Algorithm Evaluation in Financial Risk Management, Applied Soft Computing, vol. 11, no. 2, 2011, pp. 2906-2915.