Classifier Evaluation for Software Defect Prediction

Gang Kou
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, CHINA

Yi Peng
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, CHINA

Yong Shi
College of Information Science & Technology, University of Nebraska at Omaha
Omaha, NE 68182, USA

Wenshuai Wu
School of Management and Economics, University of Electronic Science and Technology of China
No.4, Sec 2, North Jianshe Rd, Chengdu, 610054, CHINA

Abstract: Feature selection is an essential step in software defect prediction because irrelevant features degrade the performance of classification algorithms. Hence, selecting the most relevant and representative features is critical to the success of software defect detection. Another challenge in software defect prediction is choosing among the large number of available classification models. This paper applies feature selection and classifier evaluation in the context of software defect prediction. An empirical study is presented to validate the proposed scheme using 9 classifiers over 4 public-domain software defect data sets. The results indicate that the proposed scheme can improve classifier performance by using the most representative features and can recommend classifiers that are accurate and reliable for software defect prediction.

Keywords: Software defect detection; classifier evaluation; multi-criteria decision making (MCDM).

CITE THIS PAPER AS:
Gang KOU, Yi PENG, Yong SHI, Wenshuai WU, Classifier Evaluation for Software Defect Prediction, Studies in Informatics and Control, ISSN 1220-1766, vol. 21 (2), pp. 117-126, 2012. https://doi.org/10.24846/v21i2y201201

1. Introduction

Defects are prevalent in large and complex software systems and cause huge losses to organizations [1-2]. Timely and accurate software defect prediction helps identify faults at an early stage of the software development lifecycle, which facilitates efficient test resource allocation, improves software architecture design, and reduces the number of defective modules [3]. Software defect prediction with high accuracy and reliability remains a challenging and active research area.

Classification is one of the most important tasks in data mining [4] and is a commonly used approach in software defect prediction. It models software defect prediction as a two-group classification problem, categorizing software modules as either fault-prone (fp) or non-fault-prone (nfp) based on historical data. A large number of classification algorithms have been developed over the years for software defect prediction [5-7].
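To make the two-group formulation concrete, the following minimal sketch trains a classifier to separate fp from nfp modules. The attribute names, the toy data, and the random-forest learner are illustrative assumptions, not the exact setup used in this study.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical module-level metrics; real studies use attributes such as
# lines of code, cyclomatic complexity, and Halstead measures.
data = pd.DataFrame({
    "loc":          [120, 45, 300, 15, 220, 60],
    "cyclomatic":   [14, 3, 25, 1, 18, 5],
    "halstead_eff": [5200, 800, 9100, 150, 7400, 1100],
    "fault_prone":  [1, 0, 1, 0, 1, 0],   # 1 = fp, 0 = nfp
})

X, y = data.drop(columns="fault_prone"), data["fault_prone"]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")
print("cross-validated accuracy:", scores.mean())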

Software defect data sets normally collect a large number of attributes to describe the characteristics of software modules at various stages of the software development process. Since some of these attributes may not be relevant to software defect classification, including all of them in the model-building process can degrade the performance of classifiers. Thus, feature subset selection is an essential step in the process of software defect prediction; a minimal sketch of such a selection step follows.
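The sketch below keeps only the attributes most informative about the class label. The attribute names and toy values are hypothetical, and a mutual-information filter is just one common choice; the paper does not prescribe this particular method.

import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X = pd.DataFrame({
    "loc":          [120, 45, 300, 15, 220, 60],
    "cyclomatic":   [14, 3, 25, 1, 18, 5],
    "branch_count": [30, 8, 55, 2, 41, 10],
    "comment_pct":  [0.10, 0.25, 0.05, 0.40, 0.08, 0.30],
})
y = pd.Series([1, 0, 1, 0, 1, 0])   # 1 = fault-prone, 0 = non-fault-prone

# Keep the k attributes that carry the most information about the class label.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
selector.fit(X, y)
print("selected attributes:", list(X.columns[selector.get_support()]))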

This paper integrates traditional feature selection methods and multi-criteria decision making (MCDM) methods to improve the accuracy and reliability of defect prediction models and to evaluate the performances of software defect detection models. An experimental study is designed to validate the proposed scheme using 9 classifiers over 4 public-domain software defect data sets.
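For illustration, the classifier-ranking stage could look like the following TOPSIS-style sketch, where each classifier is scored against several performance criteria at once. The classifier names, metric values, and weights are made-up placeholders, and TOPSIS is only one representative MCDM technique; the paper's own procedure may differ.

import numpy as np

classifiers = ["NaiveBayes", "C4.5", "SVM"]
# Rows: classifiers; columns: accuracy, AUC, F-measure (all benefit criteria).
decision = np.array([
    [0.81, 0.78, 0.74],
    [0.84, 0.80, 0.77],
    [0.83, 0.82, 0.75],
])
weights = np.array([0.4, 0.3, 0.3])

norm = decision / np.linalg.norm(decision, axis=0)   # vector normalization
weighted = norm * weights
ideal, anti = weighted.max(axis=0), weighted.min(axis=0)
d_pos = np.linalg.norm(weighted - ideal, axis=1)     # distance to ideal point
d_neg = np.linalg.norm(weighted - anti, axis=1)      # distance to anti-ideal point
closeness = d_neg / (d_pos + d_neg)

for name, score in sorted(zip(classifiers, closeness), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")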

The rest of this paper is organized as follows: Section 2 reviews related work; Section 3 describes the research methodologies; Section 4 presents the experimental study and analyzes the results; and Section 5 summarizes the paper.

References:

  1. FILIP, F. G., Decision Support and Control for Large-scale Complex Systems, Annual Reviews in Control, Elsevier, vol. 32, no. 1, 2008, pp. 61-70.
  2. FILIP, F. G., K. LEIVIISKA, Large-scale Complex Systems, Springer Handbook of Automation, Springer, Berlin, 2009, pp. 619-638.
  3. LESSMANN, S., B. BAESENS, C. MUES, S. PIETSCH, Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, IEEE Transactions on Software Engineering, vol. 34, no. 4, 2008, pp. 485-496.
  4. PENG, Y., G. KOU, Y. SHI, Z. CHEN, A Descriptive Framework for The Field of Data Mining and Knowledge Discovery, International Journal of Information Technology and Decision Making, vol. 7, no. 4, 2008, pp. 639-682.
  5. KOU, G., Y. PENG, Z. CHEN, Y. SHI, Multiple Criteria Mathematical Programming for Multi-class Classification and Application in Network Intrusion Detection, Information Sciences, vol. 179, no. 4, 2009, pp. 371-381.
  6. PENG, Y., G. KOU, G. WANG, H. WANG, F. KO, Empirical Evaluation of Classifiers for Software Risk Management, International Journal of Information Technology and Decision Making, vol. 8, no. 4, 2009, pp. 749-768.
  7. PENG, Y., G. WANG, H. WANG, User Preferences based Software Defect Detection Algorithms Selection using MCDM, Information Sciences, doi:10.1016/j.ins.2010.04.019, 2010.
  8. RODRIGUEZ, D., R. RUIZ, J. CUADRADO-GALLEGO, J. AGUILAR-RUIZ, M. GARRE, Attribute Selection in Software Engineering Datasets for Detecting Fault Modules, Proceedings of the 33rd EUROMICRO Conference on Software Engineering and Advanced Applications (EUROMICRO 2007), 2007, pp. 418-423.
  9. PORTER, A. A., R. W. SELBY, Evaluating Techniques for Generating Metric-Based Classification Trees, Journal of Systems and Software, vol. 12, no. 3, 1990, pp. 209-218.
  10. EMAM, K. E., S. BENLARBI, N. GOEL, S. N. RAI, Comparing Case-based Reasoning Classifiers for Predicting High Risk Software Components, Journal of Systems and Software, vol. 55, no. 3, 2001, pp. 301-310.
  11. KHOSHGOFTAAR, T. M., N. SELIYA, Analogy-Based Practical Classification Rules for Software Quality Estimation, Empirical Software Eng., vol. 8, no. 4, 2003, pp. 325-350.
  12. MYRTVEIT, I., E. STENSRUD, M. SHEPPERD, Reliability and Validity in Comparative Studies of Software Prediction Models, IEEE Transactions on Software Engineering, vol. 31, no. 5, 2005, pp. 380-391.
  13. SHEPPERD, M. J., C. SCHOFIELD, Estimating Software Project Effort Using Analogies, IEEE Transactions on Software Engineering, vol. 23, no. 12, 1997, pp. 736-743.
  14. MYRTVEIT, I., E. STENSRUD, A Controlled Experiment to Assess the Benefits of Estimating with Analogy and Regression Models, IEEE Transactions on Software Engineering, vol. 25, no. 4, 1999, pp. 510-525.
  15. ERGU, D., G. KOU, Y. PENG, Y. SHI, A Simple Method to Improve the Consistency Ratio of the Pair-wise Comparison Matrix in ANP, European Journal of Operational Research, vol. 213, no. 1, 2011, pp. 246-259.
  16. ERGU, D., G. KOU, Y. SHI, Y. SHI, Analytic Network Process in Risk Assessment and Decision Analysis, Computers & Operations Research, doi:10.1016/j.cor.2011.03.005, 2011.
  17. KOU, G., Y. SHI, S. Y. WANG, Multiple Criteria Decision Making and Decision Support Systems – Guest editor’s introduction, DOI:10.1016/j.dss.2010.11.027, Decision Support Systems, vol. 51, no. 2, 2011, pp. 247-249.
  18. PENG, Y., G. KOU, G. WANG, Y. SHI, FAMCDM: A Fusion Approach of MCDM Methods to Rank Multiclass Classification Algorithms, Omega, vol. 39, no. 6, 2011, pp. 677-689.
  19. CHAPMAN, M., P. CALLIS, W. JACKSON, Metrics Data Program, NASA IV and V Facility, http://mdp.ivv.nasa.gov/, 2004.
  20. PENG, Y., G. KOU, G. WANG, W. WU, Y. SHI, Ensemble of Software Defect Predictors: An AHP-based Evaluation Method, International Journal of Information Technology & Decision Making, vol. 10, no. 1, 2011, pp. 187-206.