
A Novel Approach Using Fuzzy Self-Organizing Maps
for Detecting Software Faults

Istvan-Gergely CZIBULA, Gabriela CZIBULA,
Zsuzsanna MARIAN, Vlad-Sebastian IONESCU
Babeş-Bolyai University,
1, M. Kogălniceanu Street, Cluj-Napoca, 400084, Romania
{istvanc, gabis, marianzsu, ivlad}

Abstract: As software projects become more complex, their analysis and testing receive increasing attention. Detecting software faults is of major importance for improving the quality of software development processes and the efficiency of software testing. To detect faults in existing software systems, this paper introduces a novel approach based on fuzzy self-organizing feature maps. A fuzzy map is trained, using unsupervised learning, to provide a two-dimensional representation of the faulty and non-faulty entities of a software system, and it can then identify whether or not a software module is defective. Five open-source case studies are used for the experimental evaluation of our approach. The obtained results are better than most of the results already reported in the literature for the considered datasets and show that a fuzzy self-organizing map is more efficient than a crisp one on the case studies used for evaluation.

Keywords: Software defect detection, Machine learning, Self-organizing map, Fuzzy clustering.

Cite as: Istvan-Gergely CZIBULA, Gabriela CZIBULA, Zsuzsanna MARIAN, Vlad-Sebastian IONESCU, A Novel Approach Using Fuzzy Self-Organizing Maps for Detecting Software Faults, Studies in Informatics and Control, ISSN 1220-1766, vol. 25(2), pp. 207-216, 2016.

  1. Introduction

Software defect detection is the activity of identifying software modules that contain errors, and it contributes to increasing the effectiveness of the quality assurance process. Fault detection methods can suggest to developers which software modules to focus on during testing, particularly when, for lack of time, the modules cannot be systematically tested.

Code review is frequently used in agile development processes for maintaining the quality of the software. During code review, an experienced programmer reviews the source code in order to identify vulnerabilities, security issues and other problems overlooked by the initial implementer. Since code review is a time-consuming and costly activity, software defect detection can be used to guide it by identifying the parts of the source code where a review is most likely to find problems.

Software defect detection is intensively investigated in the literature and is an active area of software engineering research, as shown by a systematic review published in 2011, which collected 208 fault prediction studies published between 2000 and 2010 [12]. Detecting software faults is a complex and difficult task, especially for large-scale software projects. The literature contains many machine learning-based approaches for predicting faulty software entities. From a supervised learning perspective, defect prediction is a hard problem, particularly because of the imbalanced nature of the training data (the number of non-defective training instances is much higher than the number of defective ones). Moreover, it is not trivial to identify a set of software metrics that is relevant for discriminating between faulty and non-faulty modules.
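
To make the imbalance point concrete, the following minimal sketch uses made-up counts (they are not taken from the datasets evaluated in this paper) to show how a classifier that labels every module as non-defective can reach a high accuracy while detecting no faults at all. The function name and the 90/10 split are illustrative assumptions.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall on the defective class).

    Labels: 1 = defective, 0 = non-defective.
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    defective = sum(t == 1 for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / defective if defective else 0.0
    return accuracy, recall


# Hypothetical system: 90 non-defective modules and 10 defective ones.
y_true = [0] * 90 + [1] * 10
# Trivial "classifier" that always predicts non-defective.
y_pred = [0] * 100

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

The trivial predictor is right on every non-defective module and wrong on every defective one, which is why accuracy alone is a poor measure on such data and why measures that account for the minority class are usually reported as well.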

Although many methods have already been developed for detecting software defects, researchers are still working on improving the performance of existing classifiers. In this paper we introduce an unsupervised machine learning method based on fuzzy self-organizing maps for detecting faults within software systems. To the best of our knowledge, our approach is novel in the search-based software engineering literature, and it outperformed most of the existing similar approaches on the case studies we used for evaluation.
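
The difference between a crisp and a fuzzy map can be illustrated with a small sketch. A crisp self-organizing map assigns each input to a single best-matching neuron, whereas a fuzzy variant lets every neuron respond with a membership degree. The membership formula below is the standard fuzzy c-means one and is used here only as an assumption for illustration; it is not necessarily the rule used by the method proposed in this paper. The weight vectors, the input and the fuzzifier value m = 2 are hypothetical.

```python
import math


def crisp_winner(x, weights):
    """Index of the neuron whose weight vector is closest to x (crisp assignment)."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))


def fuzzy_memberships(x, weights, m=2.0):
    """Fuzzy c-means style membership of x in each neuron; the values sum to 1."""
    dists = [math.dist(x, w) for w in weights]
    # An input that coincides with one or more neurons belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]


# Hypothetical weight vectors of three neurons and one input vector.
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
x = [0.25, 0.0]

print(crisp_winner(x, weights))
print([round(u, 3) for u in fuzzy_memberships(x, weights)])
```

In this sketch the neuron with the largest membership is the same one the crisp rule selects, but the remaining neurons keep a non-zero degree, so an input lying between two regions of the map is not forced into a single one.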

The rest of the paper is structured as follows. Section 2 presents the importance of the problem approached in this paper and gives a motivation for our work. Several existing approaches similar to ours are reviewed in Section 3. Our proposal is introduced in Section 4. Section 5 presents the experimental results obtained on several open-source case studies, and Section 6 analyses these results and compares them to existing similar work from the literature. The conclusions of the paper and directions for future research are outlined in Section 7.


  References

  1. ABAEI, G., Z. REZAEI, A. SELAMAT, Fault Prediction by Utilizing Self-organizing Map and Threshold, in ICCSCE, 2013, pp. 465-470.
  2. AFZAL, W., R. TORKAR, R. FELDT, Resampling Methods in Software Quality Classification, International Journal of Software Engineering and Knowledge Engineering, vol. 22, no. 2, 2012, pp. 203-223.
  3. ARANDA, J., G. VENOLIA, The Secret Life of Bugs: Going Past the Errors and Omissions in Software Repositories, in Proceedings of the 31st International Conference on Software Engineering, USA, 2009, pp. 298-308.
  4. BISHNU, P., V. BHATTACHERJEE, Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm, IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, June 2012, pp. 1146-1150.
  5. BOETTICHER, G. D., Advances in Machine Learning Applications in Software Engineering, IGI Global, 2007.
  6. CATAL, C., U. SEVIM, B. DIRI, Software Fault Prediction of Unlabeled Program Modules, in WCE, 2009, pp. 212-217.
  7. CLARK, B., D. ZUBROW, How Good is the Software: A Review of Defect Prediction Techniques, in Software Engineering Symposium, Carnegie Mellon University, 2001, pp. 1-35.
  8. DINU, S., Multi-objective Assembly Line Balancing Using Fuzzy Inertia-adaptive Particle Swarm Algorithm, Studies in Informatics and Control, vol. 24, no. 3, 2015, pp. 283-292.
  9. DRAGOMIR, O., F. DRAGOMIR, V. STEFAN, E. MINCA, Adaptive Neuro – Fuzzy Inference Systems – An Alternative Forecasting Tool for Prosumers, Studies in Informatics and Control, vol. 24, no. 3, 2015, pp. 351-360.
  10. FAWCETT, T., An Introduction to ROC Analysis, Pattern Recognition Letters, vol. 27, no. 8, 2006, pp. 861-874.
  11. GRAY, D., D. BOWES, N. DAVEY, Y. SUN, B. CHRISTIANSON, The Misuse of the NASA Metrics Data Program Data Sets for Automated Software Defect Prediction, in Proceedings of the Evaluation and Assessment in Software Engineering, 2011.
  12. HALL, T., S. BEECHAM, D. BOWES, D. GRAY, S. COUNSELL, A Systematic Literature Review on Fault Prediction Performance in Software Engineering, IEEE Transactions on Software Eng., vol. 38, no. 6, 2011, pp. 1276-1304.
  13. KIM, S., H. ZHANG, R. WU, L. GONG, Dealing with Noise in Defect Prediction, in Proceedings of the 33rd International Conference on Software Engineering, New York, NY, USA, 2011, pp. 481-490.
  14. KLAWONN, F., F. HÖPPNER, What Is Fuzzy about Fuzzy Clustering? Understanding and Improving the Concept of the Fuzzifier, LNCS 2810, Springer, 2003, pp. 254-264.
  15. MALHOTRA, R., Comparative Analysis of Statistical and Machine Learning Methods for Predicting Faulty Modules, Applied Soft Computing, vol. 21, 2014, pp. 286-297.
  16. MARIAN, Z., G. CZIBULA, I.-G. CZIBULA, S. SOTOC, Software Defect Detection using Self-Organizing Maps, Studia Univ. Babes-Bolyai, Informatica, vol. LX, no. 2, 2015, pp. 55-69.
  17. MENZIES, T., R. KRISHNA, D. PRYOR, The Promise Repository of Empirical Software Engineering Data, North Carolina State Univ., Dep. of Computer Science.
  18. MITCHELL, T. M., Machine Learning, McGraw-Hill, New York, USA, 1997.
  19. NAM, J., S. KIM, Heterogeneous Defect Prediction, in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015, pp. 508-519.
  20. PARK, M., E. HONG, Software Fault Prediction Model using Clustering Algorithms Determining the Number of Clusters Automatically, Int. Journal of Software Engineering and Its Applications, vol. 8, no. 7, 2014, pp. 199-205.
  21. SOMERVUO, P., T. KOHONEN, Self-organizing Maps and Learning Vector Quantization for Feature Sequences, Neural Processing Letters, vol. 10, 1999, pp. 151-159.
  22. TEODORESCU, H.-N. L., Coordinate Fuzzy Transforms and Fuzzy Tent Maps – Properties and Applications, Studies in Informatics and Control, vol. 24, no. 3, 2015, pp. 243-250.
  23. VAN DER MAATEN, L., G. HINTON, Visualizing Data using t-SNE, Journal of Machine Learning Research, vol. 9, 2008, pp. 2579-2605.
  24. YU, L., A. MISHRA, Experience in Predicting Fault-Prone Software Modules Using Complexity Metrics, Quality Technology & Quantitative Manag., vol. 9, no. 4, 2012, pp. 421-433.
  25. ZHENG, J., Predicting Software Reliability with Neural Network Ensembles, Expert Systems with Applications, vol. 36, no. 2, Part 1, 2009, pp. 2116-2122.