Alexandra CERNIAN*, Dorin CÂRSTOIU,
Adriana OLTEANU, Valentin SGÂRCIU
University Politehnica of Bucharest,
313 Splaiul Independentei, Bucharest, Romania Alexandra.firstname.lastname@example.org
* Corresponding author
Abstract: This paper focuses on compression-based clustering. It aims to determine the most suitable combinations of algorithms for different clustering contexts (text, heterogeneous data, Web pages, metadata and so on) and to establish whether using compression with traditional clustering methods leads to better performance. In this context, we propose an integrated cluster analysis test platform, called EasyClustering, which incorporates two subsystems: a clustering component and a cluster validity expert system, which automatically determines the quality of a clustering solution by computing the FScore value. The experimental results follow two main directions: determining the best approach for compression-based clustering in terms of context, compression algorithms and clustering algorithms, and validating the cluster analysis expert system as a means of determining the quality of the clustering solutions. After conducting a set of 324 clustering tests, we concluded that compressing the input when using traditional clustering methods increases the quality of the clustering solutions, leading to results comparable to those obtained with the Normalized Compression Distance (NCD). The cluster analysis expert system has been 100% accurate in the tests conducted so far, so we estimate that, should any deviation occur, it will be minimal.
Keywords: Clustering, compression, cluster analysis, FScore, expert system.
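For readers unfamiliar with the NCD mentioned in the abstract, the following is a minimal sketch of how it is commonly computed, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The choice of zlib as the compressor, the helper names `compressed_size` and `ncd`, and the sample inputs are assumptions made for illustration only; they are not taken from the EasyClustering platform.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they have little in common.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # size of the compressed concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs (hypothetical, not from the paper's data sets).
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = bytes(range(256)) * 4

print("NCD(a, a):", ncd(text_a, text_a))  # an input compared with itself
print("NCD(a, b):", ncd(text_a, text_b))  # two unrelated inputs
```

A pairwise matrix of such distances can then be passed to a standard clustering algorithm, which is the general idea behind compression-based clustering.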
CITE THIS PAPER AS:
Alexandra CERNIAN, Dorin CÂRSTOIU, Adriana OLTEANU, Valentin SGÂRCIU, An Integrated Cluster Analysis and Validity Test Platform for the Compression based Clustering Approach, Studies in Informatics and Control, ISSN 1220-1766, vol. 24 (2), pp. 151-158, 2015.