Studies in Informatics and Control
Vol. 24, No. 2, 2015

An Integrated Cluster Analysis and Validity Test Platform for the Compression based Clustering Approach

Alexandra CERNIAN, Dorin CARSTOIU, Adriana OLTEANU, Valentin SGARCIU

Abstract

This paper focuses on compression-based clustering. It aims to determine the most suitable combinations of algorithms for different clustering contexts (text, heterogeneous data, Web pages, metadata and so on) and to establish whether using compression with traditional clustering methods leads to better performance. To this end, we propose an integrated cluster analysis test platform, called EasyClustering, which incorporates two subsystems: a clustering component and a cluster validity expert system that automatically determines the quality of a clustering solution by computing the FScore value. The experiments follow two main directions: determining the best approach to compression-based clustering in terms of context, compression algorithm and clustering algorithm, and validating the cluster analysis expert system as a means of determining the quality of clustering solutions. After conducting a set of 324 clustering tests, we concluded that compressing the input when using traditional clustering methods increases the quality of the clustering solutions, leading to results comparable to those of the NCD (Normalized Compression Distance). The cluster analysis expert system has been 100% accurate so far, so we estimate that, even if some slight deviation should occur, it will be minimal.
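The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the EasyClustering implementation. It assumes zlib as a stand-in compressor (the abstract does not name the compressors used) and the commonly used class-weighted definition of the FScore, which may differ in detail from the one in the paper. The function names `compressed_size`, `ncd` and `fscore` are hypothetical.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C the compressed size."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def fscore(true_labels, cluster_labels) -> float:
    """FScore of a clustering against reference classes (assumed definition):
    for each class, take the best F-measure over all clusters, then average
    these values weighted by class size."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue  # no shared items, so the F-measure is 0
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total
```

An NCD close to 0 suggests that two inputs share most of their information, while a value close to 1 suggests that they share little. An FScore of 1.0 means every reference class is recovered exactly by some cluster.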

Keywords

clustering, compression, cluster analysis, FScore, expert system.
