The present era is one of Big Data, digitalization, the Internet of Things and the Internet of Everything, in which an enormous amount of useful content is created every day by a very large number of producers and consumers of online information. The growth of Internet data has made it necessary to define and engineer innovative solutions for coping with redundant transfers. Smart data transfers increase throughput, data availability and resource utilization, which in turn reduces cost and helps avoid bottlenecks and denial-of-service issues. The data used by an Internet user must be consistent, so distributed systems are attracting research interest with regard to concurrency control, atomic transfers, data replication and synchronization, compression and decompression, error correction and other potential problems. Two versions of the same file are typically highly similar, so for synchronization it is more efficient to transfer only the delta between the second version and the initial version, and apply it to the initial version, than to transfer the whole file. An efficient data deduplication technique is therefore necessary and worth analyzing in order to minimize the cost of synchronization. This paper focuses on optimizing bandwidth utilization for remote data synchronization and proposes a prototype based on three classic open-source data compression methods. The experiments show how these compression utilities, together with the data transfer, perform when synchronizing large data sets between two remote sites, and how compression reduces the data size on storage devices while significantly decreasing network bandwidth consumption. The novelty of this paper lies in combining two different compression algorithms in order to provide better compression rates.
Data replication, Delta encoding, Differential file transfer, Big Data, Network transfer, Rsync.
Romina DRUTA, Cristian-Filip DRUTA, Ioan SILEA, "Evaluation of Remote Data Compression Methods", Studies in Informatics and Control, ISSN 1220-1766, vol. 31(1), pp. 59-70, 2022. https://doi.org/10.24846/v31i1y202206