Fault Tolerance for Conjugate Gradient Solver Based on FT-MPI

Weizhe ZHANG, Hui HE
Harbin Institute of Technology
92 West Dazhi, Harbin, 150001, China
wzzhang@hit.edu.cn, hehui@hit.edu.cn

Abstract: Grid computing is characterized by high speed, large scale, large numbers of tasks, and long run times. With such characteristics, system errors can waste large amounts of computing power and time, so the fault tolerance of computing resource nodes in the grid computing architecture has become a key issue in the field. This paper describes the current fault-tolerant message passing interface (FT-MPI) library, designs a task migration and recovery model for grid computing, and identifies the functional architecture of each module of the model. It then analyzes and compares the checkpoint storage mechanisms of the model and its information-encoding algorithms. Finally, an implementation of a checksum-based fault-tolerant conjugate gradient solver demonstrates the validity of the approach.

Keywords: Fault tolerance; CG solver; FT-MPI; computational grid.
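
The abstract mentions a checksum-based checkpoint encoding but does not give its details. As a rough illustration only, the sketch below shows the general idea behind such encodings: a dedicated checkpoint holder stores the elementwise sum of every worker's local vector, so the state of a single failed worker can be rebuilt from the survivors. The function names, the three-worker layout, and the sample data are assumptions made for this example; it runs in one process and does not use FT-MPI or reproduce the paper's implementation.

```python
from typing import List

Vector = List[float]


def encode_checksum(local_states: List[Vector]) -> Vector:
    """Elementwise sum of all workers' local vectors (the checkpoint encoding)."""
    return [sum(column) for column in zip(*local_states)]


def recover_lost_state(survivors: List[Vector], checksum: Vector) -> Vector:
    """Rebuild the single lost vector as the checksum minus the surviving vectors."""
    recovered = list(checksum)
    for state in survivors:
        recovered = [r - s for r, s in zip(recovered, state)]
    return recovered


# Hypothetical local slices of a CG iterate held by three workers.
states = [[1.0, 2.0], [0.5, -4.0], [3.0, 8.0]]
checksum = encode_checksum(states)  # would live on a separate checkpoint process

failed_rank = 1  # assumption: exactly one worker is lost
survivors = [s for rank, s in enumerate(states) if rank != failed_rank]
restored = recover_lost_state(survivors, checksum)
```

A single sum of this kind tolerates only one failure at a time, and with general floating-point data the recovered values may differ from the originals by rounding error.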

CITE THIS PAPER AS:
Weizhe ZHANG, Hui HE, Fault Tolerance for Conjugate Gradient Solver Based on FT-MPI, Studies in Informatics and Control, ISSN 1220-1766, vol. 22 (1), pp. 51-60, 2013. https://doi.org/10.24846/v22i2y101306