As distributed systems scale up and the number of processors grows, failures become more frequent and jeopardize the timely completion of large-scale tasks. To address this issue, a fault-tolerant task scheduling model for distributed systems was developed, and its scheduling sequence was optimized with a genetic algorithm that minimizes the task completion time. The results show that, under normal conditions, the model achieves a resource utilization rate as high as 99.1%. With four faulty processors and 10,000 tasks, the proposed model's task completion time was 21,571 ms, outperforming comparative models such as the Quartz model. Furthermore, for a task volume of 200, the optimal scheduling sequence derived from the genetic algorithm yielded a task scheduling length ratio of 3.21, lower than the values achieved by other methods such as the ascending-order sorting method. These results indicate that the proposed model effectively reduces the task completion time and enhances resource utilization in distributed system task scheduling, and that solving the model with a genetic algorithm makes it possible to determine the optimal processor scheduling sequence. The approach improves the stability and task completion efficiency of distributed systems and offers a novel strategy for fault-tolerant scheduling.
Distributed system, Fault tolerance, Task scheduling, GA, Optimal order, Completion time.
Shigan YU, Xiaoling RU, "Fault Tolerant Task Scheduling for Distributed Systems Based on Genetic Algorithm", Studies in Informatics and Control, ISSN 1220-1766, vol. 34(3), pp. 39-50, 2025. https://doi.org/10.24846/v34i3y202504
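
The abstract describes optimizing a scheduling sequence with a genetic algorithm so that the task completion time is minimized while some processors are faulty. The following is a minimal illustrative sketch of that general idea, not the authors' model: it assumes independent tasks with known durations, faulty processors that are known before dispatch, and a simple rule that sends each task to the earliest-free healthy processor. All function names and parameters (`makespan`, `order_crossover`, `evolve`, `pop_size`, `mutation_rate`, and so on) are hypothetical.

```python
import heapq
import random


def makespan(order, durations, n_procs, failed=frozenset()):
    """Completion time when tasks are dispatched in `order`, each to the
    earliest-free healthy processor (an assumed dispatch rule)."""
    healthy = [p for p in range(n_procs) if p not in failed]
    if not healthy:
        raise ValueError("no healthy processor left")
    free_at = [(0.0, p) for p in healthy]  # (time the processor becomes free, id)
    heapq.heapify(free_at)
    finish = 0.0
    for task in order:
        start, proc = heapq.heappop(free_at)
        end = start + durations[task]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, proc))
    return finish


def order_crossover(a, b, rng):
    """Copy a random slice of parent `a` in place and fill the remaining
    positions with the other tasks in the order they appear in parent `b`."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    kept = set(middle)
    rest = [t for t in b if t not in kept]
    return rest[:i] + middle + rest[i:]


def evolve(durations, n_procs, failed=frozenset(), pop_size=30,
           generations=60, mutation_rate=0.2, seed=0):
    """Search over task orderings; returns (best_order, best_makespan)."""
    rng = random.Random(seed)
    n = len(durations)

    def cost(order):
        return makespan(order, durations, n_procs, failed)

    # Seed the population with the submission order plus random permutations.
    population = [list(range(n))]
    population += [rng.sample(range(n), n) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[:pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mutation_rate:  # swap mutation
                x, y = rng.sample(range(n), 2)
                child[x], child[y] = child[y], child[x]
            children.append(child)
        population = elite + children
    best = min(population, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    durations = [7, 3, 9, 4, 6, 2, 8, 5, 1, 10]  # made-up task lengths in ms
    order, span = evolve(durations, n_procs=4, failed={1})
    print(order, span)
```

Because the better half of each generation is carried over unchanged, the best completion time found never gets worse from one generation to the next. A fuller model would also need to handle failures that occur during execution and task dependencies, which this sketch leaves out.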