Abstract: Fault-tolerant scheduling, an effective means of improving system performance, plays a significant role in scheduling research. Although fault-tolerant scheduling of real-time tasks on clusters has been studied extensively, the QoS (quality of service) requirements that some tasks carry have not been considered. This paper proposes FTQ (fault-tolerant QoS-based scheduling), a fault-tolerant scheduling algorithm for real-time tasks with QoS requirements on heterogeneous clusters. FTQ adopts the primary/backup model and takes into account the timing constraints and QoS requirements of tasks, system reliability, and system resource utilization. It adaptively adjusts the QoS levels of real-time tasks and the execution schemes of backup copies to improve system flexibility, reliability, schedulability, and resource utilization. System reliability is measured quantitatively and incorporated into FTQ, which improves system performance. To improve resource utilization, FTQ strives to advance the start time of primary copies and delay the start time of backup copies, so that backup copies can adopt the passive execution scheme or, failing that, overlap with their primary copies as little as possible. FTQ also employs the backup-copy overlapping technique, and the latest start time of backup copies and its constraints are analyzed. Extensive simulation experiments show that FTQ achieves higher scheduling quality than NOFTQ and DYFARS.
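The idea of delaying a backup copy so that it runs passively can be illustrated with a minimal sketch. This is not the FTQ algorithm itself. It assumes a simplified model in which the latest start time of a backup copy is its deadline minus its execution time, and in which a backup is passive when it does not overlap its primary copy in time. The names `Copy` and `place_backup` and the scheme labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy (primary or backup) of a task; names are illustrative."""
    start: float
    exec_time: float

    @property
    def finish(self) -> float:
        return self.start + self.exec_time


def place_backup(primary: Copy, backup_exec_time: float, deadline: float):
    """Start the backup as late as the deadline allows (simplified assumption).

    Returns (backup, scheme, overlap), where overlap is the length of time
    during which the primary and backup copies would both be executing.
    """
    latest_start = deadline - backup_exec_time
    if latest_start < 0:
        raise ValueError("backup copy cannot meet the deadline")

    backup = Copy(latest_start, backup_exec_time)

    # Length of the intersection of the two execution intervals.
    overlap = max(0.0, min(primary.finish, backup.finish)
                  - max(primary.start, backup.start))

    # Assumed labels: no overlap means the backup only runs if the primary fails.
    scheme = "passive" if overlap == 0 else "overlapping"
    return backup, scheme, overlap


# A short primary leaves room for a passive backup.
b1, scheme1, overlap1 = place_backup(Copy(0.0, 4.0), 3.0, 10.0)

# A long primary forces the backup to overlap it.
b2, scheme2, overlap2 = place_backup(Copy(0.0, 8.0), 5.0, 10.0)
```

In the first call the backup starts at time 7, after the primary finishes at time 4, so it is passive. In the second the backup must start at time 5 while the primary runs until time 8, giving an overlap of 3 time units. On a heterogeneous cluster the two execution times would generally differ because the copies run on different nodes, which is why they are passed separately here.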