dc.description.abstract | Virtual machines (VMs) are widely used in cloud computing platforms, and they can fail for many reasons. When a VM fails, the cloud services running on it fail as well, and the service providers may suffer different levels of property and business losses. To guard against such failures, a VM can be protected with fault tolerance technology: a backup VM is kept with its execution state synchronized with that of the protected VM, and when the protected VM fails, the backup VM replaces it immediately. There are two approaches to implementing a fault tolerance mechanism for VMs. The first, memory-level fault tolerance, synchronizes the memory content of the pair of VMs. The second, instruction-level synchronization, executes the same instructions and events in the same order on both VMs. The first approach has been seen in open-source projects, while the second can only be found in VMware. In this paper, we aim to develop a prototype of an instruction-level fault tolerance mechanism on KVM. The proposed mechanism first creates a backup VM by cloning the state of the protected VM. Subsequently, it records non-deterministic events on the protected VM, turns them into deterministic events on the backup VM, and replays them at the right moment. An overhead analysis is provided to show how the replay parameters affect the performance of the proposed fault tolerance mechanism. | en_US |