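Many modern cloud services are built on top of virtual machines (VMs). When a VM stops running because of a failure, the services and applications on it stop as well, causing losses for both service providers and customers. Improving the availability of VM systems is therefore an important issue. Fault tolerance is one technique for providing VM high availability: when a VM fails, a backup physical machine takes over the failed VM so that it keeps running without interruption (continuous execution). VM fault tolerance is usually implemented in one of two ways. The first uses checkpointing to synchronize the state of the primary VM with that of the backup VM; we call this memory-level state-synchronization fault tolerance. The second records the instructions executed by the primary VM and reproduces them on the backup VM to keep the two VMs in the same state; this is instruction-level state-synchronization fault tolerance. This research focuses on an instruction-level state-synchronization fault tolerance mechanism for the KVM virtual machine system. The mechanism monitors the non-deterministic events that occur on the primary VM, computes the logical time of each event, records its parameters, and sends them to the backup VM for replay. The backup VM starts in the same state as the primary VM, that is, it runs the same program instructions and holds the same memory contents, but it is kept paused. After receiving the event log, the backup VM sets the instruction breakpoint corresponding to each event and starts executing; when a breakpoint is hit, it injects the recorded event data to replay the event, which keeps the two VMs synchronized. Finally, we use this non-deterministic event record-and-replay technique to design and implement an error handling and recovery mechanism that provides automatic fault tolerance for the VM.

Virtual machine fault tolerance (VMFT) is a technology that enables continuous execution despite hardware or software failures, and it can thus be used to protect critical virtualized software services. There are two common ways to implement VMFT. The first uses a continuous-checkpointing strategy, in which a backup virtual machine (VM) keeps receiving the latest checkpoint of the protected VM. The second uses a log-and-replay strategy, in which events in the protected VM are recorded and turned into deterministic events for replay in the backup VM. Once the protected VM fails, the backup VM immediately takes over its role to minimize service downtime. This research provides a log-and-replay-based VMFT mechanism for the Kernel-based Virtual Machine (KVM). Before entering the fault tolerance phase, the proposed mechanism creates the backup VM by live cloning the protected VM. The two VMs then enter the fault tolerance phase, in which they synchronize periodically. In each synchronization epoch, the mechanism monitors the non-deterministic events occurring on the protected VM and identifies the logical time and parameters of each event. It then transfers the logged data to the backup VM for event replay.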
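The logging step can be illustrated with a minimal sketch of how one epoch's event log might be represented and serialized for transfer. It assumes, hypothetically, that logical time is a pair (retired-branch count, guest instruction pointer), a common choice in deterministic-replay systems because an instruction pointer alone is ambiguous inside loops. The event kinds, field widths, and the names `EventRecord`, `encode_epoch`, and `decode_epoch` are illustrative only and are not taken from the thesis.

```python
import struct
from dataclasses import dataclass

# Hypothetical event kinds; the abstract does not enumerate them.
EVENT_IO_IN = 1      # e.g. data returned by a port or MMIO read
EVENT_INTERRUPT = 2  # e.g. an external interrupt vector


@dataclass(frozen=True)
class LogicalTime:
    branch_count: int  # assumed: retired branches since the epoch began
    rip: int           # assumed: guest instruction pointer at the event


@dataclass(frozen=True)
class EventRecord:
    time: LogicalTime
    kind: int
    payload: int       # event parameter, e.g. I/O data or vector number


_HEADER = struct.Struct("<QI")    # epoch id, number of records (12 bytes)
_RECORD = struct.Struct("<QQBQ")  # branch_count, rip, kind, payload (25 bytes)


def encode_epoch(epoch_id: int, events: list[EventRecord]) -> bytes:
    """Serialize one synchronization epoch's log for transfer to the backup."""
    header = _HEADER.pack(epoch_id, len(events))
    body = b"".join(
        _RECORD.pack(e.time.branch_count, e.time.rip, e.kind, e.payload)
        for e in events
    )
    return header + body


def decode_epoch(data: bytes) -> tuple[int, list[EventRecord]]:
    """Rebuild the epoch id and ordered event list on the backup side."""
    epoch_id, count = _HEADER.unpack_from(data, 0)
    offset = _HEADER.size
    events = []
    for _ in range(count):
        branch_count, rip, kind, payload = _RECORD.unpack_from(data, offset)
        offset += _RECORD.size
        events.append(EventRecord(LogicalTime(branch_count, rip), kind, payload))
    return epoch_id, events
```

Fixed-width little-endian records keep decoding trivial on the receiving side; an implementation inside KVM could equally use C structures shared between kernel and user space.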
Upon receiving the logged data, the backup VM sets instruction breakpoints at the corresponding positions and starts execution, injecting each logged event when it reaches the corresponding breakpoint. When it finishes, the backup VM signals the protected VM. If the protected VM fails during the fault tolerance phase, the backup VM is responsible for detecting the failure and taking over the role of the protected VM.
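The replay side can be illustrated with a toy model in which a simple step counter stands in for KVM's instruction breakpoints and logical time. The `ToyVM` class, its arithmetic, and the 10% event probability are hypothetical; the sketch only demonstrates the property the mechanism relies on: if the guest is deterministic between events and each logged event is injected at exactly its recorded logical time, the backup ends in the same state as the primary.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    step: int   # logical time: guest steps executed before the injection
    value: int  # logged event parameter (e.g. data from an input device)


class ToyVM:
    """Hypothetical deterministic guest: its state changes only by executing
    steps, except when an external input (a non-deterministic event) arrives."""

    def __init__(self) -> None:
        self.step = 0
        self.acc = 0

    def execute_one(self) -> None:
        # Deterministic "instruction": the same state in gives the same state out.
        self.acc = (self.acc * 31 + self.step) % 1_000_003
        self.step += 1

    def inject(self, value: int) -> None:
        # Effect of a non-deterministic event on guest state.
        self.acc ^= value


def run_primary(steps: int, rng: random.Random) -> tuple[ToyVM, list[Event]]:
    """Run the protected VM, logging each event with its logical time."""
    vm = ToyVM()
    log = []
    while vm.step < steps:
        if rng.random() < 0.1:  # input arrives at an unpredictable moment
            value = rng.randrange(256)
            vm.inject(value)
            log.append(Event(vm.step, value))
        vm.execute_one()
    return vm, log


def replay_backup(steps: int, log: list[Event]) -> ToyVM:
    """Re-execute on the backup, injecting each event at its logged time."""
    vm = ToyVM()
    for event in log:
        # Stand-in for a breakpoint: run freely up to the event's logical time.
        while vm.step < event.step:
            vm.execute_one()
        vm.inject(event.value)
    while vm.step < steps:
        vm.execute_one()
    return vm
```

Injecting an event even one step early or late would generally change `acc`, which is why the logical time must pin down the exact instruction rather than just the epoch.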