Abstract: | With the advancement of cloud computing and virtualization technologies, virtual machines (VMs) have become a core component of modern computing environments and can significantly improve hardware resource utilization. However, compared to physical machines, VMs are more susceptible to service interruptions caused by hardware failures or operator errors, which reduces system availability. High Availability (HA) techniques are therefore widely applied in VM Fault Tolerance (FT) systems to keep the system running normally when a failure occurs. NCU-MFTVM is a VM fault-tolerance system developed by the PDCLAB team. It uses a continuous synchronization mechanism that replicates the primary VM's state to a backup VM in real time, so that service continues uninterrupted when a failure occurs. The latest version, NCU-MFTVM v8.2, adopts an epoch-based design to strengthen the system's fault-tolerance capability. VM stop time nevertheless remains a key factor in system performance, and its length matters most when synchronization time is short. To address this, this study proposes a mechanism for reducing the synchronization time of virtual devices: it tracks the state of the virtual devices in the system and uses that state to decide how much device data needs to be synchronized. This reduces synchronization time and the amount of data transferred, which in turn shortens system stop time and latency and improves overall system performance. |