Abstract
Virtual machine fault tolerance (VMFT) is a technology that lets a system keep running when hardware or software failures occur. It can therefore be used to protect critical software services that run in virtual machines. There are two main approaches to implementing VMFT. The first uses a continuous-checkpointing strategy, in which a backup virtual machine (VM) continuously receives the latest checkpoint of the protected VM. The second uses a log-and-replay strategy, in which the non-deterministic events in the protected VM are recorded so that they can be replayed deterministically in the backup VM. In either case, once the protected VM fails, the backup VM immediately takes over its role to minimize service downtime.

This research provides a log-and-replay-based VMFT mechanism for the Kernel-based Virtual Machine (KVM). Before entering the fault tolerance phase, the proposed mechanism creates the backup VM by live cloning the protected VM. The two VMs then enter the fault tolerance phase, in which they synchronize periodically. In each synchronization epoch, the mechanism monitors the non-deterministic events occurring on the protected VM and records the logical time and parameters of each event. It then transfers the logged data to the backup VM for replay. On receiving the data, the backup VM sets instruction breakpoints at the corresponding positions and starts execution, injecting each logged event when execution reaches its breakpoint. When the backup VM finishes replaying the epoch, it signals the protected VM. If the protected VM fails during the fault tolerance phase, the backup VM is responsible for detecting the failure and taking over the role of the protected VM.
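The epoch-based log-and-replay cycle described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the KVM mechanism itself. The "VM" is a toy object whose whole state is one integer, logical time is simply a counter of executed toy instructions, `copy.deepcopy` stands in for live cloning, and the log is passed as a Python list instead of being sent over a network. All names (`ToyVM`, `LoggedEvent`, `run_protected_epoch`, `replay_epoch`, `run_fault_tolerance`) are hypothetical.

```python
import copy
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedEvent:
    icount: int  # logical time: toy instructions executed before delivery
    kind: str    # hypothetical event type, e.g. "irq"
    value: int   # event parameter (e.g. an interrupt vector or an input value)


class ToyVM:
    """Hypothetical toy guest whose state changes only via step() and inject()."""

    def __init__(self) -> None:
        self.icount = 0  # instructions executed so far (the logical clock)
        self.acc = 0     # stand-in for the entire guest state

    def step(self) -> None:
        # One deterministic instruction: same input state -> same output state.
        self.acc = (self.acc * 31 + 7) % 1_000_003
        self.icount += 1

    def inject(self, ev: LoggedEvent) -> None:
        # Delivering an external event perturbs the guest state.
        self.acc = (self.acc + ev.value) % 1_000_003


def run_protected_epoch(vm: ToyVM, n_instr: int, rng: random.Random,
                        event_prob: float = 0.05) -> list[LoggedEvent]:
    """Run one epoch on the protected VM, logging each non-deterministic event."""
    log = []
    for _ in range(n_instr):
        if rng.random() < event_prob:  # an external event arrives at this point
            ev = LoggedEvent(vm.icount, "irq", rng.randrange(256))
            vm.inject(ev)
            log.append(ev)  # record the logical time and parameters
        vm.step()
    return log  # ordered by logical time


def replay_epoch(vm: ToyVM, log: list[LoggedEvent], n_instr: int) -> str:
    """Replay one epoch on the backup VM and return an acknowledgement."""
    end = vm.icount + n_instr
    i = 0
    while vm.icount < end:
        # "Breakpoint": execution has reached the logical time of a logged event.
        while i < len(log) and log[i].icount == vm.icount:
            vm.inject(log[i])
            i += 1
        vm.step()
    return "ACK"  # signal the protected VM that this epoch was replayed


def run_fault_tolerance(epochs: int, n_instr: int,
                        seed: int = 0) -> tuple[ToyVM, ToyVM]:
    rng = random.Random(seed)
    protected = ToyVM()
    for _ in range(100):  # some execution before protection starts
        protected.step()
    backup = copy.deepcopy(protected)  # stand-in for live cloning
    for _ in range(epochs):
        log = run_protected_epoch(protected, n_instr, rng)
        ack = replay_epoch(backup, log, n_instr)
        assert ack == "ACK"
    return protected, backup


if __name__ == "__main__":
    p, b = run_fault_tolerance(epochs=5, n_instr=1000)
    print(p.icount, p.acc == b.acc)  # prints "5100 True"
```

The toy guest is deterministic between events, so injecting each logged event at the same logical time keeps the backup's state identical to the protected VM's at every epoch boundary. Failure detection and takeover by the backup VM are left out of the sketch.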