Abstract (English) |
With the widespread adoption of virtualization technology, many cloud-hosted network services now use virtual machines (VMs) as their computing resources. Although virtualization gives cloud platforms desirable features such as good manageability and server consolidation, it still suffers from the single-point-of-failure problem: a physical machine failure brings down every virtual machine running on it. Automatic fault tolerance for VMs is one way to solve this problem. A backup virtual machine stays synchronized with the protected virtual machine and takes over its role when the protected virtual machine goes down. Based on our study, the existing open-source fault-tolerant VM solutions, Kemari and Micro-Checkpointing, do not work smoothly when hosting a network service. We even found that a Micro-Checkpointing fault-tolerant VM crashes very often. Therefore, we have proposed a novel design for a fault-tolerant virtual machine based on KVM, named M-FTVM. We have also implemented a prototype of M-FTVM and continue to improve its performance. This paper focuses on the performance-improvement techniques for M-FTVM. We used the DVD-Store benchmark to evaluate its performance. The experimental results show that, measured in operations per minute, the latest M-FTVM is about four times as fast as the original version, about three times as fast as Micro-Checkpointing, and about seven times as fast as Kemari. |
References |
[1] G. J. Popek and R. P. Goldberg, “Formal requirements for virtualizable third generation architectures,” Commun. ACM, vol. 17, no. 7, pp. 412–421, 1974.
[2] R. P. Goldberg, “Survey of virtual machine research,” Computer, vol. 7, no. 6, pp. 34–45, Jun. 1974.
[3] S. N. T. Chiueh and S. Brook, “A survey on virtualization technologies,” RPE Rep., pp. 1–42, 2005.
[4] W.-C. Feng, “Making a case for efficient supercomputing,” Queue, vol. 1, no. 7, p. 54, 2003.
[5] I. P. Egwutuoha, D. Levy, B. Selic, and S. Chen, “A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems,” J. Supercomput., vol. 65, no. 3, pp. 1302–1326, 2013.
[6] J. Gray and D. P. Siewiorek, “High-availability computer systems,” Computer, vol. 24, no. 9, pp. 39–48, Sep. 1991.
[7] T. Hirt, “Kvm-the kernel-based virtual machine,” Red Hat Inc, 2010.
[8] M. Zabaljauregui, Hardware Assisted Virtualization. Intel Virtualization Technology. Buenos Aires, June, 2008.
[9] “AMD Virtualization.” [Online]. Available: http://www.amd.com/en-us/solutions/servers/virtualization. [Accessed: 11-Jun-2015].
[10] F. Bellard, “QEMU, a Fast and Portable Dynamic Translator,” in USENIX Annual Technical Conference, FREENIX Track, pp. 41–46, 2005.
[11] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, “Remus: High availability via asynchronous virtual machine replication,” in Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, pp. 161–174, 2008.
[12] D. J. Scales, M. Nelson, and G. Venkitachalam, “The design and evaluation of a practical system for fault-tolerant virtual machines,” Technical Report VMWare-RT-2010-001, VMWare, 2010.
[13] T. C. Bressoud and F. B. Schneider, “Hypervisor-based fault tolerance,” ACM Trans. Comput. Syst. TOCS, vol. 14, no. 1, pp. 80–107, 1996.
[14] Y. Tamura, K. Sato, S. Kihara, and S. Moriai, “Kemari: Virtual machine synchronization for fault tolerance,” in Proc. USENIX Annu. Tech. Conf. (Poster Session), 2008.
[15] “Features/MicroCheckpointing - QEMU.” [Online]. Available: http://wiki.qemu.org/Features/MicroCheckpointing. [Accessed: 24-Nov-2014].
[16] M. Lu and T. Chiueh, “Fast memory state synchronization for virtualization-based fault tolerance,” in Dependable Systems & Networks, 2009. DSN’09. IEEE/IFIP International Conference on, pp. 534–543, 2009.
[17] M. Lu and T. Chiueh, “Speculative Memory State Transfer for Active-Active Fault Tolerance,” pp. 268–275, 2012.
[18] B. Gerofi and Y. Ishikawa, “Workload Adaptive Checkpoint Scheduling of Virtual Machine Replication,” pp. 204–213, 2011.
[19] B. Gerofi and Y. Ishikawa, “RDMA Based Replication of Multiprocessor Virtual Machines over High-Performance Interconnects,” pp. 35–44, 2011.
[20] S. Kasampalis, “Copy On Write Based File Systems Performance Analysis And Implementation,” 2010. [Online]. Available: http://faif.objectis.net/download-copy-on-write-based-file-systems. [Accessed: 12.10.2014].
[21] “VMware vSphere™ 4 Fault Tolerance: Architecture and Performance.” [Online]. Available: http://www.vmware.com/files/pdf/perf-vsphere-fault_tolerance.pdf. [Accessed: 27-Jul-2015].
|