Thesis 107525003: Detailed Record




Author 王建文 (Jiann-Wen Wang)   Department Computer Science and Information Engineering
Title 基於libvirt與QEMU-KVM虛擬機器之記憶體層級同步容錯系統
(An Adaptive Continuous Checkpointing Fault-Tolerant Virtual Machine System based on QEMU-KVM with libvirt)
Related theses
★ Optimizing Data Reliability and Throughput of DDS Systems Using QoS Policies
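  1. This electronic thesis is authorized for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.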

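Abstract (translated from Chinese) With the rapid development of cloud computing and virtualization, the IT industry can use these technologies to raise the utilization of physical machines and allocate resources elastically. However, consolidating multiple servers onto one physical machine also means that a single host hardware failure takes down multiple services at once. A virtualization-based fault-tolerant system can protect the running state of virtual machines that host critical services, along with the soft real-time programs they execute, when the host hardware fails, and so further improves service availability.
Based on QEMU 3.0.0, libvirt 5.7.0, and a continuous checkpointing architecture, this study implements a fault-tolerant system that can be controlled through an external management interface. The continuous checkpointing architecture meets the basic requirements of a fault-tolerant system by continuously synchronizing the state of the primary and backup virtual machines while guaranteeing the consistency of external output. This study also helps system administrators improve the performance of services running under the fault-tolerant system by introducing compression tools that reduce the bandwidth needed for synchronization, and by sensing the virtual machine's workload and setting parameters accordingly.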
Abstract (English) The IT industry has widely adopted cloud computing and virtualization, making resource management more efficient and elastic. However, as more servers are consolidated onto one physical server, availability is threatened by a single physical host's hardware failure. A virtualization-based fault-tolerant system can protect mission-critical virtual machines running soft real-time applications from such hardware failures, thus improving the services' availability.
Based on QEMU 3.0.0, libvirt 5.7.0, and continuous checkpointing, this study implements a virtualization-based fault-tolerant system with a management interface. Continuous checkpointing keeps replicating the internal state of the VM on the primary host to the backup host to meet the requirements of fault tolerance, and outputs are buffered to ensure consistency. This study also designs and implements two methods to reduce the performance degradation that the system imposes on guest applications: adjusting the checkpointing parameter automatically, and using compression tools to speed up dirty-page transfer on demand. With these, system administrators can set up the system without having to find a suitable parameter for every application, and they have more flexibility in deploying it.
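As a rough illustration of the idea in the abstract, the sketch below models one checkpoint epoch: the primary tracks dirty pages, holds outgoing packets in a buffer, and releases them only after the backup has applied the checkpoint. It is a toy model, not the thesis implementation. The class and method names (`PrimaryVM`, `BackupHost`, `checkpoint`) are hypothetical, and `zlib` from the Python standard library merely stands in for whichever compressor the real system uses.

```python
import zlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class BackupHost:
    """Hypothetical backup side: stores the last acknowledged guest memory."""
    pages: dict = field(default_factory=dict)

    def apply_checkpoint(self, payload: dict) -> bool:
        # Decompress each dirty page and overwrite the backup copy.
        for page_no, blob in payload.items():
            self.pages[page_no] = zlib.decompress(blob)
        return True  # acknowledgement sent back to the primary


@dataclass
class PrimaryVM:
    """Hypothetical primary side: guest memory, dirty set, output buffer."""
    pages: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    out_buffer: list = field(default_factory=list)
    released: list = field(default_factory=list)

    def write_page(self, page_no: int, data: bytes) -> None:
        # A guest write marks the page dirty for the current epoch.
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def send_packet(self, packet: bytes) -> None:
        # Output is held back until the state that produced it is replicated.
        self.out_buffer.append(packet)

    def checkpoint(self, backup: BackupHost) -> None:
        # End of an epoch: ship compressed dirty pages, then release output.
        payload = {n: zlib.compress(self.pages[n]) for n in sorted(self.dirty)}
        if backup.apply_checkpoint(payload):
            self.dirty.clear()
            self.released.extend(self.out_buffer)
            self.out_buffer.clear()
```

Because packets leave the buffer only after the acknowledgement, a client never observes output from a state the backup could not resume from. The epoch length (how often `checkpoint` is called) is the kind of parameter the abstract says the system adjusts automatically; this sketch leaves that policy out.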
Keywords ★ QEMU-KVM
★ Libvirt
★ Virtual Machine
★ Fault Tolerance
★ Continuous Checkpointing
Table of Contents
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
I. Introduction
1.1 Research Background
1.2 Motivation and Contributions
1.3 Outline
II. Background Knowledge
2.1 QEMU and Kernel-based Virtual Machine
2.2 Libvirt
2.3 Types of VM Fault Tolerance Systems
2.3.1 Lock-Stepping
2.3.2 Continuous Checkpointing
2.3.3 Hybrid
2.4 Live Migration with Compression Techniques
III. System Design
3.1 Overall Architecture
3.1.1 Checkpointing and Messaging
3.1.2 Watchdog
3.1.3 Export
3.1.4 Autopilot
3.2 System Initialization
3.3 Checkpointing Process and Network Output Correctness
3.4 Fault Model and Fault Handling
3.4.1 Fault Model Overview
3.4.2 Correctness
3.5 Libvirt Integration
3.6 Additional Modification to QEMU
IV. Performance Improvements
4.1 Experiment Environment
4.1.1 Environment Overview and Configuration
4.1.2 Applications for Performance Evaluation
4.2 Adjusting Epoch Time Adaptively
4.2.1 Finding Optimal Epoch Time with Manual Experiments
4.2.2 Probing the Moving Average Online
4.3 Utilizing Compression Techniques
4.3.1 Implementation of Compressing Checkpoints
4.3.2 Performance Evaluation on Compressing Checkpoints
V. Evaluation
5.1 Experiment Environment
5.2 Experiment Results
5.2.1 TPC-C OLTP Database Benchmark
5.2.2 Acme Air in NodeJS
5.2.3 Kernel Compilation
5.2.4 Network Latency of Idle Guest
5.2.5 Network Throughput
VI. Related Work
6.1 Virtual Machine Fault Tolerance
6.1.1 Continuous Checkpointing Implementations
6.2 Live Migration with Lossless Compression Algorithms
6.2.1 XOR-Based Zero Run Length Encoding (XBZRLE)
6.2.2 LZ4 Lossless Compression
VII. Conclusion and Future Work
References
Advisors 梁德容 (Deron Liang), 王尉任 (Wei-Jen Wang)   Approval date 2020-8-17