A Practical Log and Replay Strategy for VM Fault Tolerance

Name

Afiqie Fadhihansah (阿斐奇)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Practical Log and Replay Strategy for VM Fault Tolerance

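This electronic thesis is approved for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid breaking the law.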

Abstract

Virtualization is a computer architecture technology by which multiple virtual machines (VMs) are multiplexed on the same hardware machine. The purpose of a virtual machine is to enhance resource sharing among many users and to improve computer performance in terms of resource utilization and application flexibility. Hardware resources (CPU, memory, I/O devices, etc.) or software resources (operating systems and software libraries) can be virtualized at various functional layers. Virtualization technology has been revitalized as demand for distributed and cloud computing has increased sharply in recent years.
Fault tolerance is not just a property of individual machines; it may also characterize the rules by which they interact. For example, the Transmission Control Protocol (TCP) is designed to allow reliable two-way communication in a packet-switched network, even over communication links that are imperfect or overloaded. It does this by requiring the endpoints of the communication to expect packet loss, duplication, reordering, and corruption, so that these conditions do not damage data integrity and only reduce throughput by a proportional amount.
The most important requirement in designing a fault-tolerant virtual machine is ensuring that it actually meets its reliability requirements. Our solution to this problem takes the form of virtual machine logging and replay. By logging enough information about the execution of the system, we are able to replay the execution at a later time, repeating all non-deterministic events exactly as they occurred in the original execution. We have integrated the logging and replay mechanisms into the Kernel-based Virtual Machine (KVM) open-source full-system virtualization package for Linux.
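The logging and replay idea can be illustrated with a minimal Python sketch. It is not the thesis's KVM implementation: `ToyMachine`, `EventLog`, `run_and_log`, and `replay` are hypothetical names, and a seeded random generator stands in for non-deterministic external inputs such as interrupts or device data. The point it shows is that if every non-deterministic event is logged together with the point in execution where it occurred, a second run that injects the same events at the same points reaches the same state.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Ordered record of non-deterministic events as (step, value) pairs."""
    entries: list = field(default_factory=list)

    def record(self, step, value):
        self.entries.append((step, value))


class ToyMachine:
    """Hypothetical stand-in for a guest VM: a small state machine."""

    def __init__(self):
        self.state = 0
        self.step = 0

    def execute(self, external_input=None):
        # Deterministic part: depends only on the current state.
        self.state = (self.state * 31 + 7) % 1_000_003
        # Non-deterministic part: an external input, if one arrives now.
        if external_input is not None:
            self.state ^= external_input
        self.step += 1


def run_and_log(steps, seed):
    """Original execution: inputs arrive at unpredictable steps and are logged."""
    rng = random.Random(seed)
    machine, log = ToyMachine(), EventLog()
    for _ in range(steps):
        value = rng.randrange(256) if rng.random() < 0.3 else None
        if value is not None:
            log.record(machine.step, value)
        machine.execute(value)
    return machine.state, log


def replay(steps, log):
    """Replay: no randomness is used; logged inputs are injected at the logged steps."""
    pending = dict(log.entries)
    machine = ToyMachine()
    for _ in range(steps):
        machine.execute(pending.get(machine.step))
    return machine.state
```

Calling `run_and_log(1000, seed=42)` and then passing the returned log to `replay(1000, log)` should yield the same final state, because the replay run sees exactly the inputs of the original run at exactly the same steps.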
Finally, the result of this research on a practical log and replay strategy for VM fault tolerance is the following protocol. When an output needs to be executed, the primary first transfers the data events to the backup, and only then is the primary allowed to execute the output. After the output has been performed, the primary notifies the backup. The backup does not perform the output if it has received the notification, and performs the output if it has not.
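The output rule described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the thesis's code: the `Primary` and `Backup` classes, the `crash_at` parameter used to simulate a primary failure, and the list standing in for an output device are all hypothetical.

```python
class Backup:
    """Hypothetical backup VM: stores data events and output notifications."""

    def __init__(self):
        self.events = []       # data events received from the primary
        self.notified = set()  # ids of outputs the primary reported as performed

    def receive_events(self, events):
        self.events.extend(events)

    def receive_notification(self, output_id):
        self.notified.add(output_id)

    def take_over(self, replayed_outputs, device):
        """After a primary failure, perform only outputs with no notification."""
        for output_id, payload in replayed_outputs:
            if output_id not in self.notified:
                device.append((output_id, payload))


class Primary:
    """Hypothetical primary VM following the three-step output rule."""

    def __init__(self, backup, device):
        self.backup = backup
        self.device = device
        self.unsent_events = []

    def log_event(self, event):
        self.unsent_events.append(event)

    def output(self, output_id, payload, crash_at=None):
        # Step 1: transfer the logged data events to the backup.
        self.backup.receive_events(self.unsent_events)
        self.unsent_events = []
        if crash_at == "before_output":
            return False  # simulated failure before the output is performed
        # Step 2: only now may the primary perform the output.
        self.device.append((output_id, payload))
        if crash_at == "before_notify":
            return False  # simulated failure before the backup is notified
        # Step 3: notify the backup that the output was performed.
        self.backup.receive_notification(output_id)
        return True
```

In this sketch, a primary that fails before performing an output leaves the backup without a notification, so the backup performs the output itself on takeover. A failure between steps 2 and 3 leads the backup to repeat an output that was already performed, so under this model the receiving side has to tolerate duplicates.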

Keywords

★ Log-and-replay
★ Fault tolerance
★ Virtual machine

Table of Contents

ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Research Objective
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 Fault Tolerance
2.1.1 Faults and Failures
2.1.2 Dependency Relations
2.1.3 Fault Tolerance Mechanism
2.1.4 Fault Tolerance with Virtualization Technology
2.2 Kernel-based Virtual Machine (KVM) and QEMU
2.2.1 KVM (Kernel-based Virtual Machine)
2.2.2 QEMU
2.2.3 Input/Output and Interrupts
2.3 MMIO (Memory-mapped I/O)
2.3.1 Definition
2.3.2 Memory Barriers
2.3.3 Examples
2.4 Registers Concept
Chapter 3 System Design
3.1 Architecture Design
3.2 Experimental Environment
Chapter 4 Discussion
4.1 Logging and Replay Process
4.1.1 Solution Concept
4.2.2 Scenario Cases
Chapter 5 Conclusion
5.1 Conclusion
5.2 Future Work
REFERENCES

Advisors

Deron Liang (梁德榮), M. Aziz Muslim, Muhammad Aswin

Approval Date

2017-1-25