Fast Automated Software and Hardware Failure Detection and Recovery Mechanism for Software-Defined Computing Clusters

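Name

Chun-Yu Cheng (鄭鈞輿)

Department

Department of Computer Science and Information Engineering

Thesis Title

Fast Automated Software and Hardware Failure Detection and Recovery Mechanism for Software-Defined Computing Clusters
(Fast Failover Based on Software-Defined Computing Cluster)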

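Related Theses

★ A Splay-Tree-Based Android Binder Driver
★ A Study on Applying Incremental Learning to the Interpretation of Multiple Crop Types
★ Detecting Novel Land Parcels in Aerial Image Tiles Using Classification-Reconstruction Learning
★ A Dynamic Global Computing Platform Built on a Parallel Job System
★ Garbage Collection of Active Objects Using a Weighted Reference Counting Algorithm
★ A Dynamically Load-Balanced Computing Framework for Maximum Likelihood Estimation
★ A Study of Strategies for Dynamic P2P System Reconfiguration Using Multiple System Load Metrics
★ A Hadoop-Based Framework for Cloud Application Feature Extraction and Computation Monitoring
★ An Adaptive Computing Model for Large-Scale Dynamic Distributed Systems
★ A Cloud Service Platform Providing Elastic Virtual Data Centers
★ A Resource Control Center for an Elastic Virtual Data Center Cloud Service Platform
★ A Dynamically Adaptive Computing Framework for Auto-Provisioning Cloud Systems
★ Heuristic Scheduling Strategies for Linearly Dependent and Independent Jobs
★ An Efficient Distributed Hierarchical Clustering Algorithm for Large Data Sets
★ A Multi-Agent Dynamically Adaptive Computing Management Framework for Hybrid Cloud Environments
★ A Graph-Based Parallel Minimum Spanning Tree Clustering Algorithm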

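This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.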

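Abstract (translated from the Chinese)

In recent years cloud computing technology has matured, and most enterprises now deploy their services in cloud environments. Because of the scalability and convenience that cloud technology brings, a cloud environment can adjust resources dynamically at low cost compared with a physical environment and can make full use of machine resources, so OpenStack has become a popular choice for building enterprise clouds. Enterprises nevertheless place great weight on uninterrupted service, that is, on the high availability (HA) of the cloud, yet OpenStack lacks a complete HA mechanism for users' virtual machines. This research first proposes the Software-Defined High Availability Cluster (SDHAC) mechanism, which logically partitions computing resources into multiple SDHACs and sets the HA policy of each cluster according to its requirements, so that administrators can manage and allocate cloud resources more easily. Building on SDHAC, this research develops an automated failure detection and recovery mechanism for the compute nodes and virtual machines within a cluster. In addition to monitoring the state of software services on compute nodes, it integrates with IPMI (Intelligent Platform Management Interface) to provide hardware-level monitoring, covering for example the operating system, the power supply, and the hardware's internal temperature and voltage sensors. When a failure is detected, a recovery procedure is carried out according to the failure model proposed in this research. Because the proposed HA system integrates the IPMI interface, it substantially reduces failure detection time and provides a more complete recovery mechanism, improving the high availability of virtual machines on OpenStack.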
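The idea of logically partitioning a compute pool into clusters, each with its own HA policy, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the names `HAPolicy`, `SDHACluster`, and `ComputePool`, as well as the policy fields, are hypothetical and do not come from the thesis or from any OpenStack API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HAPolicy:
    """Hypothetical per-cluster HA policy; the field names are illustrative."""
    recover_vms: bool = True
    max_restart_attempts: int = 3


@dataclass
class SDHACluster:
    """One logical HA cluster: a name, a policy, and its member nodes."""
    name: str
    policy: HAPolicy
    nodes: set = field(default_factory=set)


class ComputePool:
    """Logically partitions compute nodes into disjoint HA clusters."""

    def __init__(self, node_names):
        self.unassigned = set(node_names)
        self.clusters = {}

    def create_cluster(self, name, policy):
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} already exists")
        self.clusters[name] = SDHACluster(name, policy)

    def assign(self, node, cluster_name):
        # Validate both arguments before mutating any state.
        if cluster_name not in self.clusters:
            raise KeyError(cluster_name)
        if node not in self.unassigned:
            raise ValueError(f"{node!r} is unknown or already assigned")
        self.unassigned.remove(node)
        self.clusters[cluster_name].nodes.add(node)

    def policy_for(self, node):
        """Return the policy governing a node, or None if it is unassigned."""
        for cluster in self.clusters.values():
            if node in cluster.nodes:
                return cluster.policy
        return None
```

Because a node leaves the unassigned set when it joins a cluster, each node belongs to at most one cluster, so looking up a node's policy is unambiguous.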

Abstract (English)

In recent years, virtualized cloud computing has matured. Most enterprises choose to deploy their services on a virtualized cloud platform because of its elasticity and manageability. Compared with traditional computing platforms, a virtualized cloud platform can automatically adjust computing resources in response to changes in users' requirements. OpenStack is a popular virtualized cloud computing project that facilitates building such a platform, in which computations are carried out on virtual machines. In earlier work, we proposed and implemented a cloud platform that supports the concept of the Software-Defined High Availability Cluster (SDHAC) to address the problems of cloud platform availability and manageability. This mechanism logically divides the computing pool into multiple HA clusters, and administrators can apply different HA policies to different software-defined HA clusters according to their demands. This research focuses on fast failure detection and recovery on a platform with Software-Defined High Availability Clusters. The proposed system supports IPMI machines, that is, computers equipped with an interface for fast hardware state detection, and can therefore efficiently identify the root cause of a failure. In addition, the proposed system provides a complete set of recovery features, such as VM recovery and machine recovery, when IPMI is used. Our experimental results show that the proposed system with IPMI machines achieves higher availability than a traditional system that relies on heartbeat-based failure detection.
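To make the detection-then-recovery flow concrete, the following minimal sketch classifies a node's condition from hardware-level and software-level observations and maps the result to recovery steps. It rests on several assumptions: the failure categories mirror the monitored items named in the abstracts (power, operating system, sensors, software services), but the check order, the `NodeStatus` fields, and the action names in `RECOVERY_PLAN` are hypothetical and are not the thesis's actual failure model. No real IPMI or OpenStack call is made.

```python
from dataclasses import dataclass
from enum import Enum


class Failure(Enum):
    NONE = "none"
    POWER = "power"      # chassis reported as powered off
    OS = "os"            # powered on, but the operating system is unresponsive
    SENSOR = "sensor"    # temperature or voltage reading out of range
    SERVICE = "service"  # host is healthy, but a monitored service is down


@dataclass(frozen=True)
class NodeStatus:
    """Observations for one compute node (field names are hypothetical)."""
    power_on: bool
    os_responsive: bool
    sensors_in_range: bool
    services_running: bool


def classify(status):
    """Return the most fundamental failure, checking hardware before software."""
    if not status.power_on:
        return Failure.POWER
    if not status.os_responsive:
        return Failure.OS
    if not status.sensors_in_range:
        return Failure.SENSOR
    if not status.services_running:
        return Failure.SERVICE
    return Failure.NONE


# Hypothetical mapping from failure type to ordered recovery steps.
RECOVERY_PLAN = {
    Failure.NONE: [],
    Failure.SERVICE: ["restart_service"],
    Failure.SENSOR: ["migrate_vms", "notify_admin"],
    Failure.OS: ["reboot_node", "recover_vms"],
    Failure.POWER: ["power_on_node", "recover_vms"],
}


def plan_recovery(status):
    """Look up the recovery steps for a node's classified failure."""
    return list(RECOVERY_PLAN[classify(status)])
```

Checking the lowest layer first means that a powered-off node is reported as a power failure, not as a service failure, which is one way to read the abstract's point about identifying the root cause.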

Keywords (translated from the Chinese)

★ Cloud Computing
★ OpenStack
★ High Availability
★ Virtual Machine
★ IPMI
★ Software-Defined Computing Cluster

Keywords (English)

★ Cloud Computing
★ OpenStack
★ High Availability
★ Virtual Machine
★ IPMI
★ Software-Defined High Availability Cluster

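Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Implementation Goals
1-3 Research Contributions
1-4 Thesis Organization
Chapter 2 Related Work
2-1 Background Knowledge
2-1-1 Intelligent Platform Management Interface
2-1-2 OpenStack
2-1-3 OpenStack HA Mechanisms
2-2 Related Research on High Availability
2-2-1 VMware vSphere HA
2-2-2 Literature Review
Chapter 3 System Design
3-1 System Architecture Model
3-2 Software-Defined High-Reliability Cluster
3-3 Failure Detection and Recovery Mechanisms
3-3-1 Failure Detection Mechanism
3-3-2 Failure Recovery Mechanism
3-4 Comparison and Discussion with Other Systems
3-5 Integration with OpenStack Horizon
Chapter 4 Experimental Environment and Measurements
4-1 Experimental Environment and Architecture
4-2 Assumptions of the Experimental Environment
4-3 Experimental Cases
4-4 Experimental Results
Chapter 5 Conclusion
References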

Advisor

Wei-Jen Wang (王尉任)

Approval Date

2017-8-15
