A Study of NoSQL Performance and Stability: The Case of HBase

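Name

劉一正 (Yi-Cheng Liu)

Department

Executive Master's Program, Department of Information Management

Thesis title

A Study of NoSQL Performance and Stability: The Case of HBase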

Files

Full text: never open to the public

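Abstract (Chinese)

In today's increasingly fierce business competition, every company strives to get closer to consumer needs and gain a competitive advantage, and in recent years data mining and Big Data have become prominent fields. How to extract the information and value hidden in data has attracted wide attention. For analyzing and predicting consumer behavior more precisely, and for drawing commercially valuable information from customer data in order to formulate business strategy, strengthen competitive advantage, and maximize profit, big data is a powerful tool.

Although the big data platform Hadoop offers many advantages, such as horizontal scalability and the ability of the cluster to tolerate the failure of a single node, it also has inherent limitations. Using a case-study approach, this research examines the performance and stability problems encountered when running the NoSQL database HBase. It compiles the problems encountered in actual HBase operation, applies systematic, structured problem analysis and problem solving to identify the key factors behind operational problems and poor performance, draws up feasible solutions for each factor, and then evaluates them and tracks the recurrence rate of the problems to confirm the effectiveness of the proposed solutions.

The results show that HBase performance and stability are affected by underlying components, such as the stability of the Hadoop platform and the parameter settings of the Hadoop DataNode and the operating system. In addition, the number of regions hosted by each RegionServer is a key factor, and row key design has a major effect on read and write performance. The study also finds that HBase performance and stability problems are closely related to inadequate management procedures. Beyond the system-level improvements above, it therefore proposes improvements to management procedures, covering system parameter consistency, version control and software deployment, change management, system monitoring, and emergency handling, so as to reduce human error and build a more stable cluster.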
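To make the region-count factor concrete, the following is a minimal sketch of the sizing heuristic described in the Apache HBase Reference Guide: the number of actively written regions a RegionServer can sustain is roughly its heap size times the global memstore fraction, divided by the memstore flush size times the number of column families. The function name and the sample values (16 GiB heap, fraction 0.4, 128 MiB flush size) are illustrative assumptions, not figures from the thesis.

```python
def max_active_regions(heap_mib, memstore_fraction, flush_size_mib, column_families=1):
    """Rough upper bound on actively written regions per RegionServer.

    heap_mib          -- RegionServer heap size in MiB (assumed value)
    memstore_fraction -- hbase.regionserver.global.memstore.size
    flush_size_mib    -- hbase.hregion.memstore.flush.size, in MiB
    column_families   -- column families per table
    """
    if flush_size_mib <= 0 or column_families <= 0:
        raise ValueError("flush size and column family count must be positive")
    # Each actively written region needs one memstore per column family.
    return int(heap_mib * memstore_fraction / (flush_size_mib * column_families))


if __name__ == "__main__":
    # Hypothetical cluster settings, for illustration only.
    print(max_active_regions(16 * 1024, 0.4, 128))
```

A RegionServer hosting far more regions than this bound tends to flush many small memstores, which is one plausible way a high region count degrades performance and stability.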

Abstract (English)

Business competition has become more intense. To win, every company now focuses on meeting consumer demand. In recent years, data mining and big data have become hot topics. Extracting valuable information from various databases to benefit the business, conducting accurate analysis, and predicting consumer behavior have become critical, and big data tools help with all of these. The tools can help businesses develop strategies, strengthen competitive advantage, and maximize profits.

Although Hadoop, a big data platform, has brought many advantages, such as scalability and tolerance of single-node failure, it has its disadvantages as well. This research is a case study of the NoSQL database HBase as deployed at a prominent foundry company in Taiwan. It systematically studies the problems encountered, identifies the factors that affect the system's efficiency and stability, proposes solutions for each factor, and then evaluates the effectiveness of the solutions by tracking the recurrence of the problems.

The results show that HBase's performance and stability are likely to be affected by factors such as Hadoop platform stability, the DataNode xceiver parameter, operating system parameters, and the region count of each RegionServer. Row key design also affects read/write performance. The study further finds that HBase stability problems were related to inappropriate process management. This suggests that improving HBase's performance and stability requires work not only at the system level but also at the management level. Improving management controls, such as parameter consistency, version control of parameters, deployment processes, change management, monitoring, and emergency management, can reduce human error and yield a more stable distributed cluster system.
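One common way row key design affects write performance is hotspotting: monotonically increasing keys such as timestamps all land in the same region. The sketch below illustrates key salting, a widely used mitigation. It is an assumption for illustration, not the design used in the thesis; the function name, the bucket count, and the `|` separator are hypothetical.

```python
import hashlib


def salted_row_key(key: str, buckets: int = 16) -> bytes:
    """Prefix `key` with a deterministic two-digit salt in [0, buckets)."""
    if buckets < 1 or buckets > 100:
        raise ValueError("buckets must be between 1 and 100")
    # Hash the key so that adjacent keys map to unrelated buckets.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    salt = digest[0] % buckets
    return f"{salt:02d}|{key}".encode("utf-8")


if __name__ == "__main__":
    # Sequential timestamp keys (hypothetical data) get spread across buckets.
    for ts in ("20150612000001", "20150612000002", "20150612000003"):
        print(salted_row_key(ts))
```

The trade-off is on the read side: a point lookup can recompute the salt from the key, but a range scan over the original key order has to query every bucket and merge the results.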

Keywords (Chinese)

★ Big data
★ Performance
★ Stability
★ Influencing factors

Keywords (English)

★ Big Data
★ Efficiency
★ Stability
★ Impact factor

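Table of contents

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 NoSQL and Hadoop Architecture
2.1.1 Hadoop Data Storage Component - HDFS
2.1.2 Hadoop Data Processing Component - MapReduce
2.2 HBase Architecture
2.3 Effect of Row Key Design on Read/Write Performance
2.4 Effect of Region Count on Performance and Stability
2.5 Interaction Between Hadoop DataNode and HBase Stability
2.6 The MECE Principle for Problem Analysis
2.7 Summary
Chapter 3 Research Method
3.1 Research Framework
3.2 Current-State Analysis and Problem Compilation
3.3 Identifying Key Factors
3.3.1 Problem Classification and Definition
3.4 Formulating Solutions
3.5 Solution Evaluation and Results
Chapter 4 HBase Problem Analysis and Performance Impact
4.1 Current-State Analysis and Problem Compilation
4.2 Identifying Key Factors
4.3 Formulating Solutions
4.4 Solution Evaluation and Results
Chapter 5 Conclusion
5.1 Research Recommendations
5.2 Research Limitations
5.3 Future Research Directions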

Advisor

王存國

Approval date

2015-6-12