Name 謝佳昕 (Jia-Shin Shie)   Department Computer Science and Information Engineering
Thesis Title 以MapReduce進行交叉驗證整合大量天文資料
(Incorporating Astronomical Catalog by using Cross-Matching Algorithm with MapReduce)
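Abstract (translated from Chinese) As technology advances, the telescopes used for astronomical observation have become increasingly powerful, capturing more information and producing larger volumes of data. This makes research more difficult for astronomers. This thesis therefore proposes integrating large volumes of astronomical data by cross-matching, so that researchers can quickly find the data they need.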
Abstract (English) Cross-matching is a common way to find useful information across different star catalogs. Hardware today is more powerful than before, and the data obtained through astronomical telescopes are becoming much larger, so a single machine can no longer handle the astronomical data. In this thesis, we use OpenStack to build a cloud computing environment, Hadoop as the distributed system, and HDFS and HBase as distributed storage, and we implement cross-matching with the MapReduce framework. In addition, because HBase supports random access, we provide an incremental mechanism that lets users add new astronomical data whenever they want. In the experiments, we use transient data as the test set to compare the running time of a single machine against the distributed system, and of physical machines against virtual machines with the same number of nodes. The results show that the virtual machines are faster than the physical machines. Furthermore, we create 12 physical nodes in the cloud environment to observe the running time for different numbers of nodes. In theory, using more nodes should make the program run faster; in practice, the running times with 10 nodes and 12 nodes are very similar.
Keywords (translated from Chinese) ★ Big data
★ Cloud computing
★ Distributed system
★ Cross-matching
Keywords (English) ★ Big Data
★ Cloud computing
★ Distributed system
★ Cross-Matching
Table of Contents Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
1. Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Chapter Overview
2. Literature Review
2-1 Transient Astronomical Events
2-2 OpenStack
2-3 Hadoop
2-4 MapReduce
2-5 NoSQL
3. System Architecture
3-1 Cloud Computing Platform
3-2 HDFS File System
3-3 HBase Database
3-4 Cross-Matching
3-5 Clustering Stage
3-6 Cross Matching Stage
4. Methodology
4-1 Clustering Stage
4-1-1 Data Reduction and Clustering
4-2 Cross Matching Stage
4-2-1 Cross-Matching
4-3 Adding New Observation Data
4-4 Visual Query Interface
5. Experiments
5-1 Clustering Stage Running Time
5-1-1 Clustering Stage on HDFS
5-1-2 Clustering Stage on HBase
5-1-3 Comparison of the Clustering Stage on HDFS and HBase
5-2 Cross Matching Stage Running Time
5-2-1 Cross Matching Stage on HDFS
5-2-2 Cross Matching Stage on HBase
5-2-3 Comparison of the Cross Matching Stage on HDFS and HBase
5-3 Adding New Observation Data
6. Conclusion
References
Advisor 蔡孟峰 (Meng-Feng Tsai)
