Analysis for Big-Data Locality Strategies on Various Distributed Astronomical Researches Using Cloud Platform

Name

Chi-Sheng Huang (黃祈勝)

Department

Department of Computer Science and Information Engineering

Thesis Title

Analysis for Big-Data Locality Strategies on Various Distributed Astronomical Researches Using Cloud Platform
(Chinese title: 在分散式雲端平台上對不同巨量天文應用之資料區域性適用策略研究)

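Related Theses

★ Applying Self-Organizing Maps and Back-Propagation Networks to Mining Potential Users in Telecommunication Databases
★ Enterprise Email Classification Based on Social Network Features
★ Temporal Behavior Analysis of Mobile Network Users
★ A Study of Mining Multi-Level Influence Propagation in Social Networks
★ An Integrated Information Management and Analysis System Based on Peer-to-Peer Technology
★ A Study of Applying Data Warehousing Techniques to Discover Knowledge in Peer-to-Peer Network Environments
★ Self-Derived Mining of Multi-Level FP-trees from Transaction Databases
★ A Study of Constructing Passive Storage-Capacity Migration Policies in Lifecycle Management Systems
★ A Study of Applying Service Mining to Discover Composite Services
★ Improving Intrusion Detection Systems Using Frequent Event Sequences in Weighted Suffix Trees
★ Efficient Processing of Continuous Aggregate Queries on Data Warehouses
★ Intrusion Detection Systems: Using Function-Based System Call Sequences
★ Efficient Multi-Dimensional and Multi-Level Association Rule Mining on Data Cubes
★ Course Recommendation Based on Community Associations and Weights in Online Learning
★ Finding Inactive Users in Social Network Services
★ Finding Sequences with Similar Variations in Astronomical Observation Records Using Hierarchical Weighted Suffix Trees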

Full text: not available for online browsing (permanently restricted)

Abstract (Chinese)

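In recent years, sky-survey observation technology has advanced, driven by goals such as tracing asteroid trajectories and observing transient astronomical events. As a result, stored astronomical data has reached the petabyte scale, and big-data analysis has become a trend in astronomical research. This study covers three astronomical application modules: location-based celestial object queries (DSSIS), sequential pattern management of variable-star events (DASPQS), and asteroid track extension (DADS). Each module is designed around the characteristics of its astronomical data and a key index structure. Each uses distributed computing backed by cloud storage to speed up the storage and processing of observational data, so that astronomers can carry out subsequent analysis and maintenance. By combining distributed computing methods, this study reduced execution time by 98% compared with a single machine. However, the use of index structures changes the ratio between data access and computation, and different applications place different resource demands on data storage and computation. I/O load problems also arise when computation runs on distributed nodes. Index structures and data structures must therefore be considered together with data locality to obtain a noticeable benefit. This study uses OpenStack and Hadoop as the cloud computing platform, analyzes how various cloud-environment parameters affect computational efficiency, and investigates data locality strategies. Experimental results show that the proposed data locality strategy, Hybrid locality, improved computation speed by 13-50% over the conventional method (Node locality). The applicability strategies proposed in this study can therefore effectively address the I/O load problem on distributed cloud platforms and improve computational performance.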
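DSSIS answers location-based queries over a spherical index. The table of contents (Sections 2-4 and 4-1) identifies that index as the Hierarchical Triangular Mesh (HTM). The Python below is an illustration only, not the thesis's DSSIS code. It computes an HTM trixel ID for a sky position using the standard scheme described by Szalay et al.:

- The sphere starts as eight octahedron triangles with IDs 8 to 15.
- Each level splits a triangle into four at its edge midpoints.
- Each split appends two bits (a child index 0 to 3) to the ID.

The function names and the `depth` parameter are hypothetical.

```python
import math

# Octahedron vertices of the HTM base mesh (unit vectors).
V = [
    (0.0, 0.0, 1.0),   # v0: north pole
    (1.0, 0.0, 0.0),   # v1
    (0.0, 1.0, 0.0),   # v2
    (-1.0, 0.0, 0.0),  # v3
    (0.0, -1.0, 0.0),  # v4
    (0.0, 0.0, -1.0),  # v5: south pole
]

# Level-0 trixels: S0..S3 -> IDs 8..11, N0..N3 -> IDs 12..15.
# Vertices are listed counter-clockwise as seen from outside the sphere.
BASE = [
    (8, (V[1], V[5], V[2])),
    (9, (V[2], V[5], V[3])),
    (10, (V[3], V[5], V[4])),
    (11, (V[4], V[5], V[1])),
    (12, (V[1], V[0], V[4])),
    (13, (V[4], V[0], V[3])),
    (14, (V[3], V[0], V[2])),
    (15, (V[2], V[0], V[1])),
]


def radec_to_vec(ra_deg, dec_deg):
    """Convert right ascension / declination in degrees to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def midpoint(a, b):
    """Midpoint of the great-circle arc a-b, projected back onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(dot(m, m))
    return (m[0] / n, m[1] / n, m[2] / n)


def inside(p, tri, eps=1e-12):
    """True if p is on the inner side of all three edges of a CCW spherical triangle."""
    a, b, c = tri
    return all(dot(cross(x, y), p) >= -eps for x, y in ((a, b), (b, c), (c, a)))


def htm_id(ra_deg, dec_deg, depth):
    """Return the ID of the HTM trixel containing (ra, dec) at the given depth."""
    p = radec_to_vec(ra_deg, dec_deg)
    tid, tri = next((i, t) for i, t in BASE if inside(p, t))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        # The four children tile the parent; take the first one containing p.
        k, tri = next((j, c) for j, c in enumerate(children) if inside(p, c))
        tid = tid * 4 + k
    return tid


print(htm_id(45.0, 35.26, 0), htm_id(45.0, 35.26, 1))  # 15 63
```

Every descendant of a trixel shares its ID prefix, so `htm_id(ra, dec, d) // 4 == htm_id(ra, dec, d - 1)`. As a result, sorting objects by trixel ID tends to keep nearby objects in contiguous key ranges, which is what makes such an index useful for point and range queries. Whether DSSIS stores its keys exactly this way is an assumption.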

Abstract (English)

Recent advances in astronomical observation technology have led to the collection of petabytes of data, and datasets of this size call for a big-data approach to analysis. Numerous recent projects have built advanced telescopes and used them to survey the sky, obtaining data on asteroid movements and transient astronomical events. Astronomical researchers use various methods to analyze these observational datasets. This study is concerned with three data access models in astronomy:

- location-based queries of celestial objects;
- management of sequential patterns of variable-star events;
- asteroid track linking.

Cloud computing and data indexing technologies can accelerate astronomers' research. We provide a distributed system for each problem: DSSIS, DASPQS and DADS. Each methodology has its own index structure or intermediate result. In the best case, our experiments show that the distributed approach reduced execution time by 98% compared with running on a single host. However, index usage substantially affects data access behavior, especially the ratio between storage and computation. Different applications have different costs in storage space and processing time. This produces a significant number of non-local tasks, whose data must be accessed across nodes over the network. Accordingly, this research examines data locality on a cloud platform (Hadoop on OpenStack) for astronomical applications that use different critical indexes. Application performance is analyzed with respect to specific cloud-environment parameters, and a data locality strategy is proposed. Experimental results demonstrate that the proposed strategy, Hybrid locality, reduced the execution time of these applications by 13-50% compared with the conventional method (Node locality). Hence, the proposed data locality strategies provide a substantial performance improvement.
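The abstract attributes the I/O load to non-local tasks, whose input data must be fetched from another node over the network. The Python below is a minimal sketch of how such tasks can be counted. It is not the thesis's Hybrid locality strategy, whose rules are not given in this record. It classifies each map task by where it ran relative to the replicas of its input block, following the node-local / rack-local distinction Hadoop's scheduler uses. It then reports the fraction of tasks that were not node-local. The `MapTask` record, the `rack_of` mapping and the sample cluster are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    """A scheduled map task (hypothetical record, not a Hadoop API type)."""
    name: str
    run_node: str         # node the task actually ran on
    replica_nodes: tuple  # nodes holding a replica of the task's input block


def locality_level(task, rack_of):
    """Classify one task as node-local, rack-local, or off-rack."""
    if task.run_node in task.replica_nodes:
        return "node-local"
    run_rack = rack_of[task.run_node]
    if any(rack_of[n] == run_rack for n in task.replica_nodes):
        return "rack-local"
    return "off-rack"


def locality_report(tasks, rack_of):
    """Count tasks per locality level and return the fraction that were not node-local."""
    counts = Counter(locality_level(t, rack_of) for t in tasks)
    non_local = len(tasks) - counts["node-local"]
    return counts, (non_local / len(tasks) if tasks else 0.0)


# Hypothetical 3-node, 2-rack cluster and four map tasks.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
tasks = [
    MapTask("t1", "n1", ("n1", "n3")),
    MapTask("t2", "n2", ("n1",)),
    MapTask("t3", "n3", ("n1", "n2")),
    MapTask("t4", "n3", ("n3",)),
]
counts, ratio = locality_report(tasks, rack_of)
print(dict(counts), ratio)  # {'node-local': 2, 'rack-local': 1, 'off-rack': 1} 0.5
```

In a Hadoop-on-OpenStack deployment the "nodes" are virtual machines. Two VMs on the same physical host therefore count as off-node here, even though their traffic need not cross the physical network. A finer-grained classification would need a VM-to-host mapping, which this sketch leaves out.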

Keywords (Chinese)

★ Big Data
★ Astronomical Computing
★ Cloud Computing
★ Distributed Systems
★ Scientific Computing
★ Data Mining
★ Data Locality

Keywords (English)

★ Big Data
★ Astronomical Calculation
★ Cloud Computing
★ Distributed System
★ Scientific Computing
★ Data Mining
★ Data Locality

Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Description of Symbols
1. Introduction
1-1 Background
1-2 Target Applications
1-3 Motivation
2. Related Works
2-1 Discovery of Asteroids
2-2 Pan-STARRS
2-3 Palomar Transient Factory
2-4 Hierarchical Triangular Mesh
2-5 Suffix Tree
2-6 Hough Transform
2-7 Hadoop
2-7-1 Hadoop Distributed File System
2-7-2 MapReduce
2-8 Spark
2-9 NoSQL
2-10 OpenStack
3. Data Locality Strategy and Experimental Environments
3-1 Architecture of Cloud Platform
3-2 Experimental Environments
3-3 Strategy of Data Locality
4. Decentralized Spherical Spatial Indexing System
4-1 HTM Index Stage
4-2 Query Stage
4-2-1 Point Query
4-2-2 Range Query
4-3 Experimental Results
5. Distributed Astronomy Sequential Pattern Query System
5-1 IC Stage
5-2 Multilevel Structure of Angular Interval
5-3 PQ Stage
5-4 Experimental Results
6. Distributed Asteroid Discovery System
6-1 Additive Property of the Linear Hough Transform
6-2 Preprocessing by the KD-tree
6-3 HT Stage
6-4 DL Stage
6-5 Experimental Results
7. Conclusions
References

Advisor

Meng-Feng Tsai (蔡孟峰)

Approval Date

2018-01-29