Abstract (English) |
Cross-matching is a common way to extract useful information from different star catalogs. As hardware grows more powerful, the volume of data obtained through astronomical telescopes keeps increasing, and a single machine can no longer handle it. In this paper, we use OpenStack to build a cloud computing environment, with Hadoop as the distributed system and HDFS and HBase as distributed storage, and implement cross-matching with the MapReduce framework. In addition, because HBase supports random access, we build an incremental mechanism so that users can add new astronomical data whenever they want. In the experiments, we use transient data as the test set to compare the running time of a single machine against the distributed system, and of physical machines against virtual machines with the same number of nodes. The results show that virtual machines are faster than physical machines. Furthermore, we create 12 physical nodes in the cloud environment to observe how running time varies with the number of nodes. In theory, running the program on more nodes should make it faster; in practice, the running times with 10 and 12 nodes are very similar. |