Abstract (English) |
Astronomy holds an important place in science. As observation technology and hardware have improved in recent years, researchers in astronomy can perform more diverse analyses, and the volume of data collected by astronomical telescopes has kept growing, gradually reaching the petabyte level.
This paper proposes a suffix tree system built on Hadoop's distributed architecture to help astronomers classify variable stars. The system is designed with the MapReduce and Spark frameworks. In the construction stage, the system converts a large volume of data, namely sequences of stellar brightness over time, into a suffix tree structure and stores the tree in the distributed file system; it also supports appending subsequent observation data. The properties of the suffix tree allow users to query efficiently. Moreover, the query stage introduces a hierarchical concept that adjusts the precision of the data in the tree, so the system can not only find similar sequences that differ because of observation or calculation errors but also offer more diverse queries for different classification methods. Depending on their needs, astronomical researchers can select the data precision used to classify stars and quickly find the IDs of stars with the same or similar characteristics. |
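The idea of querying brightness sequences at an adjustable precision can be illustrated with a minimal single-machine sketch. It is not the system's distributed MapReduce/Spark implementation: it uses a naive uncompressed suffix trie, and the names `quantize`, `SuffixTrie`, the `step` parameter, and the sample star IDs and brightness values are all assumptions made for illustration. Each brightness value is mapped to a bin of width `step`, so a larger `step` stands in for a coarser hierarchy level at which slightly different sequences become identical.

```python
import math


def quantize(values, step):
    """Map each brightness value to the integer index of its bin of width `step`."""
    return tuple(math.floor(v / step) for v in values)


class SuffixTrie:
    """Naive suffix trie over quantized sequences; each node records star IDs."""

    def __init__(self, step):
        self.step = step  # hypothetical precision level: bin width in magnitudes
        self.root = {"children": {}, "ids": set()}

    def add(self, star_id, values):
        """Insert every suffix of the quantized sequence, tagging nodes with star_id."""
        symbols = quantize(values, self.step)
        for start in range(len(symbols)):
            node = self.root
            for symbol in symbols[start:]:
                node = node["children"].setdefault(
                    symbol, {"children": {}, "ids": set()}
                )
                node["ids"].add(star_id)

    def query(self, values):
        """Return IDs of stars whose sequence contains the quantized pattern."""
        node = self.root
        for symbol in quantize(values, self.step):
            node = node["children"].get(symbol)
            if node is None:
                return set()
        return set(node["ids"])


# Made-up light curves (brightness over time) for three hypothetical stars.
light_curves = {
    "A": [12.1, 12.4, 13.2, 12.2],
    "B": [12.3, 12.1, 13.4, 12.4],
    "C": [14.0, 15.1, 14.2],
}

fine = SuffixTrie(step=0.25)   # finer precision: fewer sequences count as equal
coarse = SuffixTrie(step=1.0)  # coarser precision: small deviations are absorbed
for star_id, values in light_curves.items():
    fine.add(star_id, values)
    coarse.add(star_id, values)

pattern = [12.2, 13.3]
print(sorted(fine.query(pattern)))    # stars matching at fine precision
print(sorted(coarse.query(pattern)))  # stars matching at coarse precision
```

With this sample data, the fine trie matches the pattern only in star B, while the coarse trie also matches star A, whose values differ by a few tenths. A production system would use a compressed suffix tree and a linear-time construction algorithm rather than this quadratic-space trie.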
References |
[1] Pan-STARRS, http://pan-starrs.ifa.hawaii.edu/public/
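[2] 陳文屏 (Wen-Ping Chen), “New Challenges in Astronomical Observation: On the Pan-STARRS Project” (in Chinese), 科儀新知 (Instruments Today), Vol. 30, No. 3, 2008.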
[3] Wikipedia, “variable star”, https://en.wikipedia.org/wiki/Variable_star
[4] Apache Hadoop, http://hadoop.apache.org/
[5] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, “The Hadoop Distributed File System,” MSST, 2010.
[6] HDFS, https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[7] Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
[8] Hadoop 101: Programming MapReduce with Native Libraries, Hive, Pig, and Cascading, http://blog.pivotal.io/pivotal/products/hadoop-101-programming-mapreduce-with-native-libraries-hive-pig-and-cascading
[9] Apache Spark, https://spark.apache.org/
[10] Spark Cluster, https://spark.apache.org/docs/latest/cluster-overview.html
[11] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,” In NSDI, 2012.
[12] P. Weiner, “Linear Pattern Matching Algorithms,” 14th Annual IEEE Symposium on Switching and Automata Theory, 1973.
[13] Min-Feng Wang, Chi-Sheng Huang, Meng-Feng Tsai, Bo-Ru Song, Shin-Fu Su and Cheng-Hsien Tang, “Generalized Analysis of Message Propagation on Social Network,” International Journal of Future Generation Communication and Networking, Vol. 5, No. 2, June 2012.
[14] 沈敬軒, “Mining Similar Astronomical Sequence Pattern with Hierarchical Weighted Suffix Tree,” Master's thesis, National Central University, 2011.
[15] 張哲嘉, “Distributed Suffix Tree Based Sequential Pattern Management System for Astronomical Analysis,” Master's thesis, National Central University, 2013.
[16] 蔡昀翰, “Distributed Astronomy Sequential Pattern Analysis System Using Hadoop Platform with Weighted Suffix Tree,” Master's thesis, National Central University, 2015. |