dc.description.abstract | Advances in information technology yield high-quality scientific observations that demand more storage space and faster data processing than ever before. However, they also massively increase the cost of the corresponding management and analytical processes, so it becomes impractical to process terabytes of data using traditional approaches. From the perspective of astronomical data processing, the most important challenges are: 1. To maintain large amounts of data at lower cost and overhead, 2. To locate and extract the desired data from a huge data pool in a reasonable time, 3. To develop new analysis methods for large-scale data in a distributed environment, and 4. To use a flexible architecture that can adapt quickly to different situations and decrease development overhead. Even though existing distributed computing techniques, such as grid and cloud technologies, have given scientists better access to powerful computing resources, the development of big-data management and analysis software still lags far behind. This gap prevents the connected computing resources from being utilized efficiently. To address the problem, we propose an integrated, efficient information management and analysis system for astronomical data processing. This study therefore focuses on the design of the management system as well as on distributed classification and clustering methods for efficient data analysis in various astronomical applications.
The proposed system can be viewed as an integrated system that supports the management and analysis of large data collections. It consists of one data management sub-system and two analytical sub-systems. The first sub-system is the Peer-to-Peer-Based Management System (P2PBMS), which adapts the Chord system design to construct a scalable platform for fast data retrieval and management. The second sub-system is the Similarity Classification System (SCS), which uses a decentralized Multiple Classifier System (MCS) framework to provide fast and stable classification in a distributed environment using multiple classifiers. The third sub-system is the Distributed Hierarchical Clustering System (DHCS), which uses a distributed message-passing algorithm to efficiently compute a hierarchical clustering of a given set of astronomical data.
The proposed integrated system supports large-scale data management and analysis for astronomical data processing. Together, the three sub-systems provide the necessary analytical tools and combination frameworks to handle different kinds of complex analysis tasks. The unit-based structure decreases the overhead of customizing the system for different purposes. | en_US |