Cluster analysis is an important technique in machine learning. Clustering groups data points with similar characteristics so that useful or implicit information can be extracted from them. However, current mainstream clustering algorithms must analyze the entire dataset to obtain the best parameters, which makes them difficult to apply to large-scale datasets.
This study proposes a distributed, correlation-based clustering mechanism based on unsupervised learning. The mechanism relies on the observation that neighboring data points in the same group are similar, so each point can be linked to further related points until a complete cluster is formed. To process a large-scale dataset, the data are partitioned and distributed across multiple computers, which calculate the correlation between pairs of data points in parallel. The results are then filtered and aggregated into clusters.
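The distribute, score, filter, and aggregate pipeline described above can be illustrated with a minimal sketch. Several pieces are assumptions that do not come from the study: the similarity function (an inverse Euclidean distance on 2D points), the fixed `threshold` used for filtering, the use of union-find to merge linked pairs into clusters, and the hypothetical function names `similarity`, `score_chunk`, and `cluster`. A thread pool stands in for the multiple computers. A real deployment would ship each chunk of pairs to a separate machine.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def similarity(a, b):
    # Hypothetical similarity for 2D points: maps Euclidean distance into (0, 1].
    return 1.0 / (1.0 + math.dist(a, b))


def score_chunk(points, pairs, threshold):
    # One "worker": compute similarity for its share of pairs and keep
    # only the pairs whose similarity reaches the (assumed) threshold.
    return [(i, j) for i, j in pairs if similarity(points[i], points[j]) >= threshold]


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(points, threshold=0.5, workers=4):
    n = len(points)
    pairs = list(combinations(range(n), 2))
    # Split all pairs into roughly equal chunks, one per worker.
    size = max(1, math.ceil(len(pairs) / workers))
    chunks = [pairs[k:k + size] for k in range(0, len(pairs), size)]

    # Score chunks in parallel (threads stand in for separate machines).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: score_chunk(points, c, threshold), chunks))

    # Aggregate: every retained pair links two points, and links are
    # transitive, so connected points end up in the same cluster.
    parent = list(range(n))
    for edges in results:
        for i, j in edges:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj

    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
    print(cluster(pts, threshold=0.4))  # [[0, 1], [2, 3], [4]]
```

Because linked pairs are merged transitively, points that are not directly similar can still share a cluster through a chain of neighbors. This mirrors the idea that similar neighbors "can be linked to further related points."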
This study uses 2D graphics, Go game (Weiqi) analysis, and medical data as experimental data, and develops a similarity calculation for each data type. The experimental results demonstrate that the clustering mechanism can handle large-scale datasets. The mechanism offers good execution performance, accuracy, variability, applicability, and ease of use.
