Abstract: | Community detection clusters social data to identify the groups within it; more generally, partitioning a large dataset into multiple clusters is a common task. Most previous methods cluster data observed at a single, static point in time, so the clustering at each time point is independent of, and unaffected by, the results at the preceding or following time points. As a result, the clusterings obtained for different time periods can be inconsistent with one another. Real communities, however, evolve over time. We therefore want to understand how a cluster at one time point is influenced by the clusters before and after it, and how it evolves into a cluster at a later time point; that is, we want to trace each cluster's evolutionary trajectory and have every cluster evolve smoothly from one time point to the next. This study therefore differs from previous work in requiring that the difference between corresponding clusters at adjacent time points be as small as possible. Based on this idea, we propose a Multi-time Periods K-means algorithm to address this shortcoming of the traditional K-means algorithm. The study is restricted to a fixed horizon divided into T equal time periods. K-means is first applied to each period independently, partitioning its data into K clusters. The clustering of each period is then adjusted repeatedly, with each cluster updated according to the clustering results of the periods before and after it. The adjustment has two goals: the data points within each cluster should have the smallest possible within-cluster error, and similar clusters at adjacent time points should differ as little as possible. The result is an evolutionary trajectory for each of the K clusters across the T periods. Our experiments show that the differences between corresponding clusters in different periods are reduced and that smoother cluster evolution trajectories are produced over time. |
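The abstract does not state the exact adjustment rule, so the following is only a minimal sketch of one way the two-step procedure could work, not the authors' implementation. It assumes (hypothetically) that the adjustment minimizes J = Σ_t Σ_i ||x_i − c_{t,label(i)}||² + λ Σ_t Σ_j ||c_{t,j} − c_{t+1,j}||², i.e. the within-cluster squared error of every period plus a penalty, weighted by a hypothetical parameter `lam`, on how far matched centroids move between adjacent periods. With assignments fixed, the centroid minimizing J for period t is (sum of members + λ·sum of the same-index neighbour centroids) / (member count + λ·number of neighbours), which the loop below applies period by period. Other assumptions: clusters are matched across periods by brute-force search over label permutations (only practical for small K), K-means uses a deterministic farthest-first initialization, and all function names (`multi_period_kmeans`, `align`, `drift`, and so on) are hypothetical.

```python
from itertools import permutations

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: list[Point], centroids: list[Point]) -> list[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points: list[Point], k: int, iters: int = 100) -> list[Point]:
    """Plain Lloyd's K-means with a deterministic farthest-first initialization (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids


def align(prev: list[Point], cur: list[Point]) -> list[Point]:
    """Reorder cur so cur[j] best matches prev[j]; brute force, small K only (assumption)."""
    k = len(cur)
    best = min(permutations(range(k)),
               key=lambda perm: sum(sq_dist(prev[j], cur[perm[j]]) for j in range(k)))
    return [cur[i] for i in best]


def multi_period_kmeans(periods: list[list[Point]], k: int,
                        lam: float = 1.0, rounds: int = 20) -> list[list[Point]]:
    """Hypothetical smoothing sketch; returns cents[t][j], centroid of cluster j in period t."""
    # Step 1: cluster every period independently, then match labels through time.
    cents = [kmeans(pts, k) for pts in periods]
    for t in range(1, len(cents)):
        cents[t] = align(cents[t - 1], cents[t])
    # Step 2: repeatedly re-fit each period, pulling every centroid towards the
    # same-index centroid in the adjacent periods (coordinate descent on the assumed J).
    T = len(periods)
    for _ in range(rounds):
        for t in range(T):
            labels = assign(periods[t], cents[t])
            neighbours = [cents[s] for s in (t - 1, t + 1) if 0 <= s < T]
            new = []
            for j in range(k):
                members = [p for p, lab in zip(periods[t], labels) if lab == j]
                den = len(members) + lam * len(neighbours)
                if den == 0:
                    new.append(cents[t][j])  # empty cluster and nothing to pull towards
                    continue
                dim = len(cents[t][j])
                num = [sum(p[d] for p in members) + lam * sum(nb[j][d] for nb in neighbours)
                       for d in range(dim)]
                new.append(tuple(x / den for x in num))
            cents[t] = new
    return cents


def drift(cents: list[list[Point]]) -> float:
    """Hypothetical smoothness measure: squared movement of matched centroids between periods."""
    return sum(sq_dist(a, b) for t in range(len(cents) - 1)
               for a, b in zip(cents[t], cents[t + 1]))


periods = [
    [(0, 0), (0, 1), (10, 0), (10, 1)],  # period 1: two groups
    [(2, 0), (2, 1), (12, 0), (12, 1)],  # period 2: both groups shifted right by 2
]
print(round(drift(multi_period_kmeans(periods, k=2, lam=0.0)), 6))  # 8.0 (independent K-means)
print(round(drift(multi_period_kmeans(periods, k=2, lam=1.0)), 6))  # 2.0 (smoothed trajectories)
```

In this toy example, `lam=0.0` leaves each period's K-means result unchanged, while `lam=1.0` pulls matched centroids towards each other, so the trajectory between the two periods becomes smoother at the cost of a slightly larger within-cluster error. The value of `lam` is an illustrative choice; the abstract does not say how the two goals are weighted.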