Abstract (English) |
Most existing clustering methods cannot give each cluster a semantic description. CLTree, a recent clustering method, solves this problem. However, in CLTree the attributes used for clustering are the same as those used to build the semantic descriptions. In some situations the attributes used for clustering differ from those used to build the semantic descriptions. For example, when redesigning the architecture of a web site, we may use the browsing log as the clustering attributes, since pages that are often accessed together are likely to have similar properties. At the same time, we would choose other attributes, such as the subjects, keywords, or last-modified times of the web pages, to build the hierarchical directory, because the browsing log is meaningless for interpreting the website's architecture. We use the term classification attributes to denote the attributes that can be used to establish semantic descriptions.
In this paper, we extend the concept of CLTree and develop three clustering algorithms capable of semantic description. These algorithms allow the classification attributes to differ from the clustering attributes. |
References |
[1] A.K. Jain, M.N. Murty, and P.J. Flynn, Data clustering: a review, ACM Computing Surveys, 31(3):264-323, 1999.
[2] B. Liu, Y. Xia, and P. Yu, Clustering through decision tree construction, In SIGMOD-00, 2000.
[3] C.H. Cheng, A.W. Fu, and Y. Zhang, Entropy-based subspace clustering for mining numerical data, KDD-99, 84-93, 1999.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[5] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[6] J.R. Quinlan, Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 4:77-90, 1996.
[7] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, Clustering algorithms and validity measures, Tutorial paper, Proceedings of SSDBM Conference, 3-22, Virginia, USA, 2001.
[8] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, IEEE, 2002.
[9] N. Ye and X. Li, A scalable, incremental learning algorithm for classification problems, Computers & Industrial Engineering Journal, 43(4):677-692, 2002.
[10] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, In Proc. of the ACM SIGMOD, 1999. |