Dynamic Hierarchical Clustering Based on Taxonomy

Name

陳信夫 (Hsin-fu Chen)

Department

Department of Information Management

Thesis title

Dynamic Hierarchical Clustering Based on Taxonomy
(基於字詞關係動態建立階層分群)

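The author has granted immediate open access to this electronic thesis.
The open-access full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.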

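Abstract (Chinese)

With the arrival of the age of information explosion, more and more users search the Internet for material to read. The goal of this study is to apply hierarchical clustering to large document collections and to use term relationships to build a taxonomy with parent-child containment relations, which then serves as the labels of the hierarchical clusters. In practice, this lets users quickly see which topics a document collection covers and promptly select the documents on the topics they need. The proposed system architecture effectively improves on five issues in hierarchical clustering research: high-dimensional vectors, dynamic feature selection and document clustering, document processing order, cross-domain document clustering, and the relationships among cluster labels.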

Abstract (English)

With the popularity of the Internet, the World Wide Web contains a vast amount of information, and finding relevant information in a large number of texts has become a challenge for users. Hierarchical clustering is one way to address this problem, because it lets users browse topics step by step and find the documents most relevant to their interests. However, several challenges in hierarchical clustering still need to be addressed: the high dimensionality of the data, dynamic data sets, sensitivity to input order, documents that cover several concepts, and the relationship between clusters and tags.
In this paper, we propose an approach to dynamic hierarchical clustering based on taxonomy to address these challenges. The experimental results show that our method is not only suitable for building hierarchical clusterings over dynamic data sets, but also offers a structure that is easier to browse than those of the traditional algorithms BKM and UPGMA. In addition, the clusters are labeled with meaningful tags linked by containment relationships, which lets users quickly grasp the overall concept of each cluster.
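For readers unfamiliar with UPGMA, one of the two baselines named in the abstract, the following is a minimal sketch of textbook average-linkage agglomerative clustering. It is not the thesis's proposed method. The function names `average_distance` and `upgma`, the distance-matrix input format, and the nested-tuple output are illustrative assumptions.

```python
from itertools import combinations


def average_distance(a, b, dist):
    """Mean pairwise distance between two clusters of item indices."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def upgma(dist):
    """Build a merge tree by repeatedly joining the two closest clusters.

    dist is a non-empty symmetric matrix (list of lists) of pairwise
    distances. Returns a nested tuple of item indices.
    """
    # Each cluster is (tree, members); start with one cluster per item.
    clusters = [(i, [i]) for i in range(len(dist))]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0][1], pair[1][1], dist),
        )
        clusters.remove(a)
        clusters.remove(b)
        # The merged cluster keeps both subtrees and the union of members.
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical distances between four documents.
example = [
    [0, 1, 4, 5],
    [1, 0, 4, 5],
    [4, 4, 0, 2],
    [5, 5, 2, 0],
]
tree = upgma(example)  # ((0, 1), (2, 3))
```

The resulting tree is binary and unlabeled, which illustrates the browsing problem the abstract describes: a user sees nested groups of documents but no topic names for them.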

Keywords (Chinese)

★ Hierarchical clustering algorithm
★ Dynamic clustering algorithm
★ Taxonomy
★ Document clustering

Keywords (English)

★ Dynamic clustering algorithm
★ Hierarchical clustering
★ Taxonomy

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Method
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Feature Selection
2.1.1 Term Frequency (TF)
2.1.2 Term Frequency-Inverse Document Frequency (TF-IDF)
2.1.3 Frequent Itemset
2.1.4 Mutual Information
2.1.5 Normalized Google Distance (NGD)
2.2 Clustering Algorithms
2.2.1 Partitional Clustering Algorithms
2.2.2 Agglomerative Hierarchical Clustering
2.2.3 Divisive Hierarchical Clustering
2.3 Taxonomy
2.3.1 Lexico-syntactic Patterns
2.3.2 Machine-readable Dictionaries
2.3.3 Information Theory
2.4 Summary
Chapter 3 System Design and Architecture
3.1 System Architecture
3.2 Data Preprocessing
3.2.1 Part-of-speech and word combination
3.2.2 The length of the word
3.2.3 The number of Google search results
3.2.4 NGD Calculate
3.2.5 Ranking and Filtering
3.3 Document Concept Clustering
3.3.1 Updated Beta-similarity Graph
3.3.2 Updated Max-S Graph
3.3.3 Updated Star Cover
3.4 Taxonomy Construction
3.4.1 NGD Calculate
3.4.2 Conditional Probability Calculate
3.4.3 BTRank
3.5 Document Hierarchical Clustering
Chapter 4 Experimental Results and Discussion
4.1 Data Sets
4.1.1 Wikipedia
4.1.2 MeSH (Medical Subject Headings)
4.1.3 Painters and Paintings
4.1.4 Mapping of Data Sets to Experiments
4.2 Evaluation Methods
4.2.1 F1 score
4.2.2 Fβ score
4.2.3 FCubed
4.3 Data Preprocessing Results
4.4 Taxonomy Construction Results
4.5 Document Concept Clustering and Document Hierarchical Clustering Results
4.6 Hierarchy Structure Analysis
4.7 System Performance Analysis
4.7.1 Time Complexity
4.7.2 Overall System Time Analysis
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Chinese-language sources
English-language sources
Web sources
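
The outline above lists the Normalized Google Distance (NGD) in sections 2.1.5, 3.2.4, and 3.4.1. As a rough illustration, the sketch below computes NGD from search hit counts using the standard definition by Cilibrasi and Vitanyi. The function name `ngd`, its parameters, and the example counts are hypothetical, and this record does not say how the thesis obtains its counts or which value it uses for the index size.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if n <= max(fx, fy):
        raise ValueError("n must exceed both individual hit counts")
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as
        # infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: the two terms share only a few pages.
distance = ngd(fx=1000, fy=100, fxy=10, n=100000)
```

A value near 0 means the two terms almost always appear together, and larger values mean they are less related.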

Advisor

林熙禎 (Shi-jen Lin)

Approval date

2011-6-29
