A NGD Based Document Filtering System for Limited User Feedback

Name

Hao-ping Lee (李浩平)

Department

Department of Information Management

Thesis Title

A NGD Based Document Filtering System for Limited User Feedback
(運用NGD建立適用於使用者回饋資訊不足之文件過濾系統)

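This electronic thesis has been approved for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.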

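Abstract (Chinese)

With the growth of the Internet, users can easily obtain large amounts of information online, but they must also face the problem of information overload. Extracting the information that interests a user from this volume has therefore become an important issue in the age of information explosion, and research on information filtering has arisen in response. However, unlike traditional classification problems, which focus on classifying static data, an information filtering system must often cope with user interests that change gradually or even rapidly over time. We refer to the problem in which the data distribution changes over time as concept drift. When concept drift occurs in a user's interests, the information filtering system must be able to detect it and promptly adjust and update the user's interest model. Traditional information filtering systems usually need to collect a large amount of user feedback that reflects the user's changes in order to maintain stable filtering performance. This study exploits the ability of NGD to compute semantic relationships between terms on the fly and proposes a dynamic document filtering system that can build a user interest model from a very small number of training documents, addressing the slow response of document filtering systems after sudden concept drift occurs.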

Abstract (English)

With the development of the Internet, people can easily access massive amounts of information through a variety of search engines and portals; at the same time, however, they must face the problem of “information overload”. How to extract the information that interests users from this mass of information has therefore become a vital issue in the era of information explosion, and research on information filtering has emerged in response. Unlike traditional classification, which focuses on classifying static data, an information filtering system must cope with user interests that change dynamically. The phenomenon in which the distribution of data changes over time is called “concept drift”. When a user's interests undergo concept drift, the information filtering system must be able to detect the drift and to adjust and update the user's interest model in time. Traditionally, an information filtering system has to collect a large amount of feedback to reflect changes in user interest so that the filter remains stable and effective. To improve the slow response of information filtering systems when concept drift occurs, this research uses the ability of NGD to capture semantic relationships between terms and proposes a dynamic information filtering system that can build user interest models from a limited number of training documents.
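
The abstract relies on NGD to measure how closely two terms are related. As a rough illustration only, the sketch below computes the Normalized Google Distance as defined by Cilibrasi and Vitányi from search hit counts. It is not the thesis's implementation: the function name `ngd`, its parameters, and the hit counts in the usage lines are hypothetical, and a real system would have to obtain the counts from a search engine.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance between two terms.

    fx, fy : number of pages containing each term alone (assumed inputs)
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (an assumed normalizing constant)
    """
    # Terms that never occur, or never co-occur, are treated as infinitely far apart.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller single-term hit count")
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


# Hypothetical hit counts, chosen only to show the behaviour of the formula.
always_together = ngd(1000, 1000, 1000, 1_000_000)  # terms always co-occur
sometimes_together = ngd(100, 100, 10, 10_000)      # terms co-occur on a tenth of pages
never_together = ngd(1000, 1000, 0, 1_000_000)      # terms never co-occur
```

A smaller value means the two terms tend to appear on the same pages, so a filter built this way could score a document's terms against a user's interest terms without needing a large labeled training set.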

Keywords

★ Concept drift (概念漂移)
★ Document filtering (文件過濾)
★ Information filtering (資訊過濾)
★ NGD

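Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Method
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Information Filtering
2.1.1 Content-based Filtering
2.1.2 Collaborative Filtering
2.2 Document Feature Term Selection
2.2.1 Term Frequency and Inverse Document Frequency (TF-IDF)
2.2.2 Term Co-occurrence
2.2.3 Information Gain
2.2.4 Google Similarity Distance
2.3 Small Training Sets
2.3.1 Expectation-Maximization
2.3.2 Co-training
2.4 Concept Drift
2.4.1 Definition and Problems of Concept Drift
2.4.2 Learning Methods for Concept Drift
2.4.3 Summary
2.5 Support Vector Machine
2.6 K-core
Chapter 3 Research Method and System Architecture
3.1 System Architecture
3.2 Document Preprocessing
3.2.1 Part-of-speech and Keyword Combination
3.2.2 Length of Word
3.2.3 Number of Google Search Results
3.3 Core Feature Selection
3.3.1 Building the Term Relationship Network
3.3.2 Extracting Core Features
3.4 Building the Common Semantic Center
3.5 Document Filtering
3.6 Handling Concept Drift
Chapter 4 Experimental Results and Discussion
4.1 Experimental Environment
4.2 Experimental Datasets
4.3 Evaluation Criteria
4.4 Experimental Design
4.5 Experimental Results
4.5.1 Filtering Performance Evaluation
4.5.2 Number of Training Documents
4.5.3 Number of Terms in the Semantic Center
4.5.4 Concept Drift
4.6 System Execution Performance Analysis
4.6.1 Time Complexity
4.6.2 Actual Execution Speed
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References
Chinese References
English References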

Advisor

Shi-jen Lin (林熙禎)

Approval Date

2011-7-3
