Keyword-Based Multi-Topic Concept Drift Learning (關鍵字為基礎的多主題概念飄移學習)

Name

Wun-Yu Lin (林文羽)

Department

Department of Information Management

Thesis Title

Keyword-Based Multi-Topic Concept Drift Learning

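This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as not to violate the law.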

Abstract

As information on the Internet continues to grow rapidly, users can easily obtain large amounts of information from search engines and portal sites. At the same time, they face the problem of information overload, which has motivated research on information filtering. A user's interests, however, are not fixed; they change with time and circumstances. The phenomenon in which a target concept shifts over time is called concept drift. Previous research has mostly addressed concept drift in single-label classification. In practice, users' information needs are diverse and span multiple topics, and the preferences within each topic drift in their own way. Documents also often belong to more than one class, so classifying a document only by its main concept may cause users to miss documents of potential interest. This study therefore proposes a user model based on a keyword network. The model filters incoming documents according to the user's preferences across multiple topics, and when the preference for one of the target concepts drifts, it can detect the change and update itself accordingly.
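To make the abstract's idea more concrete, the following is a minimal Python sketch of filtering documents against several topics at once, with a separate drift check per topic. It is not the thesis's method: the thesis builds its user model on a keyword network, whereas this sketch swaps in flat weighted keyword sets and a simple windowed error rate. The names `TopicProfile`, `matching_topics`, `alpha`, `window`, and `drift_threshold` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


class TopicProfile:
    """One topic of interest, held as weighted keywords (hypothetical structure)."""

    def __init__(self, keywords, window=3, drift_threshold=0.6):
        self.weights = Counter({k: 1.0 for k in keywords})
        self.recent = deque(maxlen=window)  # latest feedback, True = liked
        self.drift_threshold = drift_threshold

    def score(self, doc_terms):
        """Share of the topic's keyword weight that appears in the document."""
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        hit = sum(w for k, w in self.weights.items() if k in doc_terms)
        return hit / total

    def feedback(self, doc_terms, liked):
        """Record feedback; reinforce the document's terms when it was liked."""
        self.recent.append(liked)
        if liked:
            for term in doc_terms:
                self.weights[term] += 1.0

    def drifted(self):
        """Flag drift once a full window of feedback is mostly negative."""
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.recent.count(False) / len(self.recent) >= self.drift_threshold

    def relearn(self, keywords):
        """Replace the topic's keywords after drift and clear the feedback window."""
        self.weights = Counter({k: 1.0 for k in keywords})
        self.recent.clear()


def matching_topics(profiles, doc_terms, alpha=0.5):
    """Return every topic whose score reaches alpha, so a document can match several."""
    return sorted(name for name, p in profiles.items() if p.score(doc_terms) >= alpha)


profiles = {
    "sports": TopicProfile(["baseball", "pitcher"]),
    "finance": TopicProfile(["stock", "bond"]),
}
doc = {"baseball", "stock", "weather"}
topics = matching_topics(profiles, doc)

# The user rejects three sports documents in a row: only that topic drifts.
for _ in range(3):
    profiles["sports"].feedback({"baseball", "pitcher"}, liked=False)
```

Keeping one feedback window per topic is the point of the sketch: a change in the user's taste for one topic can be detected and handled with `relearn` without disturbing the profiles of the other topics, and a document that touches several topics is returned under each of them instead of only its main one.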

Keywords

★ Concept Drift
★ Information Filtering
★ User Modeling

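Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Scenario Description
1-4 Problem Definition
1-5 Research Objectives
1-6 Thesis Organization
2. Literature Review
2-1 User Models
2-1-1 Vector
2-1-2 Bag of Words
2-1-3 Network-Based User Models
2-1-4 Ontology-Based User Models
2-2 Document Preprocessing and Feature Selection
2-2-1 Preprocessing
2-2-1-1 Part of Speech and Keyword Merging
2-2-1-2 Term Length
2-2-1-3 Number of Wikipedia Search Results
2-2-2 Feature Selection
2-2-3 Google Similarity Distance
2-3 Concept Drift
2-3-1 Definition and Problems of Concept Drift
2-3-2 Concept Drift Learning Methods
2-3-2-1 Continuous Learners
2-3-2-2 Detection-Based Learners
2-4 Multi-Label Document Classification
2-4-1 Random Selection and Removal of Multi-Label Data
2-4-2 Label Powerset
2-4-3 Binary Relevance
2-4-4 Sample Decomposition
2-4-5 Summary
2-5 Complex Network Analysis
2-5-1 Degree
2-5-2 K-Core
2-5-3 Betweenness-Based Clustering
2-5-4 Community Structure
3. System Architecture and Design
3-1 Research Limitations
3-2 System Architecture
3-3 Document Preprocessing
3-4 Feature Selection
3-5 Betweenness-Based Clustering
3-6 Document Filtering
3-7 Concept Drift Detection and Handling
4. Experimental Results and Discussion
4-1 Experimental Environment
4-2 Experimental Datasets
4-3 Evaluation Criteria
4-4 Experimental Design
4-4-1 Experiment 1: Differences in Feature Selection
4-4-2 Experiment 2: Threshold Experiments for the Proposed Method
4-4-2-1 Betweenness-Based Clustering Thresholds βsingle and βmulti
4-4-2-2 Comparison of Four Relevance Methods and Setting of γ and the Relevance Threshold α
4-4-3 Experiment 3: Evaluating the Ability to Find Potentially Relevant Documents
4-4-4 Experiment 4: Evaluating the Learning Ability of the User Model
4-4-5 Experiment 5: Simulation of a Multi-Topic Concept Drift Scenario
4-5 System Performance Analysis
4-5-1 Time Complexity
4-5-2 Actual Execution Time
5. Conclusions and Future Research Directions
5-1 Conclusions
5-2 Future Research Directions
5-3 Managerial Implications
References
Chinese References
English References
Appendix 1
Appendix 2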

Advisor

Shi-Jen Lin (林熙禎)

Approval Date

2013-7-22
