Master's/Doctoral Thesis 101423049 — Detailed Record




Name: Jin-ying Tu (涂謹瀅)    Department: Information Management
Thesis title: The method of combining association rules with latent semantic analysis using relevance feedback information in document classification (利用關聯規則與潛在語意分析以運用相關回饋資訊於文件分類的方法)
Related theses
★ A decision support system for creating SMS rules for credit card fraud prevention
★ A comparison of the effectiveness of different retrieval strategies
★ An investigation of factors influencing the knowledge-sharing process
★ Construction and evaluation of a retrieval agent system with sharing functions
★ A study of computer attitudes and learning self-efficacy among juvenile delinquents
★ A study of software measurement issues using AHP analysis
★ Optimizing an intrusion rule base
★ A study on improving the efficiency and quality of business information extraction
★ Using the analytic hierarchy process to evaluate key factors in the banking industry's adoption of enterprise application integration (EAI)
★ Applying genetic algorithms to find near-optimal layouts of forced-convection devices in cluster computer rooms
★ The Development of a CASE Tool with Knowledge Management Functions
★ A fast search index tree based on the PAT tree
★ Building document concepts based on compound nouns
★ Using user interest profiles to examine the importance of adjective position in review classification
★ Using semi-structured information and user feedback to help users filter web document search results
★ Building a vector space model with feature-opinion pairs for user review classification
Files: Full text is permanently restricted (not available for online viewing)
Abstract (Chinese) In information retrieval, the vector space model (VSM) is a common representation. Previous relevance feedback research on the vector space model extracted terms from the relevant documents returned by the system as feedback features; however, this approach considers only term frequency, whereas latent semantic analysis (LSA) can uncover the hidden relationships between terms and documents. This study develops a feature extraction method with two parts. The first part is an association rule feature extractor: association rule mining is applied separately to the top 20 relevant and top 20 non-relevant documents from user feedback, treating each document as a transaction whose items are terms; terms exceeding the minimum support and minimum confidence thresholds are extracted, and these strongly associated terms serve as document features. The second part is a feature combiner: in addition to the strongly associated terms from the association rule feature extractor, it extracts terms that appear only in relevant or only in non-relevant documents and occur infrequently, which can serve as keywords representing a specific class. After the term features are applied to the documents, term weights are computed with TF-IDF; singular value decomposition (SVD) is then applied to the term-document matrix, an appropriate number of dimensions is retained, and the term-document matrix is reconstructed to uncover latent semantic relationships between terms and documents. Experimental results show that the proposed feature extraction method effectively improves classification performance over using unfiltered TF-IDF features, and that the feature combiner together with latent semantic analysis yields the best document classification results. This study demonstrates that applying association rules and latent semantic analysis to relevance feedback information not only reduces storage space but also effectively improves document classification accuracy.
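A rough sketch of the association rule feature extractor described above is given below. It is a minimal illustration rather than the thesis implementation: the function name, the restriction to pairwise rules, and the threshold values are all assumptions, and the extractor would be run separately on the top-20 relevant and top-20 non-relevant feedback documents.

```python
# Minimal sketch (not the thesis code) of the association rule feature extractor:
# each feedback document is a transaction, its terms are the items, and terms that
# take part in a rule exceeding the minimum support and minimum confidence
# thresholds are kept as document features. Only pairwise rules are shown for
# brevity; the full Apriori algorithm also generates longer itemsets.
from collections import Counter
from itertools import combinations

def association_rule_terms(documents, min_support=0.3, min_confidence=0.6):
    """documents: list of token lists, e.g. the top-20 relevant feedback documents."""
    n_docs = len(documents)
    transactions = [set(doc) for doc in documents]

    # Support of single terms (fraction of documents containing the term).
    item_counts = Counter(t for tx in transactions for t in tx)
    frequent = {t for t, c in item_counts.items() if c / n_docs >= min_support}

    selected = set()
    for a, b in combinations(sorted(frequent), 2):
        pair_count = sum(1 for tx in transactions if a in tx and b in tx)
        if pair_count / n_docs < min_support:
            continue                                   # itemset {a, b} is not frequent
        conf_ab = pair_count / item_counts[a]          # confidence of rule a -> b
        conf_ba = pair_count / item_counts[b]          # confidence of rule b -> a
        if max(conf_ab, conf_ba) >= min_confidence:
            selected.update((a, b))                    # strongly associated terms become features
    return selected
```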
Abstract (English) In information retrieval, the vector space model (VSM) is a common representation. In this model, relevance feedback has mainly been applied by aggregating term frequencies in the feedback documents. To uncover and exploit the hidden relationships between terms and documents, this study develops a feature selection method with two parts. The first part is an association rule feature extractor. It takes the top 20 relevant and top 20 non-relevant documents from user feedback and mines association rules from them, treating each document as a transaction whose items are terms. Terms that simultaneously satisfy a user-specified minimum support and a user-specified minimum confidence are extracted, and these associated terms become document features. The second part is a feature combiner. In addition to the association rule terms, it extracts terms that occur only in the relevant documents or only in the non-relevant documents and appear infrequently; such keywords can represent a specific class. After the features are applied to the documents, terms are weighted by TF-IDF. Singular value decomposition (SVD) is then applied to the term-document matrix, an appropriate number of dimensions is retained, and the term-document matrix is rebuilt; the rebuilt matrix exposes latent semantic relationships between terms and documents. Experimental results show that the proposed feature selection methods effectively improve classification performance compared with using TF-IDF features without feature selection, with the feature combiner plus LSA giving the best document classification results. This study demonstrates that using association rules and LSA on relevance feedback information for document classification not only reduces storage space but also improves classification accuracy.
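The feature combiner and the TF-IDF/SVD step can be sketched in the same spirit. This is an illustrative outline only: the rarity cutoff max_count, the retained dimensionality k, and the function names are assumptions rather than the thesis's actual parameters.

```python
# Minimal sketch (illustrative, not the thesis code) of the feature combiner and
# the LSA step: terms that occur only in the relevant (or only in the non-relevant)
# feedback documents and appear infrequently are added to the feature set, the
# term-document matrix is weighted with TF-IDF, and a rank-k truncated SVD is used
# to rebuild the matrix and expose latent term-document relationships.
from collections import Counter
import numpy as np

def class_only_terms(relevant_docs, nonrelevant_docs, max_count=2):
    """Terms that appear in only one class of feedback documents, and only rarely."""
    rel = Counter(t for doc in relevant_docs for t in doc)
    nonrel = Counter(t for doc in nonrelevant_docs for t in doc)
    r_only = {t for t, c in rel.items() if t not in nonrel and c <= max_count}
    nr_only = {t for t, c in nonrel.items() if t not in rel and c <= max_count}
    return r_only | nr_only

def lsa_reconstruct(term_doc_counts, k=100):
    """TF-IDF weighting followed by a rank-k SVD reconstruction of the matrix."""
    tf = np.asarray(term_doc_counts, dtype=float)      # terms x documents raw counts
    n_docs = tf.shape[1]
    df = np.count_nonzero(tf > 0, axis=1)              # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))
    weighted = tf * idf[:, None]                       # TF-IDF term-document matrix
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # low-rank latent semantic approximation
```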
Keywords (Chinese) ★ relevance feedback
★ latent semantic analysis
★ association rules
Keywords (English)
Table of contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of contents
List of tables
List of figures
1. Introduction
1-1 Research motivation
1-2 Research objectives
1-3 Research scope and limitations
1-4 Thesis organization
2. Literature review
2-1 Vector space model
2-2 Relevance feedback
2-3 Latent semantic analysis
2-4 Association rules
2-4-1 Apriori algorithm
2-5 Support vector machines
3. System design
3-1 System architecture
3-2 Association rule feature extractor
3-2-1 Apriori algorithm
3-3 Feature combiner
3-3-1 Find Ronly_Terms and NRonly_Terms
4. Experiments and analysis
4-1 Experimental data
4-2 Evaluation metrics
4-3 Experimental results
4-3-1 Baseline
4-3-2 Method 1: association rule feature extractor
4-3-3 Method 2: feature combiner
4-3-4 LSA at different dimensionalities
4-4 Discussion
5. Conclusion
5-1 Conclusions
5-2 Future work
References
Advisor: Shih-chieh Chou (周世傑)    Date of approval: 2014-7-11
