利用關聯規則與潛在語意分析以運用相關回饋資訊於文件分類的方法

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	涂謹瀅	zh_TW
DC.creator	Jin-ying Tu	en_US
dc.date.accessioned	2014-07-11T07:39:07Z
dc.date.available	2014-07-11T07:39:07Z
dc.date.issued	2014
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=101423049
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	在資訊檢索中，向量空間模型 (Vector Space Model)為常見表示方法，過去在向量空間模型上的相關回饋研究，是以使用者對於系統所回傳的相關文件清單，萃取字詞作為回饋的特徵值，然而此方法僅考慮字詞出現的頻率，而透過潛在語意分析 (Latent Semantic Analysis, LSA)，能找出字詞與文件間隱含的關係。本研究發展出一套特徵擷取的方法，分為兩大部分。第一部分為關聯規則特徵器，針對使用者回饋前20篇相關與非相關文件各別實施關聯規則，將文件視為一連串的交易，交易內的項目即為字詞，接著將高於最小支持度 (Minimum Support)及最小信賴度 (Minimum Confidence)門檻值的字詞取出來，將這些關聯性強的字詞作為文件特徵。第二部分為特徵結合器，除了關聯規則特徵器萃取出強關聯的字詞，再加上萃取僅出現在相關或非相關文件且出現次數不高的字詞，能代表特定類別的關鍵字。文件套用字詞特徵後，以TF-IDF計算字詞權重，接著將字詞-文件矩陣實施奇異值分解 (Singular Value Decomposition, SVD)，選擇適當維度降維後，重建字詞-文件矩陣，發掘字詞與文件間潛在的語意關係。實驗結果發現，經本研究特徵擷取方法，能有效改善未經特徵篩選且以TF-IDF作為文件特徵的分類效能，其中，以特徵結合器加上潛在語意分析的文件分類效果最佳。本研究證明實作關聯規則與潛在語意分析，運用在相關回饋資訊上，除了降低儲存空間外，更能有效改善文件分類準確度。	zh_TW
dc.description.abstract	In information retrieval, the vector space model (VSM) is a common representation method. Earlier relevance feedback research on the VSM extracted terms from the documents returned by the system and judged relevant by the user, and used those terms as feedback features. That approach considers only term frequency, whereas latent semantic analysis (LSA) can uncover hidden relationships between terms and documents. This study develops a feature selection method in two parts. The first part, the association rule feature step, mines association rules separately from the top 20 relevant and non-relevant documents in the user feedback. Each document is treated as a transaction whose items are its terms, and the terms that satisfy both a user-specified minimum support and a user-specified minimum confidence are kept as document features. The second part, feature combination, adds to these strongly associated terms the terms that occur only in relevant or only in non-relevant documents and that occur infrequently; such keywords can represent a specific class. After the features are applied to the documents, the terms are weighted by TF-IDF. Singular value decomposition (SVD) is then applied to the term-document matrix, an appropriate number of dimensions is retained, and the term-document matrix is rebuilt at reduced rank to expose latent semantic relationships between terms and documents. Experimental results show that the proposed feature selection methods effectively improve classification performance over a baseline that uses TF-IDF document features without feature selection. Feature combination followed by LSA gives the best document classification results. The study demonstrates that applying association rules and LSA to relevance feedback information in document classification not only reduces storage space but also improves classification accuracy.	en_US
DC.subject	相關回饋	zh_TW
DC.subject	潛在語意分析	zh_TW
DC.subject	關聯規則	zh_TW
DC.title	利用關聯規則與潛在語意分析以運用相關回饋資訊於文件分類的方法	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	The method of combining association rules with latent semantic analysis using relevance feedback information in document classification	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
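
The association rule step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes only pairwise rules (term a -> term b) are mined, and the function name `association_terms`, the toy documents, and the threshold values are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def association_terms(docs, min_support=0.5, min_confidence=0.8):
    """Return terms that take part in at least one strong pairwise rule.

    Each document is a transaction and its distinct terms are the items.
    Assumption: only rules between two single terms are considered.
    """
    n = len(docs)
    transactions = [set(doc) for doc in docs]
    # Number of transactions containing each term.
    term_count = Counter(t for tx in transactions for t in tx)
    # Number of transactions containing each unordered pair of terms.
    pair_count = Counter(
        pair for tx in transactions for pair in combinations(sorted(tx), 2)
    )

    selected = set()
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue  # the pair is not frequent enough
        # confidence(a -> b) = count / term_count[a], and likewise for b -> a.
        conf_ab = count / term_count[a]
        conf_ba = count / term_count[b]
        if conf_ab >= min_confidence or conf_ba >= min_confidence:
            selected.update((a, b))
    return selected


# Hypothetical feedback documents, already tokenised into terms.
relevant_docs = [
    ["svd", "matrix", "semantic"],
    ["svd", "matrix", "term"],
    ["svd", "matrix", "feedback"],
    ["query", "feedback"],
]
features = association_terms(relevant_docs, min_support=0.5, min_confidence=0.8)
print(sorted(features))
```

In this toy input only the pair (matrix, svd) passes both thresholds, so those two terms would be kept as features. Running the same function on the non-relevant documents would give the second feature set that the abstract mentions.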

Thesis 101423049: full metadata record