Thesis/Dissertation 101423049: Full Metadata Record

DC Field | Value | Language
DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
DC.creator | 涂謹瀅 | zh_TW
DC.creator | Jin-ying Tu | en_US
dc.date.accessioned | 2014-07-11T07:39:07Z
dc.date.available | 2014-07-11T07:39:07Z
dc.date.issued | 2014
dc.identifier.uri | http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=101423049
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
DC.description | 國立中央大學 (National Central University) | zh_TW
DC.description | National Central University | en_US
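dc.description.abstract | In information retrieval, the vector space model (VSM) is a common representation method. Earlier relevance feedback research on the vector space model took the list of relevant documents returned by the system, as judged by the user, and extracted terms from it as feedback features. This approach, however, considers only term frequency, whereas latent semantic analysis (LSA) can uncover the latent relationships between terms and documents. This study develops a feature extraction method in two parts. The first part is an association-rule feature extractor. It applies association rule mining separately to the top 20 relevant and the top 20 non-relevant documents from user feedback, treating each document as a transaction whose items are its terms, and extracts the terms that exceed the minimum support and minimum confidence thresholds; these strongly associated terms serve as document features. The second part is a feature combiner. Besides the strongly associated terms from the association-rule feature extractor, it also extracts terms that appear only in relevant or only in non-relevant documents and occur infrequently, since such keywords can represent a specific class. After the term features are applied to the documents, term weights are computed with TF-IDF. Singular value decomposition (SVD) is then applied to the term-document matrix, an appropriate number of dimensions is kept, and the term-document matrix is reconstructed to reveal latent semantic relationships between terms and documents. Experimental results show that the proposed feature extraction methods effectively improve classification performance over using unfiltered TF-IDF terms as document features, and that the feature combiner plus latent semantic analysis gives the best document classification results. This study demonstrates that applying association rules and latent semantic analysis to relevance feedback information not only reduces storage space but also effectively improves document classification accuracy. | zh_TW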
dc.description.abstract | In the field of information retrieval, the vector space model (VSM) is a common representation method. In this model, relevance feedback has mainly relied on aggregating term frequencies in the feedback documents. To uncover and exploit the hidden relationships between terms and documents, this study develops a feature selection method with two parts. The first part is an association-rule feature extractor. It takes the top 20 relevant and non-relevant documents from user feedback and mines association rules from them, treating documents as transactions and their terms as items. Terms that satisfy both a user-specified minimum support and a user-specified minimum confidence are extracted and used as document features. The second part is a feature combiner. In addition to the association-rule terms, it extracts terms that occur only in relevant documents or only in non-relevant documents and that appear infrequently; such keywords can represent a specific class. After the features are applied to the documents, terms are weighted by TF-IDF. Singular value decomposition (SVD) is then applied to the term-document matrix, an appropriate number of dimensions is retained, and the term-document matrix is rebuilt from the reduced decomposition. The rebuilt matrix exposes latent semantic relationships between terms and documents. Experimental results show that the proposed feature selection methods effectively improve classification performance compared with using unfiltered TF-IDF terms as document features. The best document classification results are obtained with the feature combiner plus LSA. This study demonstrates that applying association rules and LSA to relevance feedback information in document classification not only reduces storage space but also improves classification accuracy. | en_US
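DC.subject | 相關回饋 (relevance feedback) | zh_TW
DC.subject | 潛在語意分析 (latent semantic analysis) | zh_TW
DC.subject | 關聯規則 (association rules) | zh_TW
DC.title | 利用關聯規則與潛在語意分析以運用相關回饋資訊於文件分類的方法 (A method for applying relevance feedback information to document classification using association rules and latent semantic analysis) | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | The method of combining association rules with latent semantic analysis using relevance feedback information in document classification | en_US
DC.type | 博碩士論文 (thesis/dissertation) | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US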
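The abstracts describe an association-rule feature extractor in which documents are transactions and terms are items. The Python sketch below illustrates that step. It is not the thesis's implementation, and it rests on several assumptions. Documents arrive already tokenized into term lists. Only single-term rules X -> Y are mined, not full Apriori itemsets. The function name `association_terms` is hypothetical. Support is count(X, Y) / N and confidence is count(X, Y) / count(X).

```python
from itertools import combinations


def association_terms(docs, min_support, min_confidence):
    """Return every term that appears in a pairwise rule X -> Y whose
    support and confidence both meet the thresholds.

    Each document (a list of terms) is one transaction; its distinct
    terms are the items. Simplification (assumption): only single-term
    rules are mined.
    """
    transactions = [set(doc) for doc in docs]
    n = len(transactions)
    if n == 0:
        return set()

    # Count how many transactions contain each term.
    item_count = {}
    for items in transactions:
        for term in items:
            item_count[term] = item_count.get(term, 0) + 1

    # Apriori property: a pair can only be frequent if both terms are frequent.
    frequent = sorted(t for t, c in item_count.items() if c / n >= min_support)

    selected = set()
    for a, b in combinations(frequent, 2):
        pair_count = sum(1 for items in transactions if a in items and b in items)
        if pair_count / n < min_support:
            continue  # the pair {a, b} is not frequent enough
        # confidence(a -> b) = count(a, b) / count(a); likewise for b -> a
        if (pair_count / item_count[a] >= min_confidence
                or pair_count / item_count[b] >= min_confidence):
            selected.update((a, b))
    return selected
```

The abstract says the extractor runs separately on the relevant and the non-relevant feedback documents. One way to reflect that is to union the two results, for example `association_terms(relevant_docs, 0.5, 0.8) | association_terms(nonrelevant_docs, 0.5, 0.8)`. Those threshold values are illustrative only.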
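The feature combiner adds class-specific keywords to the association-rule terms. These are terms found only in relevant documents or only in non-relevant documents, and only infrequently. The abstract does not define "infrequent". This sketch therefore assumes a hypothetical cap, `max_count`, on a term's total occurrences within its class. The function names are also illustrative, and the association-rule terms are passed in as a ready-made set.

```python
from collections import Counter


def exclusive_rare_terms(relevant_docs, nonrelevant_docs, max_count):
    """Terms that occur in only one class of feedback documents and at
    most `max_count` times in total within that class (assumed criterion)."""
    rel = Counter(term for doc in relevant_docs for term in doc)
    non = Counter(term for doc in nonrelevant_docs for term in doc)
    only_rel = {t for t, c in rel.items() if t not in non and c <= max_count}
    only_non = {t for t, c in non.items() if t not in rel and c <= max_count}
    return only_rel | only_non


def combined_features(assoc_terms, relevant_docs, nonrelevant_docs, max_count):
    """Feature combiner: association-rule terms plus exclusive, rare terms."""
    return set(assoc_terms) | exclusive_rare_terms(relevant_docs, nonrelevant_docs, max_count)
```

A common term such as one shared by both classes is dropped by the exclusivity test. A frequent single-class term is dropped by the count cap unless the association-rule step already kept it.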
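Once the feature terms are fixed, the abstracts weight them with TF-IDF to form the term-document matrix. The thesis's exact TF-IDF variant is not stated, so this minimal sketch assumes the common form w = tf × log(N / df), with raw counts as tf. Rows are the sorted feature terms and columns are documents. The name `tfidf_matrix` is hypothetical.

```python
import math


def tfidf_matrix(docs, features):
    """Build a term-document matrix of TF-IDF weights.

    docs: list of documents, each a list of terms.
    features: the selected feature terms; all other terms are dropped.
    Weight = raw count of the term in the document * log(N / df).
    """
    terms = sorted(features)
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    matrix = []
    for term in terms:
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log(n / df) if df else 0.0  # features absent from all docs get weight 0
        matrix.append([doc.count(term) * idf for doc in docs])
    return terms, matrix
```

Under this variant, a term that appears in every document gets idf = log(1) = 0. It therefore contributes nothing to any column, which is one reason feature selection and weighting interact.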
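The LSA step factors the term-document matrix as A = UΣVᵀ, keeps the k largest singular values, and rebuilds A_k = U_k Σ_k V_kᵀ. This is a dependency-free sketch under stated assumptions. It uses power iteration with deflation in place of a library SVD; in practice one would call an LAPACK-backed routine such as `numpy.linalg.svd`. The abstract does not fix the "appropriate dimension" k, so the caller chooses it. The name `lsa_reconstruct` and its `iters`/`seed` parameters are hypothetical.

```python
import math
import random


def _matvec(a, v):
    """A @ v for a matrix stored as a list of rows."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _rmatvec(a, u):
    """A^T @ u for a matrix stored as a list of rows."""
    return [sum(a[i][j] * u[i] for i in range(len(a))) for j in range(len(a[0]))]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def lsa_reconstruct(a, k, iters=500, seed=0):
    """Rank-k approximation A_k = U_k S_k V_k^T of the term-document
    matrix `a`, using power iteration with deflation to find the top-k
    singular triplets (sigma, u, v)."""
    m, n = len(a), len(a[0])
    rng = random.Random(seed)
    residual = [list(row) for row in a]
    approx = [[0.0] * n for _ in range(m)]
    for _ in range(k):
        # Random positive start vector for the right singular vector v.
        v = [rng.random() + 0.1 for _ in range(n)]
        norm_v = _norm(v)
        v = [x / norm_v for x in v]
        # Power iteration on R^T R converges to its top eigenvector.
        for _ in range(iters):
            w = _rmatvec(residual, _matvec(residual, v))
            norm_w = _norm(w)
            if norm_w == 0.0:
                break
            v = [x / norm_w for x in w]
        rv = _matvec(residual, v)
        sigma = _norm(rv)  # sigma = ||R v||
        if sigma == 0.0:
            break  # the remaining singular values are all zero
        u = [x / sigma for x in rv]
        # Add sigma * u v^T to the approximation and deflate the residual.
        for i in range(m):
            for j in range(n):
                part = sigma * u[i] * v[j]
                approx[i][j] += part
                residual[i][j] -= part
    return approx
```

Applied to a TF-IDF matrix, `lsa_reconstruct(matrix, k)` returns a matrix of the same shape. Its entries are smoothed by the k dominant co-occurrence directions, so a term can gain weight in a document that never contains it. The fixed iteration count assumes the leading singular values are reasonably well separated.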
