Thesis 102423026: Details




Author: 張明竣 (Ming-Chun Chang)   Department: Department of Information Management
Title: The application of the term information residing in relevance feedback for concept construction
(Original Chinese title: 應用相關回饋之語詞資訊於概念建立之方法)
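Related theses
★ A decision support system for generating SMS alert rules to prevent credit card fraud
★ A comparison of the effectiveness of different retrieval strategies
★ An investigation of factors influencing the knowledge-sharing process
★ Construction and evaluation of a retrieval agent system with sharing functions
★ A study of computer attitudes and learning self-efficacy among juvenile offenders
★ A study of software metrics issues using the AHP method
★ Optimizing intrusion rule bases
★ Improving the efficiency and quality of business information extraction
★ Using the analytic hierarchy process to measure key factors in the banking industry's adoption of enterprise application integration (EAI)
★ Applying genetic algorithms to find near-optimal layouts of forced-convection devices in cluster computer rooms
★ The Development of a CASE Tool with Knowledge Management Functions
★ A fast search index tree based on the PAT tree
★ A compound-noun-based method for constructing document concepts
★ Using user interest profiles to examine how adjective position affects review classification
★ Using semi-structured information and user feedback to help users filter web document search results
★ Building a vector space model from feature-opinion pairs for user review classification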
Full text: not available online (never to be released).
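Abstract (Chinese, translated): Previous concept-extraction studies offered no principled basis for deciding how many concepts should be extracted to represent a document. This study therefore investigates the relationship between the number of extracted concepts and the dispersion of a document set, and uses the public TREC-6 dataset to test whether taking this relationship into account improves document classification. If, after the document set is clustered, its documents are spread evenly across the clusters, the set is highly dispersed, and the study hypothesizes that more concepts must be extracted to adequately represent most of its documents. Conversely, if the documents are concentrated in a few clusters, the set has low dispersion, and a small number of concepts is enough to represent most documents. The study proposes a dynamic concept extraction strategy that measures the dispersion of the document set through clustering and uses this measure to control the number of extracted concepts. Experiments provide preliminary evidence that the proposed strategy further improves document classification performance.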
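The abstract does not state how dispersion is quantified from the clustering result. As a minimal sketch, assuming dispersion is measured as the normalized Shannon entropy of the cluster-size distribution (an illustrative choice, not necessarily the measure used in the thesis), the hypothetical function `cluster_dispersion` below returns 1.0 when documents are spread evenly over all clusters and 0.0 when they all fall into a single cluster. The cluster labels are assumed to come from an earlier clustering step such as K-means.

```python
import math
from collections import Counter


def cluster_dispersion(labels, num_clusters):
    """Hypothetical dispersion score: normalized entropy of cluster sizes.

    labels:       cluster id assigned to each document (assumed to come
                  from a prior clustering step, e.g. K-means)
    num_clusters: number of clusters k used in that clustering step

    Returns a value in [0, 1]: 1.0 means documents are spread evenly over
    all k clusters, 0.0 means every document sits in one cluster.
    """
    if num_clusters < 2:
        return 0.0
    n = len(labels)
    counts = Counter(labels)
    # Shannon entropy H = sum(p * log(1/p)) over non-empty clusters.
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    # Divide by the maximum possible entropy, log(k), to normalize.
    return entropy / math.log(num_clusters)
```

For example, 20 documents split five per cluster over four clusters score 1.0, while 20 documents all in cluster 0 score 0.0. Entropy is used here only because it is a standard way to summarize how evenly items are distributed across groups.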
Abstract (English): Previous research has offered no method for determining how many concepts should be used to represent a document. The aim of this study is to examine the relationship between the number of extracted concepts and the dispersion of a document dataset. The study uses the public TREC-6 document dataset to evaluate the effect on text classification. A document dataset is considered highly dispersed if, after clustering, its documents are distributed evenly across the clusters; in this case, the study assumes that more concepts are needed to represent the documents. Conversely, if the documents are concentrated in a few clusters, the dataset has low dispersion, and the study assumes that fewer concepts are needed. The study proposes a dynamic concept extraction method that uses the degree of dispersion to determine the number of concepts dynamically. Empirical results show that the proposed method can improve the effectiveness of text classification.
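The abstract likewise leaves open how a dispersion score becomes a concept count. The sketch below assumes a dispersion score already normalized to [0, 1] and a simple linear interpolation between two hypothetical bounds, `k_min` and `k_max`; the helper names and the idea of ranking candidate concepts by a precomputed weight are illustrative assumptions, not the procedure described in the thesis.

```python
def concepts_to_extract(dispersion, k_min, k_max):
    """Hypothetical mapping from dispersion in [0, 1] to a concept count.

    Low dispersion (documents concentrated in few clusters) yields k_min
    concepts; high dispersion (documents spread evenly) yields k_max.
    """
    d = min(max(dispersion, 0.0), 1.0)  # clamp out-of-range scores
    return k_min + round(d * (k_max - k_min))


def select_concepts(concept_weights, k):
    """Return the k highest-weighted concepts.

    concept_weights: dict mapping a concept label to an assumed
    precomputed weight. Ties are broken by label so results are
    deterministic.
    """
    ranked = sorted(concept_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:k]]
```

With `k_min=5` and `k_max=50`, a fully concentrated dataset keeps 5 concepts and an evenly dispersed one keeps 50. A linear rule is only the simplest monotone choice. Any mapping that increases with dispersion would reflect the hypothesis stated in the abstract.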
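Keywords (Chinese, translated) ★ concept extraction
★ document conceptualization
★ relevance feedback
★ vector space model
★ dispersion of document dataset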
Keywords (English) ★ concept extraction
★ bag-of-concepts
★ relevance feedback
★ vector space model
★ dispersion of document dataset
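Table of Contents
1. Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 1
1.3 Research Scope 2
1.4 Research Limitations 2
1.5 Thesis Structure 2
2. Literature Review 3
2.1 Vector Space Model 3
2.1.1 Bag-of-Words 3
2.1.2 Bag-of-Concepts 4
2.1.3 Combining Bag-of-Words and Bag-of-Concepts 5
2.2 Relevance Feedback 5
2.3 Related Work on Concept Extraction 6
2.3.1 Concept Construction with External Resources 6
2.3.2 Concept Construction without External Resources 7
2.4 Related Work on Clustering 8
2.4.1 K-means 8
2.4.2 Hierarchical Clustering 8
2.4.3 Cluster Hypothesis 9
3. Research Methodology 10
3.1 System Architecture 11
3.2 Document Set Filtering Process 12
3.3 Document Conceptualization Process 14
3.3.1 Concept Extraction 15
3.3.2 Document Conceptualization 17
3.4 Document Classifier 19
4. Experimental Evaluation and Analysis 20
4.1 Experimental Environment 20
4.2 Experimental Data 20
4.3 Evaluation 23
4.4 Experimental Design and Procedure 24
4.4.1 Model 1: Representing Documents with Positive Concepts Only 24
4.4.2 Discussion of Model 1 Results 25
4.4.3 Model 2: Representing Documents with Negative Concepts Only 28
4.4.4 Discussion of Model 2 Results 29
4.4.5 Model 3: Representing Documents with Complete Positive and Negative Concepts 32
4.4.6 Discussion of Model 3 Results 33
5. Conclusion 36
5.1 Conclusions and Research Contributions 36
5.2 Future Research Directions 37
References 38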
Advisor: 周世傑 (Shih-Chieh Chou)   Approval date: 2015-07-27

For questions about this thesis, contact the Outreach Services Section, National Central University Library, TEL: (03)422-7151 ext. 57407.