The Study of the Effects of Text Categorization Processes on Journal Papers Classification by Statistical Analysis

Name

Kun-You Lai (賴昆佑)

Department

Department of Information Management

Thesis Title

The Study of the Effects of Text Categorization Processes on Journal Papers Classification by Statistical Analysis
(Original title: 以統計分析探討文件分類程序對期刊論文分類效果之影響)

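This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.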

Abstract

Journal papers convey specialized domain knowledge, but information overload makes retrieval costly in time. Text categorization techniques can help users quickly obtain journal papers in a relevant field. The text categorization process consists of four phases: preprocessing, document feature construction, application of classification methods, and evaluation of classification results. This study uses statistical hypothesis tests to examine how feature weighting methods, article fields, and the choice of classifier affect the classification of journal papers, and compares these classifiers with a sampling distribution classifier designed in this study. Experimental simulation and hypothesis test analysis show the following. First, feature ratio as a feature weighting method performs significantly better than feature frequency. Second, among article fields, the abstract gives the best classification results, better than the title and keywords, and there is no significant difference between the latter two. Third, support vector machines classify journal papers most effectively, followed by the naïve Bayes classifier, decision trees, and the sampling distribution classifier. Fourth, applying text categorization techniques to classify journal papers is feasible. Analysis results and recommendations for the sampling distribution classifier are also presented for use in future research.
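
The comparison of feature weighting methods described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that "feature frequency" means the raw count of a term in a document and that "feature ratio" means that count divided by the document length; the thesis may define both differently. The function names and the toy token list are hypothetical, and the paired t statistic is only one common way to test a difference in classification performance, not necessarily the test used in the thesis.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def feature_frequency(tokens):
    """Weight each term by its raw count in the document."""
    return dict(Counter(tokens))


def feature_ratio(tokens):
    """Weight each term by its share of all tokens in the document.

    Assumption: "feature ratio" is read here as count / document length.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two score lists measured on the same test runs.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy token list for illustration only; not data from the thesis.
tokens = "text categorization of journal papers by text features".split()
print(feature_frequency(tokens)["text"])  # 2
print(feature_ratio(tokens)["text"])      # 0.25
```

Under these assumptions, feature ratio normalizes for document length, so a long abstract and a short title become comparable. `paired_t_statistic` would be applied to classification scores obtained with each weighting method on the same test runs.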

Keywords

★ text categorization
★ journal paper classification
★ classifiers
★ hypothesis testing

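Table of Contents

Chinese Abstract
English Abstract
Table of Contents
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
2. Literature Review
2.1 Text Mining and Machine Learning
2.2 Text Categorization
2.3 Classifiers
3. Research Method
3.1 Research Questions
3.2 Research Process
3.3 Classification Methods
4. Experimental Results and Analysis
4.1 Experimental Design
4.2 Tests of Differences in Classification Performance Across Feature Weighting Methods
4.3 Tests of Differences in Classification Performance Across Article Fields
4.4 Tests of Differences in Classification Performance Across Confidence Levels of the Sampling Distribution Classifier
4.5 Tests of Differences in Classification Performance Across Classifiers
5. Conclusions and Recommendations
5.1 Conclusions
5.2 Research Limitations
5.3 Future Research Directions
References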

Advisor

Yih-Chearng Shiue (薛義誠)

Date of Approval

2007-7-12
