Applying Intelligence Classification to Enhance Article Publishing Efficiency on A Knowledge Sharing Platform

Name

Chih-Chun Chan (詹智鈞)

Department

In-service Master's Program, Department of Computer Science and Information Engineering

Thesis Title

Applying Intelligence Classification to Enhance Article Publishing Efficiency on A Knowledge Sharing Platform
(應用智慧分類法提升文章發佈效率於一企業之知識分享平台)

Files

Full text: browsing the thesis in the system is never open to the public.

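Abstract (translated from the Chinese)

Classifying a document takes time to read it and understand its subject, and sometimes also requires a certain level of domain expertise to understand its content. Document classification is therefore a time-consuming task that may only be completed by specific experts. With information technology now widespread, the platforms on which documents are stored and the reading habits of readers have moved from printed books to digital material, so it is increasingly important to use the automation that computers offer to solve the classification problem, saving classification time and reducing the difficulty of manual classification.
This study applies an SVM classifier to the document publishing workflow of an enterprise knowledge sharing platform, and evaluates classification performance using the platform's manually classified documents as test data. The test documents are technology industry news articles from an industry intelligence website. The experiments show that the SVM classifier reaches 86% classification accuracy on this type of document, and also reaches 86% accuracy on the multi-class classification problem, so the SVM classifier is well suited to classifying technology industry news documents of this kind.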
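To make the multi-class setting concrete, the following is a minimal sketch of a one-versus-rest linear SVM over weighted keyword vectors. It is not the thesis implementation: the TF-IDF weighting, the plain subgradient-descent training without a bias term, the hyperparameters, the toy documents, and every function name are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Map tokenized documents to sparse TF-IDF vectors (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def dot(w, x):
    """Sparse dot product between a weight dict and a document vector."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())

def train_linear_svm(vectors, signs, epochs=20, eta=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, signs):       # y is +1 or -1
            violated = y * dot(w, x) < 1.0     # document inside the margin?
            for term in w:                     # regularization step: shrink w
                w[term] *= 1.0 - eta * lam
            if violated:                       # hinge-loss step: move toward y * x
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w

def train_one_vs_rest(vectors, labels):
    """Train one binary SVM per category: that category versus all others."""
    return {c: train_linear_svm(vectors, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def classify(models, idf, tokens):
    """Return the category whose binary SVM gives the highest score."""
    x = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    return max(models, key=lambda c: dot(models[c], x))

# Hypothetical, already segmented toy documents (not the thesis data).
train_docs = [
    ["chip", "wafer", "foundry", "chip"],
    ["wafer", "fab", "yield"],
    ["phone", "handset", "android"],
    ["handset", "phone", "camera"],
    ["panel", "display", "oled"],
    ["display", "lcd", "panel"],
]
train_labels = ["semiconductor", "semiconductor", "mobile", "mobile",
                "display", "display"]

vectors, idf = tfidf(train_docs)
models = train_one_vs_rest(vectors, train_labels)
print(classify(models, idf, ["wafer", "foundry"]))
```

A production system would more likely rely on an established SVM library and add word segmentation and noise-word filtering before weighting; the sketch only shows how several binary SVMs combine into one multi-class decision.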

Abstract (English)

Classifying documents takes time, because a person must read each document to understand its subject, and sometimes also needs domain expertise to understand the content. Document classification is therefore time-consuming work that often depends on specific experts. Now that information technology is widespread, document storage platforms and readers' habits have shifted from paper to digital content. Using automated computation to solve the classification problem has accordingly become increasingly important, as a way to save time and reduce the difficulty of manual document classification.
In this study, we applied an SVM classifier to the document publishing process of an enterprise knowledge sharing platform, and used documents already classified by the platform's document publishers as our test data. The documents were technology industry news articles. In our experiments the SVM classifier reached a classification accuracy of 86%, and it also reached 86% accuracy in the multi-class case. The SVM classifier is therefore well suited to classifying technology industry news articles of this kind.
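An accuracy figure of this kind is typically obtained by comparing the classifier's output with the categories assigned by human publishers. The sketch below shows one common way to compute overall and per-category accuracy; the function names and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

def accuracy(human_labels, predicted_labels):
    """Fraction of documents where the prediction matches the human label."""
    if len(human_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("no documents to evaluate")
    correct = sum(h == p for h, p in zip(human_labels, predicted_labels))
    return correct / len(human_labels)

def per_category_accuracy(human_labels, predicted_labels):
    """Accuracy computed separately for each human-assigned category."""
    totals, hits = defaultdict(int), defaultdict(int)
    for h, p in zip(human_labels, predicted_labels):
        totals[h] += 1
        hits[h] += (h == p)   # True counts as 1, False as 0
    return {category: hits[category] / totals[category] for category in totals}

# Hypothetical toy labels for five held-out documents (not the thesis data).
human = ["semiconductor", "mobile", "display", "mobile", "semiconductor"]
predicted = ["semiconductor", "mobile", "mobile", "mobile", "semiconductor"]

print(accuracy(human, predicted))
print(per_category_accuracy(human, predicted))
```

Reporting the per-category breakdown alongside the overall figure helps reveal whether a single headline number hides a category that the classifier handles poorly.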

Keywords (Chinese)

★ Document classification
★ Document publishing
★ Support vector machine

Keywords (English)

★ Article publishing
★ SVM
★ Document classification

Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Method
Chapter 2. Related Work
2.1 Definition of Classification Types
2.2 Document Representation
2.3 Classification Methods
2.3.1 Nearest-Neighbor Classification
2.3.2 Bayesian Classification
2.3.3 Rocchio Classification
2.3.4 Neural Network Classification
2.3.5 Decision Tree Classification
2.4 The Multi-class Classification Problem
Chapter 3. Research Method
3.1 Support Vector Machines
3.1.1 Linear Support Vector Machines
3.1.2 Nonlinear Support Vector Machines
3.2 Multi-class Support Vector Machines
3.2.1 One-versus-Rest Multi-class Support Vector Machines
3.2.2 One-versus-One Multi-class Support Vector Machines
3.3 Model Validation for Support Vector Machines
3.3.1 Cross-validation
3.3.2 Holdout Method
Chapter 4. System Implementation
4.1 Document Processing
4.1.1 Document Preprocessing
4.1.2 Feature Selection
4.1.3 Word Segmentation
4.1.4 Part-of-Speech Tagging
4.1.5 Noise Word Filtering
4.1.6 Keyword Weight Calculation
4.1.7 SVM Document Representation
4.2 Document Classification
4.3 System Development Environment
4.4 System Demonstration
Chapter 5. Experimental Results
Chapter 6. Conclusion
References

Advisor

Stephen Yang (楊鎮華)

Approval Date

2010-12-22
