Abstract (English) |
Classifying documents takes time, because each document must be read and understood, and understanding the content sometimes requires domain expertise. Document classification is therefore very time-consuming work that often needs specific experts to complete. Information technology is now widespread, and both document storage platforms and readers' habits have shifted from paper to digital content. Accordingly, it is increasingly important to use automated computation to solve the classification problem, both to save time and to reduce the difficulty of manual document classification.
In this study, we applied an SVM classifier to the enterprise document publishing process of a knowledge-sharing platform, and used the documents already classified by the platform's document publishers as our experimental test data. The documents were gathered from technology industry news articles. In the experiments, the SVM classifier achieved a classification accuracy of 86%, and its accuracy in the multi-class classification case was also 86%. Hence, the SVM classifier is suitable for classifying technology industry news articles of this kind.
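As a rough illustration of the kind of pipeline described above, the following minimal sketch trains a one-vs-rest linear SVM on term-frequency vectors and measures accuracy. It is not the study's implementation: the function names, the toy documents, the category labels, and the hyperparameters (`lam`, `eta`, `epochs`) are all assumptions made for illustration, and the training loop is a plain subgradient method rather than a production SVM solver.

```python
import math
from collections import Counter


def vectorize(text):
    """Map a document to an L2-normalized term-frequency vector (sparse dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def dot(w, x):
    """Sparse dot product; terms missing from w count as weight 0."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def train_binary_svm(vectors, targets, lam=0.01, eta=0.1, epochs=100):
    """Linear SVM without a bias term, trained by subgradient descent
    on the L2-regularized hinge loss (hyperparameters are assumptions)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            violated = y * dot(w, x) < 1.0  # margin check before updating
            for term in w:                  # step on the L2 penalty
                w[term] *= 1.0 - eta * lam
            if violated:                    # step on the hinge term
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w


def train_one_vs_rest(docs, labels):
    """Train one binary SVM per category (one-vs-rest multi-class scheme)."""
    vectors = [vectorize(d) for d in docs]
    return {
        label: train_binary_svm(vectors, [1 if l == label else -1 for l in labels])
        for label in sorted(set(labels))
    }


def predict(models, text):
    """Return the category whose classifier gives the highest score."""
    x = vectorize(text)
    return max(models, key=lambda label: dot(models[label], x))


def accuracy(models, docs, labels):
    """Fraction of documents whose predicted category matches the label."""
    hits = sum(predict(models, d) == l for d, l in zip(docs, labels))
    return hits / len(docs)


# Invented toy corpus standing in for publisher-labeled news articles.
train_docs = [
    "wafer foundry chip yield",
    "chip fab wafer capacity",
    "router switch bandwidth latency",
    "bandwidth router fiber backbone",
    "compiler framework release patch",
    "framework patch library compiler",
]
train_labels = ["semiconductor", "semiconductor", "network",
                "network", "software", "software"]

models = train_one_vs_rest(train_docs, train_labels)
print(predict(models, "foundry raises wafer yield"))
print(accuracy(models, train_docs, train_labels))
```

In a real evaluation, accuracy would be measured on held-out documents rather than on the training set, and the toy vectors would typically be replaced by weighted features such as TF-IDF.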
|