References
[1] J.-H. Kim and K.-S. Choi, “Patent document categorization based on semantic structural information”, Information Processing and Management, Vol. 43(5), pp. 1200-1215, 2007.
[2] C. J. Fall, A. Törcsvári, P. Fiévet and G. Karetka, “Automated categorization of German-language patent documents”, Expert Systems with Applications, Vol. 26(2), pp. 269–277, 2004.
[3] D. Tikk, G. Biró and A. Törcsvári, “A hierarchical online classifier for patent categorization”, Emerging Technologies of Text Mining: Techniques and Applications, pp. 244–267, 2007.
[4] A. Hotho, A. Nürnberger and G. Paaß, “A Brief Survey of Text Mining”, GLDV-Journal for Computational Linguistics and Language Technology, Vol. 20(2), pp. 19-62, 2005.
[5] E. D’hondt, S. Verberne, N. Weber, C.H.A. Koster and L. Boves, “Using skipgrams and PoS-based feature selection for patent classification”, Computational Linguistics in the Netherlands Journal, pp. 52-70, 2012.
[6] R. Burgin and M. Dillon, “Improving Disambiguation in FASIT”, Journal of the American Society for Information Science, Vol. 43(2), pp. 101-114, 1992.
[7] J. L. Fagan, “The Effectiveness of a Nonsyntactic Approach to Automatic Phrase Indexing for Document Retrieval”, Journal of the American Society for Information Science, Vol. 40(2), pp. 115-132, 1989.
[8] L. P. Jones, E. W. Gassie and S. Radhakrishnan, “INDEX: The Statistical Basis for an Automatic Conceptual Phrase-indexing System”, Journal of the American Society for Information Science, Vol. 41(2), pp. 87-98, 1990.
[9] H. Paijmans, “Comparing the Document Representation of Two IR Systems: CLARIT and TOPIC”, Journal of the American Society for Information Science, Vol. 44(7), pp. 383-392, 1993.
[10] Z. Wu and G. Tseng, “ACTS: An Automatic Chinese Text Segmentation System for Full Text Retrieval”, Journal of the American Society for Information Science, Vol. 46(2), pp. 83-96, 1995.
[11] E. Riloff and J. Wiebe, “Learning Extraction Patterns for Subjective Expressions”, In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP-03), 2003.
[12] H. Nanba, T. Takezawa, K. Uchiyama, and A. Aizawa, “Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques”, In LREC, pp. 3447-3451, 2012.
[13] C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA., 1998.
[14] Y. Liu, B.T. McInnes, T. Pedersen, G. Melton-Meaux, and S. Pakhomov, “Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet”, In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, pp. 363-372, 2012.
[15] D. Lin, “Using Syntactic Dependency as Local Context to Resolve Word-Sense Ambiguity”, In Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics. Somerset, N.J.: Association for Computational Linguistics, 1997.
[16] P. Resnik, “Selectional Preference and Sense Disambiguation”, In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics, pp. 52–57, 1997.
[17] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, New York: The ACM Press, 1999.
[18] C.T. Meadow, B.R. Boyce and D.H. Kraft, Text Information Retrieval Systems, 2nd edition. San Diego: Academic Press, 2000.
[19] W.B. Frakes and C.J. Fox, “Strength and similarity of affix removal stemming algorithms”, SIGIR Forum, Vol. 37(1), pp. 26-30, 2003.
[20] G. Forman, “Choose your words carefully: An Empirical Study of Feature Selection Metrics for Text Classification”, Proceedings of the 6th Eur. Conf. on Principles Data Mining and Knowledge Discovery (PKDD), vol. 2431, pp. 150-162, 2002.
[21] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization”, Proc. of the 14th International Conference on Machine Learning ICML97, pp. 412-420, 1997.
[22] M. Ikonomakis, S. Kotsiantis, and V. Tampakas, “Text Classification Using Machine Learning Techniques”, WSEAS Transactions on Computers, Issue 8, Vol. 4, pp. 966-974, 2005.
[23] L. Ballesteros and W.B. Croft, “Resolving ambiguity for cross-language retrieval”, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 64-71, 1998.
[24] P. Sheridan and J.P. Ballerini, “Experiments in multilingual information retrieval using the SPIDER system”, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 58-65, 1996.
[25] A. Fontaine, Sub-element indexing and probabilistic retrieval in the POSTGRES database system, Technical Report CSD-95-876, University of California at Berkeley, 1995.
[26] G. Maayan and D. Feitelson, “Hierarchical Indexing and Document Matching in BoW”, Proceedings of the first ACM/IEEE-CS joint conference on Digital Libraries, Roanoke, Virginia, pp. 259-267, 2001.
[27] K. Tzeras and E.G.M. Petrakis, “Similarity searching in text databases with multiple field types”, Proceedings of the Fifteenth International Conference on Data Engineering, p. 100, 1999.
[28] E.G.M. Petrakis and K. Tzeras, “Similarity Searching in the CORDIS Text Database”, Software Practice and Experience, Vol. 30(13), pp. 1447-1464, 2000.
[29] T.C. Du, F. Li and I. King, “Managing knowledge on the web - extracting ontology from html web”, Decision Support Systems, Vol. 47(4), pp. 319-331, 2009.
[30] D. Buttler, L. Liu and C. Pu, “A fully automated object extraction system for the World Wide Web”, Proceedings of the 2001 International Conference on Distributed Computing Systems, pp. 361–370, 2001.
[31] K.H. Lee, Y.C. Choy and S.B. Cho, “An Efficient Algorithm to Compute Differences between Structured Documents”, IEEE Transactions on Knowledge and Data Engineering, Vol. 16(8), pp. 965-979, 2004.
[32] T. Mitchell, Machine Learning, New York: McGraw-Hill, 1997.
[33] G. Salton, A. Wong and C.S. Yang, “A vector space model for automatic indexing”, Communications of the ACM, Vol. 18(11), pp. 613-620, 1975.
[34] G. Salton, J. Allan and C. Buckley, “Automatic structuring and retrieval of large text files”, Communications of the ACM, Vol. 37(2), pp. 97-108, 1994.
[35] J. L. Herlocker, J. A. Konstan, A. Borchers and J. Riedl, “An algorithmic framework for performing collaborative filtering”, Proceedings of the 22nd Conference on Research and Development in Information Retrieval (SIGIR’99), pp. 230-237, 1999.
[36] M. Porter, “An algorithm for suffix stripping”, Program, Vol. 14(3), pp. 130-137, 1980.
[37] G. Salton and M. J. McGill, Text Analysis and Automatic Indexing in Introduction to Modern Information Retrieval, New York, USA: McGraw Hill, 1983.
[38] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval”, Information Processing & Management, Vol. 24(5), pp. 513-523, 1988.
[39] D. Eisinger, G. Tsatsaronis, M. Bundschus, U. Wieneke and M. Schroeder, “Automated Patent Categorization and Guided Patent Search using IPC as Inspired by MeSH and PubMed”, Journal of Biomedical Semantics, Vol. 4(Suppl 1), S3, 2013.
[40] Y. Li and K. Bontcheva, “Adapting support vector machines for F-term-based classification of patents”, ACM Transactions on Asian Language Information Processing (TALIP), Vol. 7(2), pp. 1-19, 2008.
[41] R. Feldman and J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, New York, USA: Cambridge University Press, 2007.
[42] Y. J. Li, C. Luo and S.M. Chung, “Text clustering with feature selection by using statistical data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 20(5), pp. 641-652, 2008.
[43] M. Rogati and Y. Yang, “High-performing feature selection for text classification”, CIKM’02, pp. 659-661, 2002.
[44] H. Liu and L. Yu, “Toward Integrating Feature Selection Algorithms for Classification and Clustering”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17(4), pp. 491-502, 2005.
[45] M.N. Ribeiro, M.J.R. Neto and R.B.C. Prudêncio, “Local feature selection in text clustering”, In: 15th ICONIP, Springer, pp. 45-52, 2008.
[46] A. Özgür, L. Özgür and T. Güngör, “Text Categorization with Class-Based and Corpus-Based Keyword Selection”, In Proceedings of the 20th International Symposium on Computer and Information Sciences, pp. 606-615, 2005.
[47] S. Tong and D. Koller, “Support vector machine active learning with applications to text classification”, Proceedings of the 17th International Conference on Machine Learning, pp. 401-412, 2000.
[48] R.C. Chen and C.H. Hsieh, “Web page classification based on a support vector machine using a weighted vote schema”, Expert Systems with Applications, Vol. 31 (2), 2006.
[49] P. Kingsbury and M. Palmer, “Propbank: the next level of Treebank”, Proceedings of Treebanks and Lexical Theories, 2003.
[50] M. Marcus, “The Penn TreeBank: A revised corpus design for extracting predicate-argument structure”, Proceedings of the ARPA Human Language Technology Workshop, Princeton, NJ, 1994.
[51] M. Marcus, B. Santorini and M.A. Marcinkiewicz, “Building a large annotated corpus of English: the Penn Treebank”, Computational Linguistics, Vol 19, 1993.
[52] S. Shehata, F. Karray and M. Kamel, “Enhancing Text Retrieval Performance using Conceptual Ontological Graph”, In ICDM Workshops, pp. 39-44, 2006.
[53] S. Shehata, F. Karray and M. Kamel, “A concept-based model for enhancing text categorization”, 13th ACM KDD, pp. 629-637, 2007.
[54] L. Khan and L. Wang, “Automatic ontology derivation using clustering for image classification”, Multimedia Information Systems, pp. 56-65, 2002.
[55] L. Cai and T. Hofmann, “Text Categorization by Boosting automatically Extracted Concepts”, 26th Annual International ACM-SIGIR Conference, pp. 182-189, 2003.
[56] R. E. Schapire and Y. Singer, “Boostexter: A boosting-based system for text categorization”, Machine Learning, Vol. 39(2/3), pp. 135-168, 2000.
[57] T. Magerman, B.V. Looy and X. Song, “Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications”, Scientometrics, Vol. 82(2), pp. 289-306, 2010.
[58] C.N. Silla Jr and A.A. Freitas, “A survey of hierarchical classification across different application domains”, Data Mining and Knowledge Discovery, Vol. 22(1-2), pp. 31-72, 2011.
[59] L. Cai and T. Hofmann, “Hierarchical document categorization with support vector machines”, CIKM’04: Proceedings of the 13th ACM Conference on Information and Knowledge Management, pp. 78-87, 2004.
[60] L. Cai and T. Hofmann, “Exploiting known taxonomies in learning overlapping concepts”, In: Proceedings of International Joint Conferences on Artificial Intelligence, 2007.
[61] A. Elisseeff and J. Weston, “A kernel method for multi-labelled classification”, In Proceedings of the Neural Information Processing Systems conference (NIPS), pp. 681–687, 2001.
[62] D. Koller and M. Sahami, “Hierarchically classifying documents using very few words”, In: Proc. of the 14th Int. Conf. on Machine Learning, pp. 170-178, 1997.
[63] G. Tsoumakas and I. Katakis, “Multi label classification: An overview”, International Journal of Data Warehouse and Mining, Vol. 3(3), pp. 1-13, 2007.
[64] T. Fagni and F. Sebastiani, “On the selection of negative examples for hierarchical text categorization”, In: Proc. of the 3rd Language Technology Conference, pp. 24-28, 2007.
[65] F. Wu, J. Zhang and V. Honavar, “Learning classifiers using hierarchically structured class taxonomies”, In: Proc. of the Symp. on Abstraction, Reformulation, and Approximation, Springer, Vol. 3607, pp. 313-320, 2005.
[66] S. Dumais and H. Chen, “Hierarchical classification of Web content”, In: Belkin NJ, Ingwersen P, Leong MK (eds) Proc. of the 23rd ACM Int. Conf. on Research and Development in Information Retrieval, pp. 256-263, 2000.
[67] A. Esuli, T. Fagni and F. Sebastiani, “Boosting multi-label hierarchical text categorization”, Information Retrieval, Vol. 11(4), pp. 287-313, 2008.
[68] N. Holden and A. Freitas, “Improving the performance of hierarchical classification with swarm intelligence”, In: Proc. 6th European Conf. on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBio), Springer, Lecture Notes in Computer Science, Vol. 4973, pp. 48-60, 2008.
[69] A. Secker, M. Davies, A. Freitas, J. Timmis, M. Mendao and D. Flower, “An experimental comparison of classification algorithms for the hierarchical prediction of protein function”, Expert Update (the BCS-SGAI Magazine), Vol. 9(3), pp. 17-22, 2007.
[70] A. Secker, M. Davies, A. Freitas, E. Clark, J. Timmis and D. Flower, “Hierarchical classification of g-protein-coupled-receptors with data-driven selection of attributes and classifiers”, International Journal of Data Mining and Bioinformatics, Vol. 4(2), pp. 191-210, 2010.
[71] E. Costa, A. Lorena, A. Carvalho, A. Freitas and N. Holden, “Comparing several approaches for hierarchical classification of proteins with decision trees”, In: Advances in Bioinformatics and Computational Biology, Springer, Lecture Notes in Bioinformatics, Vol. 4643, pp. 126-137, 2007.
[72] H. Blockeel, M. Bruynooghe, S. Dzeroski, J. Ramon and J. Struyf, “Hierarchical multiclassification”, In: Proceedings of the ACM SIGKDD 2002 Workshop on Multi-Relational Data Mining (MRDM 2002), pp. 21–35, 2002.
[73] S. Kiritchenko, S. Matwin, R. Nock and A. Famili, “Learning and evaluation in the presence of class hierarchies: Application to text categorization”, In: Proc. of the 19th Canadian Conf. on Artificial Intelligence, Lecture Notes in Artificial Intelligence, Vol. 4013, pp. 395-406, 2006.
[74] L.S. Larkey, “Some issues in the automatic classification of US patents”, In: AAAI-98 Working Notes, 1998.
[75] L. Wanner, R. Baeza-Yates, S. Brügmann, J. Codina, B. Diallo, E. Escorsa, M. Giereth, Y. Kompatsiaris, S. Papadopoulos, E. Pianta, G. Piella, I. Puhlmann, G. Rao, M. Rotard, P. Schoester, L. Serafini and V. Zervaki, “PATExpert: Towards Content-Oriented Patent Document Processing”, World Patent Information Journal, Vol. 30(1), pp. 21-33, 2008.
[76] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero and A. Saarela, “Self organization of a massive document collection”, IEEE Trans on Neural Networks, Vol. 11(3), pp. 574-585, 2000.
[77] A.J.C. Trappey, F.C. Hsu, C.V. Trappey and C.I. Lin, “Development of a patent document classification and search platform using a back-propagation network”, Expert Systems with Applications, Vol. 31(4), pp. 755-765, 2006.
[78] A. Juan and E. Vidal, “On the use of Bernoulli mixture models for text classification”, Pattern Recognition, Vol. 35(12), pp. 2705-2710, 2002.
[79] K. Nigam, A.K. McCallum, S. Thrun and T.M. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM”, Machine Learning, Vol. 39(2/3), pp. 103-134, 2000.
[80] K.M. Schneider, “A New Feature Selection Score for Multinomial Naive Bayes Text Classification Based on KL-Divergence”, 42nd Meeting of the Association for Computational Linguistics, pp. 186-189, 2004.
[81] S.B. Kim, K.S. Han, H.C. Rim and S. H. Myaeng, “Some effective techniques for naive Bayes text classification”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18(11), pp. 1457-1466, 2006.
[82] S. Tan, “Neighbor-weighted k-nearest neighbor for unbalanced text corpus”, Expert Systems with Applications, Vol. 28, pp. 667-671, 2005.
[83] E.H. Han, G. Karypis and V. Kumar, “Text categorization using weight adjusted k-nearest neighbor classification”, In Proceedings of the Fifth Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD01), pp. 53-65, 2001.
[84] Y. Bao and N. Ishii, “Combining multiple k-Nearest Neighbor Classifiers for Text Classification by Reducts”, Proc. 5th International Conference on Discovery Science, pp. 361-368, 2002.
[85] S. Chakrabarti, S. Roy and M. V. Soundalgekar, “Fast and accurate text classification via multiple linear discriminant projections”, The VLDB Journal, pp. 170-185, 2003.
[86] D. Isa, L. H. Lee, V. P. Kallimani and R. RajKumar, “Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine”, IEEE Transactions on Knowledge and Data Engineering, Vol. 20, 2008.
[87] C.J. Fall, A. Törcsvári, K. Benzineb and G. Karetka, “Automated categorization in the international patent classification”, ACM SIGIR Forum archive, Vol. 37(1), pp. 10–25, 2003.
[88] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Using taxonomy, discriminants, and signatures for navigating in text databases”, In Proc. of 23rd VLDB conference, pp. 446–455, 1997.
[89] S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies”, The VLDB Journal, Vol. 7(3), pp. 163–178, 1998.
[90] S. Godbole and S. Sarawagi, “Discriminative Methods for Multi-labeled Classification”, Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004), 2004.
[91] J. B. MacQueen, “Some methods for classification and analysis of multivariate observations”, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297, 1967.