Using Google’s Keyword Relation in Multi-Domain Document Classification

Name

Ping-I Chen (陳棅易)

Department

Department of Information Management

Thesis Title

Using Google’s Keyword Relation in Multi-Domain Document Classification
(Google文字關聯在多領域文件分類上的應用)

Files

Full text: never open access

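Abstract (Chinese)

Traditional document classification first requires downloading all documents to a computer, then extracting candidate keywords through keyword-importance calculation to form each document's representative sequence, and finally classifying the documents with a document-vector comparison algorithm. However, now that online information has matured, users often browse documents and web pages from many different knowledge domains. Training keywords for every domain in order to extract representative sequences for cross-domain classification wastes considerable resources and is inefficient. Moreover, because information is updated and expanded without limit, the sequence dimensionality of each domain becomes extremely large and consumes substantial computation and storage resources. This thesis builds on our own improved Google Core Distance (GCD) algorithm: it computes the importance of each word from the ratio of the number of web pages Google holds for each keyword, and uses these values to form a keyword network, the Word AdHoc Network (WANET). A sequence extraction algorithm then finds the K most representative keywords (K ≤ 4) in the word network and uses them as the representative sequence. Because our representative sequences are so short, traditional vector comparison algorithms are not applicable in this setting. We therefore also developed the Google Purity measurement algorithm, likewise based on the search engine, as the basis for vector comparison. Because every algorithm in the system relies on search-engine page counts, the system can classify across domains in real time. Our experiments also confirm that when the technical terms in the documents to be classified are seldom used in other domains, very high classification accuracy can be achieved. The only drawback of our system is that its frequent Google queries make overall execution slower than the traditional vector comparison approach. However, we do not need to collect a training set in advance, and the vectors do not grow without bound as documents are added, so in the long run our method is more efficient than the traditional approach. We believe that further improvements can raise both overall accuracy and computational efficiency, helping users organize the information they have learned more effectively, and that the same algorithms can identify useful information and recommend it to users in real time as a reading aid.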
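The abstract does not give the formula for the improved GCD, so it is not reproduced here. As a rough illustration of how a word-relatedness score can be derived purely from search-engine page counts, the sketch below computes the well-known Normalized Google Distance of Cilibrasi and Vitányi. It is a stand-in for the general idea, not the thesis's GCD. The function name, the page counts, and the index size `n` are all assumptions made for this example; no real search API is called.

```python
import math


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from page counts; smaller means more related.

    fx, fy: number of pages containing each term (hypothetical inputs).
    fxy:    number of pages containing both terms.
    n:      assumed total number of indexed pages; must exceed fx and fy.
    """
    if min(fx, fy, fxy) <= 0:
        return math.inf  # a term never occurs or the pair never co-occurs
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator <= 0:
        raise ValueError("n must be larger than both fx and fy")
    return (max(log_fx, log_fy) - log_fxy) / denominator


# Made-up page counts, for illustration only.
close = normalized_google_distance(1_000, 1_000, 1_000, 1_000_000)  # always co-occur
far = normalized_google_distance(1_000, 1_000, 10, 1_000_000)       # rarely co-occur
```

With these made-up counts, the pair that always co-occurs gets distance 0, while the pair that rarely co-occurs gets a larger distance. A measure of this kind needs only hit counts, which is why no documents have to be downloaded or indexed locally.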

Abstract (English)

Classifying information automatically and efficiently has become increasingly important in recent years. We can collect all kinds of knowledge from search engines to improve the quality of decision making, and use document classification systems to manage the knowledge repository. Document classification systems typically need to construct a keyword vector, often containing thousands of words, to represent each knowledge domain, so the computational complexity of the classification algorithm is very high. Users also need to download all the documents before keywords can be extracted and the documents classified. In this thesis, we describe a new algorithm called “Word AdHoc Network” and use it to extract the most important keyword sequence for each document. Each keyword sequence consists of no more than four keywords. We also use a new similarity measurement algorithm, called “Google Purity,” to calculate the similarity between the extracted keyword sequences and group similar documents together. With this system, we can classify information from different knowledge domains at the same time, and everything runs in real time without any pre-established keyword repository. Our experiments show that the classification results are very accurate and useful. The only weakness of our system is that its execution time is longer than that of the cosine method. However, we save the time needed to choose training data, and the vector for each domain stays at no more than a 4-gram. This new system can improve the efficiency of document classification and make it more usable in Web-based information management.
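For readers unfamiliar with the cosine method used as the baseline, the following minimal sketch computes cosine similarity over bag-of-words keyword vectors. It also illustrates one plausible reading of why a very short keyword sequence is a poor fit for this kind of comparison: two sequences with no literally shared keyword score zero, however related their topics are. The function name and the example keyword sequences are invented for illustration and do not come from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical four-keyword sequences (not taken from the thesis).
seq_a = Counter(["neural", "network", "training", "backpropagation"])
seq_b = Counter(["deep", "learning", "gradient", "descent"])

same = cosine_similarity(seq_a, seq_a)      # identical sequences
disjoint = cosine_similarity(seq_a, seq_b)  # related topic, no shared keyword
```

Here `same` is 1.0 and `disjoint` is 0.0, even though both sequences concern the same subject area. A long vector with thousands of terms rarely hits this all-or-nothing case, but a sequence of at most four keywords often does.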

Keywords (Chinese)

★ keyword vector sequence
★ text retrieval
★ document classification
★ similarity matching

Keywords (English)

★ keyword sequence
★ information retrieval
★ classification
★ similarity distance

Table of Contents

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivations
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Automatic Term Recognition (ATR) and Term Weighting
2.1.1 Statistics Approach
2.1.2 Machine Learning Approach
2.1.3 Semantic Similarity
2.2 Word Sequences
2.2.1 N-gram
2.2.2 Multiple keyword sequence extraction
2.3 Keyword Expansion
2.4 User Profiles
2.5 Collaborative Filtering (CF) Recommendation
2.6 Re-ranking the Search Results
2.7 Keyword Suggestions To Improve User Browsing Behavior
2.8 Document Clustering and Web Taxonomy
2.8.1 Vector space model
2.8.2 Frequent itemset mining
2.8.3 K-means and agglomerative
2.9 Joint Inference
2.10 Knowledge Map and Sentimental Analysis
2.11 Summary
Chapter 3 Proposed Method
3.1 Sequence extraction algorithm
3.1.1 1-gram filtering method
3.1.2 Google Core Distance (GCD)
3.1.3 PageRank algorithm
3.1.4 BB’s Graph-Based Clustering algorithm
3.1.5 Hop-by-Hop Routing algorithm (HHR)
3.2 Similarity Measurement using Google Purity
3.2.1 The Google Purity
3.2.2 Base Sequence Selection
3.2.3 Similarity measurement
Chapter 4 Experimental Results
4.1 Time Variance Effect of the Google Search Results
4.2 N-gram Evaluation
4.3 Sequence Strength Measurement
4.4 Execution Time
4.5 Pure Sequence Rate
4.6 Classifying Accuracy Measurement
4.7 Summary
Chapter 5 Conclusions and Future Works
References

Advisor

Shi-Jen Lin (林熙禎)

Approval Date

2011-12-19
