Thesis/Dissertation 93441023: Detailed Record


Name: Yu-Min Su (蘇育民)   Department: Business Administration
Thesis title: Weaving the Small Worlds with the Multi-domain Property of Authors and Keywords
(以作者與關鍵字之多領域特性串構小世界)
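Files:
  1. The electronic full text of this thesis is authorized for immediate open access.
  2. Open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.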

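Abstract (Chinese): A key step in knowledge management is grouping documents or their authors. Traditional clustering algorithms bring similar documents or authors together into several document groups or author groups, and after clustering each document or author belongs to one specific group. In practice, however, the same document or author often belongs to two or more groups. This dissertation aims to use co-citation data from the academic literature to develop a methodology for multi-domain clustering: it studies authors with multi-domain expertise using author co-citation data, studies cross-domain keywords using the keyword sets of documents, and develops a cross-domain keyword recommendation system that strengthens searching of related documents or knowledge bases.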
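The most common method for grouping the authors of cited scientific literature is author co-citation analysis (ACA), which clusters the authors of cited papers and assigns each author to one author group. Its drawback is that it considers only the first author of each cited paper and clusters first authors alone, ignoring the contributions of the remaining co-authors. This research proposes the Complete Author Pair (CAP) algorithm. Unlike traditional co-citation analysis, CAP introduces the concept of a complete author set: it clusters all co-authors of the cited papers and can assign an author to multiple author groups, thereby identifying researchers with multi-domain expertise. According to a study of the computer science literature, about 10% to 20% of scholars in the computer science community write papers in multiple domains under the Computing Classification System (CCS) of the Association for Computing Machinery (ACM); traditional co-citation analysis cannot verify this kind of multi-expertise, cross-domain scholarly activity. This research builds databases of cited-paper authors from several well-known journals. To ensure correctness, the data are collected manually online rather than by automated programs, so that the precision of the experimental results is not affected. The CAP algorithm is then applied to the computer science databases of cited-paper authors to test its precision and recall in identifying multi-expertise authors under different parameter values and different clustering methods.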
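The complete-author-set idea can be illustrated with a small sketch. It follows the step names listed in the table of contents (co-citation frequency matrix, Pearson's correlation matrix, clustering of complete author sets, deriving author domains), but the details are assumptions made for illustration and are not taken from the thesis: the toy data are hypothetical, the matrix diagonal is filled with each set's own citation count (one of several conventions), and a simple correlation-threshold grouping stands in for the clustering methods the thesis evaluates.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each source paper lists the complete author sets
# (all co-authors, not only the first author) of the papers it cites.
citing_papers = [
    [("Adams", "Baker"), ("Baker", "Chen")],
    [("Adams", "Baker"), ("Baker", "Chen")],
    [("Adams", "Baker"), ("Baker", "Chen")],
    [("Davis", "Chen"), ("Evans",)],
    [("Davis", "Chen"), ("Evans",)],
]

def cocitation_matrix(citing_papers):
    """Count how many source papers cite each pair of complete author sets."""
    sets = sorted({frozenset(s) for refs in citing_papers for s in refs}, key=sorted)
    index = {s: i for i, s in enumerate(sets)}
    matrix = [[0] * len(sets) for _ in sets]
    for refs in citing_papers:
        cited = sorted({index[frozenset(s)] for s in refs})
        for i in cited:
            matrix[i][i] += 1  # assumption: diagonal holds the set's own citation count
        for i, j in combinations(cited, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return sets, matrix

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def threshold_clusters(matrix, threshold):
    """Stand-in clustering: merge sets whose row correlation reaches the threshold."""
    label = list(range(len(matrix)))

    def find(i):
        while label[i] != i:
            i = label[i]
        return i

    for i, j in combinations(range(len(matrix)), 2):
        if pearson(matrix[i], matrix[j]) >= threshold:
            label[find(j)] = find(i)
    return [find(i) for i in range(len(matrix))]

def author_domains(sets, labels):
    """An author belongs to every cluster that contains one of their author sets."""
    domains = {}
    for author_set, cluster in zip(sets, labels):
        for author in author_set:
            domains.setdefault(author, set()).add(cluster)
    return domains

sets, matrix = cocitation_matrix(citing_papers)
labels = threshold_clusters(matrix, threshold=0.5)
domains = author_domains(sets, labels)
multi_domain = sorted(a for a, d in domains.items() if len(d) > 1)
print(multi_domain)
```

In this toy data the author Chen appears in two author sets that fall into different clusters, so Chen is reported as a multi-domain author. A first-author-only analysis would see Chen in at most one of those papers and so could place Chen in only one group.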
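Just as scholars in the scientific community conduct research in multiple domains, keywords also have multi-domain properties: under the ACM CCS, about 5% to 15% of computer science keywords belong to multiple domains. The traditional method for grouping keywords to support document search is co-word analysis, but it cannot assign the same keyword to different domains; that is, co-word analysis does not reflect the fact that keywords have multi-domain attributes. This research applies the complete-set co-citation concept developed above, treating the keywords of one academic paper as a complete keyword set, and develops the Complete Keyword Pair (CKP) algorithm to find keywords with multi-domain properties, which are called bridge-keywords. This research builds a keyword database of papers cited in JACM, collected entirely by hand online, to run experiments with the CKP algorithm and to test its precision and recall in finding multi-domain keywords under different parameter values. The multi-domain keyword technique is then used to develop a cross-domain keyword recommendation system that helps searchers extend their document search into other related domains, overcoming the current limitation that cluster-based recommendation systems, in both academia and practice, can recommend only within a single domain.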
Abstract (English): Grouping documents or authors into related domains is a crucial step in implementing knowledge management. Traditionally, authors and documents are grouped into one domain only. However, in many applications, authors and documents should be grouped into multiple groups. The dissertation aims to develop a methodology to cluster data items into multiple groups based on co-reference data, namely author co-citation data banks and keyword co-reference data banks.
The author co-citation analysis (ACA) method is commonly used to group authors of reference papers. Since the traditional ACA method analyzes only first authors of reference papers, it disregards the contributions of other coauthors and can only group each first author into one cluster. This study proposes an innovative ACA algorithm called the “Complete Author Pair (CAP) algorithm”, which groups complete author sets of reference papers into clusters and thus finds authors who may have expertise in more than one area. First, the CAP algorithm is applied to a data bank of paper references collected from two IS journals during 2001-2003. The results show that the CAP algorithm can identify multi-expertise authors with 70% precision, recall, and F score when compared against ACM CCS. The results also show that, among the six clustering methods evaluated in this experiment, the CAP algorithm performs best with the K-means method and the complete linkage method. Second, the CAP algorithm is applied to two citation data banks of paper references collected from two ACM journals during 2002-2005. The results show that the CAP algorithm reaches up to 90% average precision in discovering multi-expertise authors in each citation bank when compared against ACM CCS.
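For readers unfamiliar with the reported measures, the following minimal sketch computes precision, recall, and a balanced F score for a set of authors flagged as multi-expertise against a benchmark set. The author names are hypothetical, and the balanced (F1) form of the F score is an assumption; this record does not state which variant the thesis uses or how the ACM CCS benchmark labels were derived.

```python
def precision_recall_f(predicted, benchmark):
    """Return (precision, recall, balanced F score) for two collections of items."""
    predicted, benchmark = set(predicted), set(benchmark)
    true_positives = len(predicted & benchmark)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical example: authors flagged by the algorithm vs. authors who are
# multi-expertise according to a benchmark classification.
flagged = {"Chen", "Baker", "Evans"}
benchmark = {"Chen", "Baker", "Davis", "Adams"}
precision, recall, f_score = precision_recall_f(flagged, benchmark)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```

Two of the three flagged authors are in the benchmark (precision 2/3), and two of the four benchmark authors are found (recall 1/2).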
The co-word analysis method is commonly used to cluster related keywords into the same keyword domain. However, traditional co-word analysis cannot cluster the same keyword into more than one keyword domain, and so disregards the multi-domain property of keywords. This study proposes an innovative keyword co-citation algorithm called the “Complete Keyword Pair (CKP) algorithm”, which groups complete keyword sets of reference papers into clusters and thus finds keywords belonging to more than one keyword domain. These keywords are termed bridge-keywords. A recommendation system based on CKP can recommend keywords in other domains through the bridge-keywords to help users extend the document search area. The CKP algorithm is applied to a citation bank built from source papers published in JACM during 2000–2006. The results show that the CKP algorithm can discover bridge-keywords in this citation bank with an average precision of 80% when compared against the benchmark of ACM CCS.
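The idea of recommending through bridge-keywords can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's recommendation system: the keyword-to-domain mapping is hypothetical hand-written data (in the thesis it would come from the CKP clustering), and the function names `bridge_keywords` and `recommend` are invented here.

```python
# Hypothetical keyword domains; a keyword in more than one domain is a bridge-keyword.
keyword_domains = {
    "clustering": {"data mining"},
    "k-means": {"data mining"},
    "recommendation": {"data mining", "information retrieval"},
    "query expansion": {"information retrieval"},
    "thesaurus": {"information retrieval"},
}

def bridge_keywords(keyword_domains):
    """Keywords that belong to more than one keyword domain."""
    return {kw for kw, doms in keyword_domains.items() if len(doms) > 1}

def recommend(query, keyword_domains):
    """Suggest keywords from other domains reachable through a bridge-keyword
    that shares at least one domain with the query keyword."""
    home = keyword_domains.get(query, set())
    suggestions = set()
    for bridge in bridge_keywords(keyword_domains):
        if not keyword_domains[bridge] & home:
            continue  # this bridge does not touch the query's own domain
        other_domains = keyword_domains[bridge] - home
        for kw, doms in keyword_domains.items():
            if kw != query and doms & other_domains:
                suggestions.add(kw)
    return sorted(suggestions)

print(recommend("clustering", keyword_domains))
```

Here a search for "clustering" stays in the data mining domain under a single-domain recommender, whereas the bridge-keyword "recommendation" opens a path to keywords in the information retrieval domain.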
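Keywords (Chinese): ★ Clustering
★ K-means clustering
★ Recommendation systems
★ Bridge-keywords
★ Complete keyword pair algorithm (CKP)
★ Agglomerative hierarchical clustering
★ Computing Classification System (CCS)
★ Co-occurrence analysis
★ Co-word analysis
★ Keyword domain
★ Complete author pair algorithm (CAP)
★ Complete set
★ Author co-citation analysis (ACA)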
Keywords (English): ★ Recommendation systems
★ Bridge-keywords
★ Author co-citation analysis
★ Complete set
★ Complete author pair algorithm (CAP algorithm)
★ Clustering
★ K-means
★ Agglomerative Hierarchical Clustering (AHC)
★ Computing classification system (CCS)
★ Co-occurrence analysis
★ Co-w
Table of Contents: ABSTRACT (Chinese)
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
1.1 Motivation
1.2 Objectives
1.3 Organization of the Dissertation
2 RELATED WORK
2.1 Author Co-citation Analysis
2.2 Co-word Analysis
3 METHODOLOGY
3.1 Complete Author Pair (CAP) Method
3.1.1 Procedure of CAP
3.1.2 Definition
3.1.3 Creation of co-citation frequency matrix
3.1.4 Generation of Pearson’s correlation matrix
3.1.5 Generating clusters of complete author sets
3.1.6 Deriving author domains
3.1.7 Algorithms of Complete Author Pair (CAP)
3.1.8 Reducing number of complete author pairs with author support threshold
3.2 Complete Keyword Pair (CKP) Method
3.2.1 Procedure of CKP
3.2.2 Definition
3.2.3 Creation of co-citation frequency matrix
3.2.4 Generation of Pearson’s correlation matrix
3.2.5 Generating clusters of complete keyword sets
3.2.6 Deriving keyword domains
3.2.7 Algorithms of Complete Keyword Pair (CKP)
3.2.8 Reducing number of complete keyword pairs with keyword support threshold
3.3 Prototyping of CKP Keyword Recommendation System
3.3.1 Query expansion
3.3.2 CKP Keyword Recommendation System
4 EXPERIMENTS
4.1 Benchmark of Effectiveness Evaluation: ACM CCS
4.2 Experiment I: Experiment with References in Two IS Journals
4.2.1 Citation data banks derived from two IS journals
4.2.2 Discovering multi-expertise authors by CAP algorithm employed six clustering methods against ACM CCS
4.2.3 Measures
4.2.3 Evaluation
4.2.4 Discussion
4.3 Experiment II: Experiment with References in Two ACM Journals
4.3.1 Citation data banks derived from two ACM journals
4.3.2 Discovering multi-expertise authors by CAP algorithm against ACM CCS
4.3.3 Measures
4.3.4 Evaluation
4.3.5 Discussion
4.4 Experiment III: Experiment with References in JACM
4.4.1 Citation data bank derived from JACM
4.4.2 Discovering bridge-keywords by CKP algorithm against ACM CCS
4.4.3 Measures
4.4.4 Evaluation
4.4.5 Tuning Parameter K in CKP
4.4.6 Analyses of length threshold of complete keyword sets
4.4.7 Discussion
5 CONCLUSION
5.1 Research limitation
5.2 Conclusion
5.3 Contribution
5.4 Future work
REFERENCES
APPENDIX: ACM CCS
Advisor: Ping-Yu Hsu (許秉瑜)   Approval date: 2009-6-26

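For questions about this thesis, please contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.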