Identifying Scientific Names and Aliases with Normalized Google Distance: The Case of Chemical Substances

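Name

Ching-yi Chen (陳靜儀)

Department

Department of Business Administration

Thesis title

Identifying Scientific Names and Aliases with Normalized Google Distance: The Case of Chemical Substances
(Identifying Alias of Chemical Material based on Normalized Google Distance)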

Files

Full text in the system: permanently restricted (never open to the public)

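Abstract (Chinese)

The names of chemical substances are complex and varied, and a few keywords are rarely enough to describe them adequately. Most users lack chemistry expertise, so when they meet an unfamiliar chemical name they usually turn to search engines or online chemical dictionaries, which readily return large amounts of information. However, the chemical names used in academia are mostly translated from English, and a single substance often has many different aliases, which causes problems in information retrieval.
Recent research has proposed the NGD algorithm, which uses the number of search results returned in real time by the Google search engine to compute an abstract distance between two terms and thereby judge how semantically related they are. This study therefore proposes two methods for measuring how closely a chemical substance's scientific name and its aliases are related. The "simple method" computes the NGD between the scientific name and an alias. The "category-affixed method" appends the substance's category name to the scientific name and then computes the NGD between that combined term and the alias. For each method, the average distance of the correct answers is calculated, and the two methods are compared on that basis. The experimental results show that the category-affixed method, by appending the category name to the scientific name, obtains more accurate result counts from the Google search engine, so the average distance of the correct answers is shorter.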
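The distance computation described in the abstract can be sketched as follows. The formula is the standard NGD definition from Cilibrasi and Vitányi; everything else is an assumption made for illustration, not the thesis's actual implementation: the helper names `ngd` and `alias_distance`, the way the category name is joined to the scientific name with a space, the index size `N`, and the hit counts in `FAKE_HITS` (which are invented and stand in for live search-engine result counts).

```python
import math
from typing import Callable, Optional


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: result counts for each term alone; fxy: count for both terms
    together; n: assumed total number of indexed pages.
    """
    if min(fx, fy, fxy) <= 0:
        return math.inf  # a term (or the pair) never occurs: treat as unrelated
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom <= 0:
        return math.inf  # assumption: guard against a degenerate index size
    return (max(lx, ly) - lxy) / denom


def alias_distance(
    name: str,
    alias: str,
    hits: Callable[[str], int],
    n: int,
    category: Optional[str] = None,
) -> float:
    """Simple method when category is None, category-affixed method otherwise."""
    # Assumption: the category is appended to the scientific name with a space.
    x = f"{name} {category}" if category else name
    return ngd(hits(x), hits(alias), hits(f"{x} {alias}"), n)


# Invented hit counts for illustration only; these are not data from the thesis.
FAKE_HITS = {
    "ethanol": 100_000,
    "alcohol": 500_000,
    "ethanol alcohol": 40_000,
    "ethanol organic compound": 20_000,
    "ethanol organic compound alcohol": 15_000,
}
N = 10**10  # assumed index size


def fake_hits(query: str) -> int:
    """Stand-in for a search-engine result count lookup."""
    return FAKE_HITS.get(query, 0)


simple = alias_distance("ethanol", "alcohol", fake_hits, N)
affixed = alias_distance("ethanol", "alcohol", fake_hits, N, category="organic compound")
print(f"simple: {simple:.3f}, category-affixed: {affixed:.3f}")
```

Passing the hit-count lookup in as a function keeps the distance logic independent of any particular search service, so the same code could be run against cached counts or a different engine. Averaging `alias_distance` over all known correct name and alias pairs would give the per-method average distance that the abstract uses for comparison.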

Abstract (English)

Because the names of chemical substances can be very complex and laypeople mostly lack the relevant chemical expertise, they usually look up related information through search engines or online chemical dictionaries. However, the chemical names used in academia are usually translated from English, and the same chemical often has many different aliases. This English-to-Chinese translation creates many problems when querying information about chemicals.
Recent studies have proposed using NGD to determine the semantic relevance between two words. This study therefore proposes two NGD-based methods for finding aliases, namely the simple method and the category-affixed method. The experimental results show that the latter method yields better results.

Keywords

★ NGD
★ Text mining

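Table of contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Organization
2. Literature Review
2-1 TF-IDF
2-2 Research on Synonyms
2-3 Normalized Google Distance (NGD)
2-4 Text Mining
3. Experimental Method
3-1 Experimental Design
3-2 Data Sources
3-3 Data Classification
3-4 Experimental Method
4. Experimental Results and Analysis
4-1 Experimental Results
5. Conclusion
5-1 Conclusion
5-2 Research Limitations
References
Appendix 1. Classification of Chemical Substances: Pure Substances
Appendix 2. Average Distance (1)
Appendix 3. Average Distance (2)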

References

[1] Brachman, R.J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G. and Simoudis, E., “Mining Business Databases”, Communication of the ACM, vol. 39, no. 11, pp. 42-48, 1996.
[2] Cilibrasi, R.L., & Vitanyi, P.M.B. (2007). The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering, 370 – 383.
[3] Fayyad U., Piatetsky-Shapiro G., Smyth P., Uthurusamy R., “From data mining to knowledge discovery: An Overview.” In Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, Mass., 1996, pp. 1-36.
[4] Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., “From Data Mining to Knowledge Discovery in Databases”, AI Magazine, pp. 37-54, 1996.
[5] Feldman R., Dagan I., “Knowledge discovery in textual databases(KDT).” Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, 1995, AAAI Press, pp.112-117.
[6] Hu, X., Zhang, X., Lu, C., Park, E. K., & Zhou, X.(2009). Exploiting Wikipedia as external knowledge for document clustering. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, France, 389-396.
[7] J.P. Bagrow, D. ben-Avraham, On the Google-fame of scientists and other populations, AIP Conference Proceedings 779:1(2005), 81–89.
[8] Lancia, F. (2005). Word co-occurrence and theory of meaning. Online: http://www.soc.ucsb.edu/faculty/mohr/classes/soc4/summer_08/pages/Resources/Readings/TheoryofMeaning.pdf
[9] Losiewicz, P., Oard, D. W., & Kostoff, R. N. (2000). Textual data mining to support science and technology management. Online: http://www.onr.navy.mil/sci_tech/special/technowatch/textmine.htm
[10] Lu, Y., Zhang, C. and Hou, H. (2009). ‘Using Multiple Hybrid Strategies to Extract Chinese Synonyms from Encyclopedia Resource’, Proceedings of the 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC 2009), Kaohsiung, Taiwan, December 7-9, pp. 1089-1093.
[11] Milne, D., Witten, I. H., & Nichols, D. M. (2007). A knowledge-based search engine powered by Wikipedia. Proceedings of the 6th ACM Conference on Information and Knowledge Management, Portugal, 445-454.
[12] Montebello, M., “Information overload-an IR problem?”, String Processing and Information Retrieval: A South American Symposium, September 1998.
[13] P.-I. Chen and S.-J. Lin, “Automatic keyword prediction using Google similarity distance”, Expert Systems with Applications, 37(3), pp. 1928-1938, 2010.
[14] P.-I. Chen and S.-J. Lin, “Word AdHoc Network: Using Google Core Distance to extract the most relevant information”, Knowledge-Based Systems, 24, pp. 393–405, 2011.
[15] Rush, J. E., R. Salvador and A. Zamora, “Automatic abstracting and indexing. II. Production of indicative abstracts by application of contextual inference and syntactic coherence criteria”, Journal of the American Society for Information Science, Vol. 22, No. 3, pp. 260-274, 1971.
[16] Salton, G., and Buckley, C., “Term-weighting approaches in automatic text retrieval”, Information Processing & Management, 24(5), pp. 513-523, 1988.
[17] Simoudis E., “Reality check for data mining.” 1996, IEEE Expert, (11:5)
[18] Tan, A.-H. (1999), “Text Mining: The state of the art and the challenges”, in Proceedings, PAKDD’99 workshop on Knowledge Discovery from Advanced Databases, Beijing, April, 1999.
[19] T. Pedersen, S. Patwardhan, and J. Michelizzi,“WordNet::Similarity - Measuring the Relatedness of Concepts,” Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), July 25-29, 2004, San Jose, CA, 2004.
[20] Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., & Studer, R. (2006). Semantic Wikipedia. Proceedings of the 15th International Conference on World Wide Web, UK, 585-594.
[21] Yi-Hung Liu, Yen-Liang Chen, Wu-Liang Ho, Predicting associated statutes for legal problems. Inf. Process. Manage. 51(1): 194-211, 2015.
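[22] 李浩平, 「運用NGD建立適用於使者回饋資訊不足之文件過濾系統」 (Using NGD to build a document filtering system for settings with insufficient user feedback), Master's thesis, National Central University, 2011.
[23] 祝亞琪, 「運用NGD提升程式碼搜尋品質」 (Using NGD to improve source code search quality), Master's thesis, National Central University, 2010.
[24] 楊佩臻, 「利用文句關係網路自動萃取文件摘要之研究」 (A study on automatically extracting document summaries using a sentence relation network), Master's thesis, National Central University, 2013.
[25] 鄭奕駿, 「離線搜尋Wikipedia以縮減NGD運算時間之研究」 (A study on searching Wikipedia offline to reduce NGD computation time), Master's thesis, National Central University, 2012.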
[26] Chemical Abstracts Service https://www.cas.org/content/chemical-substances/faqs
[27] Google indexed number http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/
[28] Wikipedia category of chemical substance names http://zh.wikipedia.org/wiki/Category:%E5%8C%96%E5%AD%A6%E7%89%A9%E8%B4%A8
[29] WordNet http://wordnet.princeton.edu/wordnet/
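[30] National Academy for Educational Research, 「雙語詞彙、學術名詞暨詞書資訊網」 (bilingual vocabulary, academic terms, and dictionary information site) http://terms.naer.edu.tw/
[31] ChemNet (中國化工網) http://cheman.chemnet.com/notices/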

Advisor

Ping-yu Hsu (許秉瑜)

Approval date

2015-7-23
