Thesis/Dissertation 105423052 Detailed Record




Name  Yu-Tung Chung (鍾育東)   Department  Information Management (資訊管理學系)
Thesis title  Using Text Mining to Compare Hot-Word Differences Across Travel Regions: The Case of "Backpackers Forum" (背包客棧)
(Mandarin Text Mining in Tourism: A Case Study of Backpackers Forum)
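  1. The access permission for this electronic thesis is: approved for immediate open access.
  2. Electronic full text that has reached its open-access date is licensed to users only for personal, non-commercial searching, reading, and printing for the purpose of academic research.
  3. Please observe the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.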

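Abstract (Chinese) With the rapid growth of the Internet, the web has become an important source of information, and the rise of social networks and online forums has led to user-generated content being created and shared in large volumes. Such content earns more trust from other users than earlier web content did, and anyone can write their own reviews. Travel, meanwhile, has always been an activity that requires gathering a great deal of information: choosing sights, planning transportation, and finding nearby food all depend on careful advance planning. Yet the travel information currently available online is largely itinerary-oriented. It lacks overall recommendations for a region and rarely examines what makes one region distinctive compared with others.
This study therefore analyzes the discussions in the sub-boards for different travel regions on "背包客棧" (Backpackers Forum), a well-known Taiwanese travel forum, and compares the hot words discussed for each region, in the hope of identifying features unique to each region and helping travel agencies plan itineraries that better match travelers' needs.
To answer this research question, the study used a Python crawler to collect a total of 7,883 posts from the forum's discussion boards, used the TF-IDF formula to compute the terms that appear most prominently in each board, and compared the boards' hot words for similarities, differences, and associations.
The results yield the following findings. First, regardless of geographic level, the most frequently discussed topics are the sights, transportation, lodging, money, and visa matters of the region in question, which indicates that these are what Backpackers Forum users discuss most. Second, the keywords that appear reveal associations between two places, and these links are one-directional rather than mutual. Third, the study uses association analysis and a visualization package to draw network graphs, which give a more intuitive view of how different keywords interact.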
Abstract (English) With the rapid growth of the Internet, the World Wide Web has become a major source of everyday information. Social media and online forums let users publish their own opinions, known as user-generated content. Such content is more likely to gain the trust of other users, and anyone can write their own posts. Tourism, meanwhile, has always been an information-heavy activity: choosing scenic spots, planning transportation, and finding food all depend on thorough preparation. Yet online information about tourist destinations is mostly itinerary-oriented. It lacks overall recommendations for a region and rarely examines what distinguishes one region from another.

This study therefore draws on posts from the well-known Taiwanese travel forum "Backpackers Forum" and compares the most discussed words and topics across forum sections, each of which covers a geographic area, in order to find the unique characteristics of each area and offer insight to travel agencies.

To this end, the study uses a web crawler written in Python to collect and store posts from the forum's sections, then applies TF-IDF to identify the most prominent words and topics in each section and compares the sections to find differing patterns.
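
As a rough illustration of the TF-IDF step, the sketch below treats each forum section as one document and scores every term by its relative frequency in that section times the log of the inverse section frequency. The function names `tf_idf` and `top_terms`, the toy token lists, and the section-as-document setup are assumptions made for this example; the record does not give the thesis's exact formula or preprocessing, and Chinese word segmentation is assumed to have been done beforehand.

```python
import math
from collections import Counter


def tf_idf(sections):
    """Score terms per section.

    sections: dict mapping a section name to a list of tokens
    (already segmented). Returns {section: {term: score}}.
    """
    n_sections = len(sections)
    # Document frequency: number of sections in which each term appears.
    df = Counter()
    for tokens in sections.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in sections.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # Relative term frequency times log inverse section frequency.
            term: (count / total) * math.log(n_sections / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, name, k=3):
    """Return the k highest-scoring terms of one section (ties broken by term)."""
    ranked = sorted(scores[name].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data, not taken from the thesis.
sections = {
    "japan": ["visa", "hotel", "shinkansen", "shinkansen", "onsen"],
    "thailand": ["visa", "hotel", "tuk-tuk", "beach", "beach"],
    "france": ["visa", "hotel", "louvre", "metro", "metro"],
}
scores = tf_idf(sections)
japan_top = top_terms(scores, "japan", k=2)
```

Terms such as "visa" that occur in every section get a score of zero under this weighting, so the ranking favours words that set one section apart from the others.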

The study has three main findings. First, regardless of geographic level, the most common topics are a region's tourist spots, transportation, lodging, budget, and visas. Second, the keywords reveal relationships between two locations, and these relationships are one-directional rather than mutual. Third, the study uses association rule analysis and visualization to show the relationships between words, giving a clearer picture of how the topics connect.
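
For readers unfamiliar with association rules, the sketch below computes the two standard measures, support and confidence, over a handful of posts represented as keyword sets. The helper names and the toy posts are hypothetical, and this is not the thesis's code or data. It also shows a general property of such rules: the confidence of A → B can differ from that of B → A. Whether this is how the thesis arrived at its one-directional finding is not stated in this record.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from co-occurrence counts."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_antecedent if n_antecedent else 0.0


# Hypothetical posts, each reduced to the set of place names it mentions.
posts = [
    {"tokyo", "osaka"},
    {"tokyo", "osaka"},
    {"tokyo"},
    {"tokyo"},
    {"kyoto"},
]

pair_support = support(posts, {"tokyo", "osaka"})
osaka_to_tokyo = confidence(posts, {"osaka"}, {"tokyo"})
tokyo_to_osaka = confidence(posts, {"tokyo"}, {"osaka"})
```

In this toy data every post that mentions "osaka" also mentions "tokyo", but only half of the "tokyo" posts mention "osaka", so the rule is strong in one direction and weak in the other.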
Keywords (Chinese) ★ user-generated content
★ web crawler
★ text mining
★ term frequency analysis
★ association rules
Keywords (English) ★ user-generated content
★ web crawler
★ text mining
★ TF-IDF
★ association rules
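Table of contents  Contents
Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Structure
2. Literature Review
2-1 Online Reviews
2-2 Web Crawlers
2-3 Text Mining
2-4 Term Frequency Analysis
2-4-1 Chinese Word Segmentation
2-4-2 TF-IDF
2-5 Association Rules
3. Data and Methods
3-1 Research Process
3-2 Research Data
3-3 Research Methods
4. Results and Discussion
4-1 Hot-Word Similarities and Differences Across Regions (Continents)
4-2 Hot-Word Similarities and Differences Between Countries
4-3 Hot-Word Similarities and Differences Between Areas Within a Single Country
4-4 Association Rule Analysis of Hot-Word Occurrence
5. Conclusions and Recommendations
5-1 Research Conclusions
5-2 Research Contributions
5-3 Research Limitations
5-4 Research Recommendations
References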
Advisor  Wesley Shu (粟四維)   Approval date  2019-7-4

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.