Name: 鍾育東 (Yu-Tung Chung)   Department: Information Management
Thesis Title: Comparing Hot-Word Differences Across Travel Regions with Text Mining: A Case Study of 背包客棧 (Backpackers Forum)
(Mandarin Text Mining in Tourism: A Case Study of Backpackers Forum)
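Abstract (Chinese, translated): With the flourishing of the Internet, the web has become an important source of information, and the rise of social networks and online forums has led users to create and share large volumes of user-generated content. Such content earns more trust from other users than earlier web content, and anyone can write their own reviews. On the other hand, travel has always been an activity that requires gathering a great deal of information: everything from researching attractions and planning transportation to finding nearby food depends on careful advance arrangements. However, current online travel information tends to be itinerary-oriented and lacks overall recommendations for a region, as well as investigation of what sets a region apart from other regions.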
Abstract (English): With the rapid growth of the Internet, the World Wide Web has become a major source of everyday information. Social media and online forums let users publish their own opinions, known as user-generated content (UGC). Such content tends to earn more trust from other users than conventional web content, and anyone can write their own posts. At the same time, travel has always been one of the most information-intensive activities: researching scenic spots, planning transportation, finding local food, and everything else depend on thorough preparation to arrange a good trip. However, current online travel information is largely itinerary-oriented and lacks overall recommendations for a region, as well as analysis of what distinguishes one region from others.

This research therefore uses posts from Backpackers Forum (背包客棧), a well-known Taiwanese travel website, to compare the most prominent topics and words across forum sections, each corresponding to a geographic area. The aim is to identify the distinctive characteristics of each area and to provide insights for travel agencies.

To do so, a web crawler written in Python collects and stores posts from the different sections of the forum. TF-IDF is then used to find the highest-weighted words and topics in each section, and these are compared with those of other sections to reveal differing patterns.
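The TF-IDF comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes posts have already been crawled and segmented into words (for example with a Chinese segmenter such as jieba, which is not shown), and it treats all posts from one forum section as a single document, so that inverse document frequency rewards words concentrated in few sections. The smoothed IDF, log((1 + N) / (1 + df)) + 1, mirrors the smoothed formula used by default in scikit-learn's TfidfVectorizer and is likewise an assumption; the function names and sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf_by_section(sections: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score every term in every section; each section's tokens form one document."""
    n_docs = len(sections)
    # Document frequency: in how many sections each term appears at least once.
    df: Counter = Counter()
    for tokens in sections.values():
        df.update(set(tokens))
    scores: dict[str, dict[str, float]] = {}
    for name, tokens in sections.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # Term frequency (share of the section's tokens) times smoothed IDF.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return scores


def top_terms(section_scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(section_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Hypothetical, already-segmented tokens for two forum sections.
    sections = {
        "japan": ["kyoto", "transport", "lodging", "kyoto", "jr_pass"],
        "thailand": ["bangkok", "transport", "visa", "bangkok"],
    }
    scores = tf_idf_by_section(sections)
    for name in sections:
        print(name, top_terms(scores[name], k=2))
```

With the plain, unsmoothed log(N / df), a word that appears in every section (such as a transportation term) would score exactly zero; the smoothed form keeps such shared topics visible while still ranking section-specific words higher at equal frequency.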

The main findings are as follows. First, at every level of the geographic hierarchy (continent, country, and region within a country), the most common topics are the region's tourist spots, transportation, lodging, and budget and visas. Second, relationships between pairs of locations can be observed, and these relationships are unidirectional. Third, association rule analysis is used to visualize the relationships between words, giving a clearer picture of how the topics are connected.
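The association rule step can be illustrated with a minimal pairwise sketch. It assumes each post is reduced to the set of hot words it contains and treats posts as transactions; it enumerates single-word rules A → B directly rather than running full Apriori, and the thresholds, function name, and sample data are hypothetical. Support is the share of posts containing both words, confidence is P(B | A), and lift divides confidence by the baseline frequency of B. Because confidence is computed from the antecedent's count, A → B and B → A generally differ in strength, which is why such rules are directional.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float     # share of posts containing both words
    confidence: float  # P(consequent | antecedent)
    lift: float        # confidence / P(consequent)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules A -> B between single hot words from posts treated as transactions."""
    n = len(transactions)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for post in transactions:
        items = sorted(set(post))
        item_counts.update(items)
        # Each unordered pair is counted once per post.
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Confidence depends on direction, so test both A -> B and B -> A.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append(Rule(x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r.confidence, r.antecedent, r.consequent))


if __name__ == "__main__":
    # Hypothetical posts, each reduced to the set of hot words it mentions.
    posts = [
        {"jr_pass", "kyoto", "lodging"},
        {"jr_pass", "kyoto"},
        {"kyoto", "lodging"},
        {"visa", "budget"},
    ]
    for rule in pair_rules(posts):
        print(rule)
```

In this toy data, every post mentioning jr_pass also mentions kyoto (confidence 1.0), but only two of the three kyoto posts mention jr_pass (confidence about 0.67), so the rule is stronger in one direction than the other.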
Keywords (Chinese): ★ user-generated content
★ web crawler
★ text mining
★ term frequency analysis
★ association rules
Keywords (English): ★ user-generated content
★ web crawler
★ text mining
★ association rules
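Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background and Motivation
1-2 Research Objectives
1-3 Research Framework
2. Literature Review
2-1 Online Reviews
2-2 Web Crawlers
2-3 Text Mining
2-4 Term Frequency Analysis
2-4-1 Chinese Word Segmentation
2-4-2 TF-IDF
2-5 Association Rules
3. Data and Methods
3-1 Research Process
3-2 Research Data
3-3 Research Methods
4. Results and Discussion
4-1 Hot-Word Similarities and Differences Across Regions (Continents)
4-2 Hot-Word Similarities and Differences Between Countries
4-3 Hot-Word Similarities and Differences Among Regions Within a Single Country
4-4 Association Rule Analysis of Hot-Word Occurrence
5. Conclusions and Recommendations
5-1 Conclusions
5-2 Contributions
5-3 Limitations
5-4 Suggestions for Future Research
References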
Advisor: 粟四維 (Wesley Shu)   Approval Date: 2019-7-4
