Thesis 105423052: Full Metadata Record

DC Field Value Language
DC.contributor Department of Information Management zh_TW
DC.creator 鍾育東 zh_TW
DC.creator Yu-Tung Chung en_US
dc.date.accessioned 2019-7-4T07:39:07Z
dc.date.available 2019-7-4T07:39:07Z
dc.date.issued 2019
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=105423052
dc.contributor.department Department of Information Management zh_TW
DC.description National Central University zh_TW
DC.description National Central University en_US
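dc.description.abstract With the rapid growth of the Internet, the web has become an important source of information, and the rise of social networks and online forums has led to user-generated content being created and shared in large quantities. Such content earns more trust from other users than traditional web content does, and anyone can write their own reviews. Meanwhile, travel has always been an activity that requires gathering a great deal of information: finding attractions, planning transportation, and locating nearby food all depend on careful advance planning. However, the travel information currently available online is mostly itinerary-oriented; it lacks overall recommendations for a region and rarely explores what distinguishes one region from another. This study therefore analyzes the discussions in the sub-boards for different travel regions on the well-known Taiwanese travel forum 背包客棧 (Backpackers Forum), comparing the hot words discussed for each region in order to identify each region's unique characteristics and to help travel operators plan itineraries that better match travelers' needs. To answer this research question, the study uses a Python crawler to collect a total of 7,883 posts from the forum's boards, computes the most frequently occurring terms in each board with the TF-IDF formula, and compares the similarities, differences, and associations among the boards' hot words. The results yield the following findings. First, regardless of geographic level, the most frequently discussed topics are attractions, transportation, accommodation, money, and visas within the region, indicating that these are the subjects the forum's users discuss most. Second, the keywords that appear reveal associations between two places, and these links are one-directional rather than mutual. Third, the study uses association analysis and a visualization package to draw network graphs, which give a more intuitive understanding of the interactions among keywords. zh_TW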
dc.description.abstract With the rapid growth of the Internet, the World Wide Web has become an important source of everyday information. Social media and online forums allow users to express their opinions in what is called "user-generated content". Such content is more likely to gain the trust of other users, and anyone can write their own posts. Meanwhile, tourism has always been an information-heavy activity: finding scenic spots, planning transportation, and choosing where to eat all depend on thorough research before the trip. However, the tourism information currently available online is mostly itinerary-oriented; it lacks overall recommendations for a region and seldom examines how one region differs from another. This research therefore analyzes posts on the well-known Taiwanese travel website "Backpackers Forum", comparing the most discussed topics and words across forum sections (that is, geographic areas), in order to find the unique characteristics of each area and to offer insights for travel agencies. To do so, the research uses a web crawler written in Python to collect and store posts from the different sections of the forum, uses TF-IDF to identify the most prominent words and topics in each section, and compares the sections to find differing patterns. The findings are as follows. First, regardless of geographic level, the most common topics are the region's tourist spots, transportation, lodging, budget, and visas. Second, the keywords reveal relationships between two locations, and these relationships are one-directional. Third, the research uses association rule analysis to visualize the relationships between words, giving a better understanding of how the topics are connected. en_US
DC.subject user-generated content zh_TW
DC.subject web crawler zh_TW
DC.subject text mining zh_TW
DC.subject term frequency analysis zh_TW
DC.subject association rules zh_TW
DC.subject user-generated content en_US
DC.subject web crawler en_US
DC.subject text mining en_US
DC.subject TF-IDF en_US
DC.subject association rules en_US
DC.title Comparing Hot-Word Differences Across Travel Regions Using Text Mining: The Case of 背包客棧 (Backpackers Forum) zh_TW
dc.language.iso zh-TW zh-TW
DC.title Mandarin Text Mining in Tourism: A Case Study of Backpackers Forum en_US
DC.type master's/doctoral thesis zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
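
The abstract describes scoring terms per forum section with TF-IDF and comparing the resulting hot words across sections. The sketch below illustrates that idea only; it is not the thesis's code. It assumes each section's posts have already been segmented into tokens (the record does not say which segmenter was used), treats each section as one document, and uses the plain tf * log(N/df) weighting, which may differ from the exact formula in the thesis. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_section(section_tokens):
    """Return {section: {term: score}}, treating each section as one document.

    Assumption: tf = count / section length, idf = log(N / df).
    """
    n_sections = len(section_tokens)
    # Document frequency: the number of sections in which each term appears.
    df = Counter()
    for tokens in section_tokens.values():
        df.update(set(tokens))

    scores = {}
    for section, tokens in section_tokens.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[section] = {
            term: (count / total) * math.log(n_sections / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, section, k=3):
    """Return the k highest-scoring terms of a section (ties broken alphabetically)."""
    ranked = sorted(scores[section].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-tokenized toy data standing in for crawled posts.
sections = {
    "Japan": ["visa", "hotel", "shinkansen", "shinkansen", "onsen"],
    "Thailand": ["visa", "hotel", "tuk-tuk", "beach", "beach"],
    "Iceland": ["visa", "hotel", "aurora", "aurora", "car-rental"],
}
scores = tfidf_by_section(sections)
hot_words = {name: top_terms(scores, name, k=2) for name in sections}
```

In this toy data, terms shared by every section (here "visa" and "hotel") receive a score of zero, so only section-specific terms surface as hot words. With this particular weighting, topics common to all sections would have to be identified separately, for example from raw term counts.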

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.