NCU Institutional Repository (中大機構典藏), offering theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/81232
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81232


    Title: Mandarin Text Mining in Tourism: A Case Study of Backpackers Forum (利用文字探勘技術比較各旅遊地區熱詞差異 — 以「背包客棧」為例)
    Author: Chung, Yu-Tung (鍾育東)
    Contributor: Department of Information Management (資訊管理學系)
    Keywords: user-generated content; web crawler; text mining; TF-IDF; association rules (用戶生成內容;網路爬蟲;文字探勘;詞頻分析;關聯規則)
    Date: 2019-07-04
    Upload time: 2019-09-03 15:40:01 (UTC+8)
    Publisher: National Central University (國立中央大學)
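    Abstract (translated from the Chinese): With the rapid growth of the Internet, the web has become an important source of information, and the rise of social networks and online forums has led to user-generated content being created and shared in large quantities. Such content earns more trust from other users than earlier web content did, and anyone can write their own reviews. Travel, meanwhile, has always been an activity that requires gathering a great deal of information: choosing attractions, planning transportation, and finding nearby food all depend on careful advance preparation. However, the travel information currently available online is mostly itinerary-oriented. It rarely offers an overall recommendation for a region or examines what distinguishes one region from others.
    This study therefore analyzes the discussions in the regional sub-boards of Backpackers (背包客棧), a well-known Taiwanese travel forum, and compares the hot words discussed for different travel regions, with the aim of finding features unique to each region and helping travel operators plan itineraries that better match travelers' needs.
    To answer this research question, the study uses a Python crawler to collect a total of 7,883 posts from the forum's discussion boards, applies the TF-IDF formula to find the terms that appear most often on each board, and compares the similarities, differences, and associations among the boards' hot words.
    The study has the following findings. First, regardless of geographic level, the most frequently discussed topics are a region's attractions, transportation, accommodation, money, and visas, indicating that these are the topics Backpackers users discuss most. Second, the keywords that appear reveal associations between two places, and these associations are one-directional rather than two-directional. Third, the study uses association analysis and a visualization package to draw a network graph, which gives a more intuitive understanding of how different keywords interact.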
    Abstract (English): With the rapid growth of the Internet, the World Wide Web has become the most important source of our daily information. Social media and online forums have enabled online users to express their opinions in what is called "user-generated content". Such content is more likely to gain trust from other users, and anyone can write their own posts. Travel, on the other hand, has always been one of the most information-intensive activities. Choosing scenic spots, planning transportation, and finding food all rely on thorough research to arrange a good trip. However, online travel information today is mostly itinerary-oriented and lacks overall recommendations for a region or any examination of how one region differs from others.

    Thus, this research uses the posts on the well-known Taiwanese travel website "Backpackers Forum" and compares the most-discussed topics and words across forum sections, each covering a geographic area, in order to find the unique characteristics of each area and provide insight for travel agencies.

    To do so, this research uses a web crawler written in Python to crawl and store the posts from different sections of the forum, uses TF-IDF to calculate the most frequent words and topics in each section, and compares the sections with one another to find differing patterns.
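As a rough illustration of the TF-IDF step described above, the following minimal sketch treats each forum section as one document and scores its terms. It is not the thesis's implementation: the exact formula variant, the word segmentation method, and the function names `tf_idf` and `top_terms` are assumptions, and the token lists are invented sample data rather than crawled posts.

```python
import math
from collections import Counter


def tf_idf(sections):
    """Return {section: {term: score}}, treating each section as one document.

    Assumes the plain variant tf * log(N / df); the thesis may use another.
    """
    n_sections = len(sections)
    # Document frequency: in how many sections does each term appear?
    df = Counter()
    for tokens in sections.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in sections.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_sections / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, section, k=3):
    """Return the k highest-scoring terms of a section (ties broken by name)."""
    ranked = sorted(scores[section].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-segmented tokens per forum section (not real data).
sections = {
    "japan": ["visa", "hotel", "shinkansen", "shinkansen", "ramen"],
    "korea": ["visa", "hotel", "kimchi", "subway"],
    "thailand": ["visa", "hotel", "tuk-tuk", "beach", "beach"],
}
scores = tf_idf(sections)
print(top_terms(scores, "japan", k=2))
```

In this toy data, terms that occur in every section (such as "visa") receive a score of zero, so the ranking surfaces terms that are specific to one section. Real Mandarin posts would first need word segmentation, which this sketch skips.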

    The research has the following findings. First, regardless of the geographical hierarchy, the most common topics are a region's tourist spots, transportation, lodging, budget, and visas. Second, relationships between two locations can be observed, and these relationships are unidirectional. Third, the research uses association rule analysis to visualize the relationships between words, giving a better understanding of the connections between topics.
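One possible reading of the unidirectional relationship mentioned above is that an association rule can be strong in one direction and weak in the other. The sketch below computes support and confidence for a rule in both directions. The abstract does not say how direction was measured, so this interpretation, the helper `rule_metrics`, and the sample posts are all assumptions made for illustration.

```python
def rule_metrics(posts, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    Each post is a set of keywords. Support is the share of posts containing
    both keywords; confidence is that count divided by the number of posts
    containing the antecedent.
    """
    n_posts = len(posts)
    has_a = sum(1 for words in posts if antecedent in words)
    has_both = sum(
        1 for words in posts if antecedent in words and consequent in words
    )
    support = has_both / n_posts if n_posts else 0.0
    confidence = has_both / has_a if has_a else 0.0
    return support, confidence


# Hypothetical keyword sets, one per post (invented, not from the thesis).
posts = [
    {"macau", "hong kong", "ferry"},
    {"macau", "hong kong"},
    {"hong kong", "dim sum"},
    {"hong kong", "airport"},
    {"hong kong", "hotel"},
]
macau_to_hk = rule_metrics(posts, "macau", "hong kong")
hk_to_macau = rule_metrics(posts, "hong kong", "macau")
print(macau_to_hk, hk_to_macau)
```

Both rules share the same support, but in this toy data every post mentioning "macau" also mentions "hong kong" while the reverse holds for only two of five posts, so the confidence differs by direction.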
    Appears in collections: [Graduate Institute of Information Management (資訊管理研究所)] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html  -            0Kb   HTML    220


    All items in NCUIR are protected by original copyright.

