dc.description.abstract | With the rapid growth of the Internet, the World Wide Web has become the most important source of daily information. Social media and online forums let users express their opinions in what is called "user-generated content". Such content is more likely to gain the trust of other users, and anyone can write their own posts. Tourism, meanwhile, has always been one of the most information-heavy activities. From choosing scenic spots and planning transportation to finding food and everything else, arranging a good trip relies on thorough research. However, current online information about tourist attractions is mostly route-oriented; it lacks overall recommendations and does not examine the differences between two countries within a specific region.
Thus, this research uses posts from the well-known Taiwanese travel website "Backpackers Forum" to compare the most popular topics and words across different forum sections and geographic areas, with the aim of identifying the unique characteristics of each area and providing insights for travel agencies.
To do so, this research uses a web crawler written in Python to collect and store posts from different sections of the forum, and uses TF-IDF to identify the most prominent words and topics in each section, which are then compared with those of other sections to find differing patterns.
The findings are as follows. First, regardless of the level of the geographical hierarchy, the most common topics are the region's tourist spots, transportation, lodging, and budget and visa matters. Second, relationships between two locations can be observed, and these relationships are unidirectional. Third, the research uses association rule analysis to visualize the relationships between words, giving a better understanding of how the topics are connected. | en_US |