Active Learning for Incremental POI Extraction and Pairing


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/72281

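Title:	Active Learning for Incremental POI Extraction and Pairing
Authors:	張弘暐; Chang, Hung-Wei
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Data Mining; Machine Learning
Date:	2016-08-29
Upload time:	2016-10-13 14:36:45 (UTC+8)
Publisher:	National Central University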
Abstract:	With the rapid development of the Internet and smart mobile devices, electronic maps have become an indispensable aid in daily life. For an electronic map to provide a high-quality location-based search service, it must let users accurately find the points of interest (POIs) in their area, including shops and venues across categories such as food, clothing, housing, transportation, education, and entertainment. Google Maps is widely regarded as the most powerful electronic map today, and many users are accustomed to searching for POIs with it. However, not every POI a user wants can be found on Google Maps. We therefore need to expand the sources of POIs and build a rich POI database for user queries. With the rise of social networking sites in recent years, users often share food information, travel experiences, and similar content on these media, which spread information quickly. At the same time, businesses set up official fan pages or official websites to introduce their products and increase their visibility. The information that users and businesses publish on the Web is a good source for mining new POIs. In this thesis, we propose a Web-based system that can be roughly divided into three parts. The first part crawls address-related Google snippets, because these snippets may contain rich POI-related information. The second part is the POI extraction model: using Conditional Random Field (CRF) and Conditional Random Field Sharp (CRF Sharp) as the learning algorithms, we train a Chinese address recognition model and a Chinese organization name recognition model, whose purpose is to find all addresses and organization (POI) names that appear in the snippets. The third part is the POI pair verification model, which is trained with LibSVM as the learning algorithm and pairs each address with its organization name.
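As a rough illustration of how the second and third parts could connect, the sketch below decodes BIO tags (the kind of output a CRF sequence labeler typically produces) into address and organization spans, then enumerates candidate address and organization pairs with two simple features that a classifier such as an SVM could consume. Everything here is an assumption for illustration: the label names `ADDR` and `ORG`, the helpers `decode_bio` and `candidate_pairs`, the features `gap` and `org_first`, and the sample snippet are hypothetical and are not taken from the thesis.

```python
from itertools import product
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    label: str  # "ADDR" or "ORG" (hypothetical label names)
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    text: str   # concatenated token text


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect labeled spans from BIO tags such as a CRF tagger might emit."""
    spans: List[Span] = []
    start, label = None, None
    # A sentinel "O" tag at the end closes a span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append(Span(label, start, i, "".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def candidate_pairs(spans: List[Span]) -> List[Dict[str, object]]:
    """Pair every organization with every address and attach toy features."""
    orgs = [s for s in spans if s.label == "ORG"]
    addrs = [s for s in spans if s.label == "ADDR"]
    pairs = []
    for org, addr in product(orgs, addrs):
        pairs.append({
            "org": org.text,
            "addr": addr.text,
            # Number of tokens separating the two spans.
            "gap": max(addr.start - org.end, org.start - addr.end),
            # Whether the organization name precedes the address.
            "org_first": org.start < addr.start,
        })
    return pairs


# Made-up snippet, tokenized per character, with made-up tags.
tokens = ["鼎", "泰", "豐", " ", "台", "北", "市", "信", "義", "路"]
tags = ["B-ORG", "I-ORG", "I-ORG", "O",
        "B-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "I-ADDR"]

spans = decode_bio(tokens, tags)
pairs = candidate_pairs(spans)
print(pairs)
```

In a full system, each candidate pair would be turned into a numeric feature vector and passed to a trained classifier that accepts or rejects the pairing; the two features above only stand in for whatever feature set is actually used.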
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	405

All items in NCUIR are protected by original copyright.
