POI Extraction and Relation Verification from the Web

Please use this persistent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/72087

Title:	POI Extraction and Relation Verification from the Web
Author:	Chuang, Hsiu-Min (莊秀敏)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Location-based service; POI crawling; POI relation pairing; geographic information retrieval; store name recognition; semi-supervised learning
Date:	2016-07-26
Upload time:	2016-10-13 14:25:39 (UTC+8)
Publisher:	National Central University
Abstract:	With the popularity of mobile devices and smartphones, mobile applications and services have grown rapidly, especially location-based services (LBS). According to a 2014 mobile market survey, map/local search is among the most frequently used services on smartphones, and points of interest (POIs) such as shopping malls, stores, gas stations, and parking lots are common queries. Existing map services such as Google Maps and Wikimapia are built manually, either by dedicated staff or through crowdsourcing. However, manual annotation is costly and yields limited coverage for POI search services, whereas the abundance of information on the Web allows the POI information of many businesses to be extracted automatically. On the other hand, because POI relations change over time, ensuring the accuracy of POI data is critical: when stores move or close, the database often ends up with one-to-many address-to-store-name pairs. Identifying outdated POI relations is therefore both important and challenging for improving database quality. This thesis addresses two problems: (1) POI database construction and map search, and (2) POI relation verification.
The first study comprises three tasks: POI extraction, POI pairing, and POI search. We propose a query-based crawling strategy to find address-bearing pages, extract addresses and POI names from them, and apply a pairing model to identify the most likely POI for each address. To provide effective POI search, we integrate multiple search results for ranking. In the second study, we train a verification model on weakly labeled Web data to detect potentially outdated POI pairs in the database, and we analyze its performance across different methods (classifiers) and scenarios. We built a database of 1.25 million distinct POIs crawled from the Web and implemented a POI search service on the Apache Solr platform. Experimental results show that our POI search outperforms Wikimapia and the commercial app "What's the Number?" and performs close to Google Maps. For POI pairing, the proposed method achieves an F1-measure of 91.1% when the volume of Google queries is sufficient. For detecting outdated POI pairs, semi-supervised learning (tri-training) improves accuracy to 72.8%.
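The abstract states only that multiple search results are integrated to rank POIs, without naming the method. As one plausible illustration, the sketch below uses reciprocal rank fusion (RRF), a standard rank-aggregation technique swapped in here as an assumption; it is not necessarily what the thesis does. The function name `reciprocal_rank_fusion`, the constant `k = 60`, and the two input lists (for example, a name-based and an address-based Solr query) are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of POI IDs with reciprocal rank fusion.

    Hypothetical sketch: each POI receives sum(1 / (k + rank)) over the lists
    that contain it, with ranks starting at 1. Higher fused scores rank first;
    ties are broken by POI ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, poi_id in enumerate(ranking, start=1):
            scores[poi_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda poi: (-scores[poi], poi))


if __name__ == "__main__":
    # Hypothetical result lists from two retrieval components.
    by_name = ["poi_a", "poi_b", "poi_c"]
    by_address = ["poi_b", "poi_c", "poi_d"]
    print(reciprocal_rank_fusion([by_name, by_address]))
    # prints ['poi_b', 'poi_c', 'poi_a', 'poi_d']
```

RRF needs only rank positions, not comparable scores, which makes it a convenient default when the component searches score results on different scales.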
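The abstract reports that tri-training, a semi-supervised method, raises the accuracy of outdated-POI detection to 72.8%, but gives no implementation details. The following is a minimal, simplified sketch of the general tri-training idea under stated assumptions: each address/store-name pair is represented by hypothetical numeric features, label 1 means outdated and 0 means current, the base learner is a toy nearest-centroid classifier chosen only to keep the example dependency-free, and the error-rate conditions of the original tri-training algorithm by Zhou and Li are omitted. All names (`NearestCentroid`, `tri_train`, `predict_majority`) are hypothetical; this is not the thesis's verification model.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy base learner (an assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        # Ties are broken by the smaller label so results are deterministic.
        return min(self.centroids, key=lambda lab: (sq_dist(self.centroids[lab]), lab))


def tri_train(labeled_X, labeled_y, unlabeled_X, max_rounds=10, seed=0):
    """Hypothetical, simplified tri-training: three learners teach each other.

    The error-rate conditions of the original algorithm are omitted.
    """
    rng = random.Random(seed)
    n = len(labeled_X)
    models = []
    for _ in range(3):
        # Each learner starts from a different bootstrap sample of the labeled data.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            NearestCentroid().fit([labeled_X[i] for i in idx], [labeled_y[i] for i in idx])
        )

    previous = None
    for _ in range(max_rounds):
        # For learner i, pseudo-label the unlabeled points on which the other two agree.
        pseudo = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            agreed = []
            for x in unlabeled_X:
                label = models[j].predict(x)
                if label == models[k].predict(x):
                    agreed.append((x, label))
            pseudo.append(agreed)
        if pseudo == previous:  # pseudo-labels stopped changing: converged
            break
        # Retrain every learner on the labeled data plus its own pseudo-labeled set.
        for i in range(3):
            X = list(labeled_X) + [x for x, _ in pseudo[i]]
            y = list(labeled_y) + [label for _, label in pseudo[i]]
            models[i] = NearestCentroid().fit(X, y)
        previous = pseudo
    return models


def predict_majority(models, x):
    """Final decision: majority vote of the three learners."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D features per address/store-name pair; 1 = outdated, 0 = current.
    labeled_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    labeled_y = [0, 0, 1, 1]
    unlabeled_X = [[0.85, 0.15], [0.15, 0.85], [0.7, 0.3], [0.3, 0.7]]
    models = tri_train(labeled_X, labeled_y, unlabeled_X)
    print(predict_majority(models, [0.95, 0.05]))  # prints 0
```

The appeal of this scheme for weakly labeled Web data is that an unlabeled pair is only used to teach one learner when the other two independently agree on its label, which filters some of the noise that plain self-training would absorb.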
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	287

All items in NCUIR are protected by original copyright.
