Abstract: | With the popularity of mobile devices and smartphones, we have witnessed rapid growth in mobile applications and services, especially location-based services (LBS). According to a 2014 mobile marketing survey, map and local searches are among the most frequently used services on smartphones. Points of interest (POIs) such as shopping malls, stores, gas stations, and parking lots are common targets of these searches. Existing map services such as Google Maps and Wikimapia are constructed manually, either by dedicated staff or through crowdsourcing. However, manual annotation is costly and limits the number of POIs that a search service can cover. Given the abundance of information on the Web, POI information for many businesses can instead be extracted from the Web. On the other hand, POI relations change over time, so keeping POI data accurate is critical. When stores close or move, the database often ends up with one-to-many address-to-store-name pairs. Identifying outdated POI relations is therefore both important and challenging for improving database quality. We focus on two problems: (1) POI database construction and search on maps, and (2) POI relation verification.
The first study comprises three tasks: POI extraction, POI pairing, and POI search. We adopt a query-based crawling strategy to find address-bearing pages, extract addresses and POI names from them, and use a pairing model to couple them and identify the most likely POI. To support effective POI search, we integrate multiple search results for ranking. In the second study, we train a verification model on weakly labeled Web data to detect potentially outdated POI pairs in the database, and we analyze its performance across different classifiers and scenarios. We have built a database of 1.25 million distinct POIs crawled from the Web and implemented a POI search service on the Apache Solr platform. Experimental results show that our POI search outperforms Wikimapia and a commercial app called "What's the Number?" and is close to Google Maps. For POI pairing, the proposed method achieves an F1-measure of 91.1% when the volume of Google queries is sufficient. For detecting outdated POI pairs, semi-supervised learning with tri-training improves accuracy to 72.8%. |