NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research project reports available for download): Item 987654321/72087


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/72087


    Title: POI Extraction and Relation Verification from the Web (從Web擷取興趣點及驗證關係)
    Author: Chuang, Hsiu-Min (莊秀敏)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Location-based service; POI crawling; POI relation pairing; geographic information retrieval; store name recognition; semi-supervised learning
    Date: 2016-07-26
    Upload time: 2016-10-13 14:25:39 (UTC+8)
    Publisher: National Central University
    Abstract: With the popularity of mobile devices and smartphones, we have witnessed rapid growth in mobile applications and services, especially location-based services (LBS). According to a 2014 mobile marketing survey, map and local search is among the most frequently used services on smartphones. Points of interest (POIs), such as malls, stores, gas stations, and parking lots, are common targets of such searches. Existing map services such as Google Maps and Wikimapia are built manually, either by dedicated staff or through crowdsourcing. However, manual annotation is costly and limits the coverage of POI search services. Given the abundance of information on the Web, POI information for many businesses can be extracted from the Web instead. On the other hand, because POI relations change over time, ensuring the accuracy of POI data is critical. When a store closes or moves, the database can end up with one-to-many address-to-store-name pairs. Identifying outdated POI relations is therefore both important and challenging for improving database quality.
    We focus on two problems: (1) POI database construction and map search, and (2) POI relation verification. The first study comprises three tasks: POI extraction, POI pairing, and POI search. We propose a query-based crawling strategy to find address-bearing Web pages, extract addresses and POI names from them, and use a pairing model to select the most likely POI for each address. To support effective POI search, we integrate multiple search results for POI ranking. In the second study, we train a verification model on weakly labeled Web data to detect potentially outdated POI pairs in the database, and we analyze performance across different classifiers and scenarios. We have built a database of 1.25 million distinct POIs crawled from the Web and implemented a POI search service on the Apache Solr platform. Experimental results show that our POI search outperforms Wikimapia and a commercial app called "What's the Number?" and is close to Google Maps. For POI pairing, the proposed method achieves a 91.1% F1-measure when enough Google query results are available. For detecting outdated POI pairs, semi-supervised learning (tri-training) improves accuracy to 72.8%.
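The one-to-many situation mentioned in the abstract can be illustrated with a minimal Python sketch. It only groups address and store-name pairs by address and flags addresses that map to more than one name, which would make them candidates for verification. It is not the thesis's verification model (the abstract describes that as trained on weakly labeled Web data with tri-training); the function name `find_one_to_many` and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def find_one_to_many(pairs):
    """Return addresses that are paired with more than one distinct store name.

    `pairs` is an iterable of (address, store_name) tuples. The result maps
    each such address to a sorted list of its store names.
    Hypothetical helper for illustration; not taken from the thesis.
    """
    names_by_address = defaultdict(set)
    for address, name in pairs:
        # Trim surrounding whitespace so trivially different strings match.
        names_by_address[address.strip()].add(name.strip())
    return {
        address: sorted(names)
        for address, names in names_by_address.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Made-up sample records: the first address has two candidate store names.
    sample_pairs = [
        ("No. 1, Example Rd.", "Old Bakery"),
        ("No. 1, Example Rd.", "New Cafe"),
        ("No. 2, Example Rd.", "Corner Bookstore"),
    ]
    for address, names in find_one_to_many(sample_pairs).items():
        print(address, "->", names)
```

Each flagged address would still need a separate decision about which store name is current, which is the step the abstract assigns to the trained verification model.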
    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

    Files in this item:

    File        Description  Size  Format  Views
    index.html  -            0Kb   HTML    287


    All items in NCUIR are protected by original copyright.
