Sustainable POI Database Maintenance via Outdated POI Verification

Name

張國斌 (CHEONG KUOK PAN)

Department

Department of Computer Science and Information Engineering

Thesis Title

Sustainable POI Database Maintenance via Outdated POI Verification
(透過POI的過期驗證以持續維護POI資料庫)

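This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please observe the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.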

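Abstract (translated from Chinese)

As smart mobile devices rapidly become more widespread, services for looking up stores, places, and other POI (Point of Interest) information have become an everyday necessity, and such services depend on a large POI database. Over time, the POI data in these databases is no longer guaranteed to be current. A user who receives wrong information wastes valuable time, so keeping a POI database up to date has become a key problem. We aim to continuously update the database and identify POIs that have ceased operation, so that correct POI information can be provided.
Because POI databases sourced from the yellow pages are too large to be updated and verified effectively by hand, we turn to government open data, much of which is jointly maintained by many businesses. Among these data sets, 「全國營業(稅籍)登記資料集」 (the national business (tax) registration data set) and 「公司解散登記清冊」 (the register of dissolved companies) are usable for our purpose. However, the format of the open data sets differs from that of a typical POI database and must be handled with care. In addition, the Web offers a wealth of data. Using information from the Web, such as page update dates and the volume of online mentions, we train a verification model to detect POIs in the database that may be outdated.
In this thesis, the goal of our system is to detect outdated POIs in the database within a feasible time. The method has two parts. The first uses government open data: we find the POIs that the POI database and the open data have in common and update their status directly. The second uses Web information to train an outdated-POI verification model that detects POIs in the database that are already outdated. Experimental results show that using Google Maps information, the time elapsed since the most recent news of the POI, whether the POI still appears on the official website, and words that describe a POI as outdated achieves an F-measure of 0.758; with feature combination the F-measure reaches 0.91, an improvement of 0.201 over the model of Chuang et al.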

Abstract (English)

With the increasing use of mobile phones, searching for POI (Point of Interest) information, such as stores and addresses, has become part of people's daily lives. Providing such services requires a massive POI database. However, the POI information in such a database may change over time, and it is annoying for users to get wrong information. How to keep the POI database up to date by continuously identifying outdated POIs and updating the database has therefore become a key issue.
As the POI database grows, it becomes difficult to verify the data manually. The government, however, publishes open data on businesses, e.g. “全國營業(稅籍)登記資料集” (the national business (tax) registration data set) and “公司解散登記清冊” (the register of dissolved companies). These data must be used carefully, since the format of the open data sets differs from that of the general POI databases used in many services. On the other hand, rich information is available on the Web. Using Web information such as the date a web page was updated and the volume of mentions of a POI on the Web, we can train a verification model to detect POIs in the database that may be outdated.
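The format mismatch mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that both the POI database and the open data set expose a name and an address, and that normalizing these strings is enough to find shared records. The field names (`name`, `address`, `status`), the normalization rules, and the status labels are illustrative assumptions and are not taken from the thesis.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold full-width characters to half-width, drop whitespace, lowercase."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(folded.split()).lower()


def match_key(record: dict) -> tuple:
    """Build a comparison key from the (hypothetical) name and address fields."""
    return (normalize(record["name"]), normalize(record["address"]))


def update_status(pois: list, open_data: list) -> list:
    """Return copies of the POIs, taking the status from a matching open-data row.

    POIs without a match keep their existing status, or "unknown" if none is set.
    """
    index = {match_key(row): row["status"] for row in open_data}
    updated = []
    for poi in pois:
        status = index.get(match_key(poi), poi.get("status", "unknown"))
        updated.append({**poi, "status": status})
    return updated
```

Exact matching on normalized strings is the simplest possible choice; real records would likely need fuzzier comparison, which this sketch does not attempt.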
In this paper, our goal is to detect outdated POIs in the database within a feasible time. The approach is divided into two parts. The first part uses government open data. The second part uses Web information to train a model that detects outdated POIs in the database. Experiments show that our model achieves an F-measure of 0.758 using Google Maps information, the time between today and the most recent publishing date, whether the POI appears on the official website, and words that describe a POI as outdated. The best performance, an F-measure of 0.91, is reached by feature combination, which is 0.201 higher than the model of Chuang et al.
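For readers unfamiliar with the metric, the F-measure is conventionally the harmonic mean of precision and recall. The sketch below computes it from binary labels, where 1 marks an outdated POI. Whether the thesis uses exactly this balanced (F1) variant is an assumption, and the function name and label encoding are chosen here only for illustration.

```python
def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1) for one positive class.

    y_true and y_pred are equal-length sequences of labels.
    Returns 0.0 when there are no true positives.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot score well by maximizing only precision or only recall.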

Keywords (Chinese)

★ 基於位置的服務 (location-based service)
★ 興趣點 (point of interest)
★ 監督式學習 (supervised learning)

Keywords (English)

★ Location-based service
★ Point of interest
★ Supervised learning

Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
2.1 POI Data Matching
2.2 Methods for Detecting Outdated POIs
3. System Architecture and Methods
3.1 Use of Government Open Data
3.2 Building the POI Verification Model
4. Experiments
4.1 Data Sets
4.2 Evaluation
5. Conclusion
6. Future Work
References

References

[1] Chuang, H. M., & Chang, C. H. (2015, May). Verification of POI and location pairs via weakly labeled web data. In Proceedings of the 24th International Conference on World Wide Web (pp. 743-748).
[2] Al-Bahadili, H., Qtishat, H., & Naoum, R. S. (2013). Speeding up the Web crawling process on a multi-core processor using virtualization. International Journal on Web Service Computing, 4(1), 19.
[3] Chuang, H. M., Chang, C. H., & Kao, T. Y. (2014, September). Effective web crawling for Chinese addresses and associated information. In International Conference on Electronic Commerce and Web Technologies (pp. 13-25). Springer International Publishing.
[4] Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (pp. 144-152). ACM.
[5] Chuang, H. M., & Chang, C. H. (2016). POI Extraction and Relation Verification from the Web [Chuang, NCU, PhD thesis].
[6] Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines.
[7] Tran, T., & Cao, T. H. (2013). Automatic detection of outdated information in Wikipedia infoboxes. Research in Computing Science, 70, 211-222.
[8] Hu, Y., Janowicz, K., & Prasad, S. (2014, November). Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In Proceedings of the 8th Workshop on Geographic Information Retrieval (p. 8). ACM.
[9] Lin, Y. Y., & Chang, C. H. (2014). Store Name Extraction and Name-Address Matching for Geographic Information Retrieval [Lin, NCU, master's thesis].

Advisor

張嘉惠

Approval Date

2017-8-24
