Master's/Doctoral Thesis 102522097: Detailed Record




Author 高霆耀 (Ting-yao Kao)   Department Computer Science and Information Engineering
Thesis Title 基於Web之商家景點擷取與資料庫建置
(Points of Interest Extraction from Unstructured Web)
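Files
  1. This electronic thesis is authorized for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.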

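Abstract (translated from Chinese) With the spread of mobile devices, local search has become an emerging and popular service. To be complete, however, a local search service must let users accurately find nearby restaurants, hotels, bus stops, karaoke venues, libraries, pharmacies, and other points of interest (POIs) that cover everyday needs such as food, clothing, housing, transportation, education, and entertainment. For this purpose we need to build a complete POI database for users to query. In recent years, with the prevalence of the Internet, users have begun to share travel experiences and POI information on their blogs and social networks, and more and more businesses and organizations maintain official websites that describe themselves in detail. As such pages multiply, the Internet as a whole has become the largest source of POI information.
In this thesis we propose a POI construction system based on Web information. The system consists of two parts. The first part crawls address-bearing pages (ABPs): its goal is to find ABPs on the Web, which contain many POIs together with POI-related descriptions that can be used for retrieval. The second part is the POI extraction system. Using a Chinese organization name recognition model and a Chinese address recognition model, both trained with conditional random fields (CRFs) as the learning algorithm, it finds all addresses and organization names that appear in a page, pairs addresses with organization names to form POI records, and finally extracts the associated information for each POI.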
Abstract (English) With the growing popularity of mobile devices, local search has become a popular new service. However, a complete local search service has to provide users with nearby POIs (Points of Interest) such as stores, shops, gas stations, parking lots, bus stops, and drugstores. We therefore need a powerful POI database to support it. In recent years, the Web has become the largest data source of POIs. With the prevalence of the Internet, people share their travel experiences and information about the POIs they have visited on social networks, on their blogs, and even in check-in posts. In addition, many companies and organizations describe their businesses on their own websites. These webpages contain a large number of POIs.
In this paper, we propose a POI database construction system based on the immense data of the Web. Our system can be divided into two parts: a query-based crawler and a POI extraction system. The goal of the query-based crawler is to collect ABPs (address-bearing pages) from the Web, since an address is a good indicator of a POI. The second part is the POI extraction system. We use CRFs (Conditional Random Fields) to train a Chinese postal address recognition model and a Chinese organization name recognition model. The POI extraction system then extracts addresses and POI names from ABPs with these two CRF models and pairs each address with a POI name to form a POI. Finally, the POI extraction system extracts the associated information for each POI to construct a complete POI record.
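The extraction step described in the abstract (label addresses and organization names, then pair them) can be illustrated with a minimal sketch. It is not the thesis's implementation: the BIO tags are written by hand in place of real CRF output, the tokens and names are made-up example data, and `decode_bio` and `pair_nearest` are hypothetical helpers. The nearest-span rule is an assumed heuristic for illustration only, since the abstract does not state how pairing is done.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (text, start index, end index exclusive)


def decode_bio(tokens: List[str], tags: List[str], label: str) -> List[Span]:
    """Collect contiguous B-/I- runs of `label` into spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag == f"I-{label}" and start is not None
        if start is not None and not inside:
            spans.append((" ".join(tokens[start:i]), start, i))
            start = None
        if tag == f"B-{label}":
            start = i
    return spans


def pair_nearest(names: List[Span], addresses: List[Span]) -> List[Tuple[str, str]]:
    """Pair each address with the closest name span (assumed heuristic)."""
    pairs = []
    for addr_text, addr_start, addr_end in addresses:
        best, best_gap = None, None
        for name_text, name_start, name_end in names:
            # Token gap between the two spans (0 when they are adjacent).
            if name_end <= addr_start:
                gap = addr_start - name_end
            else:
                gap = name_start - addr_end
            if best_gap is None or gap < best_gap:
                best, best_gap = name_text, gap
        if best is not None:
            pairs.append((best, addr_text))
    return pairs


# Made-up page text with hand-written tags standing in for CRF predictions.
tokens = ["Blue", "Cafe", "12", "Main", "St", ",", "open", "daily", ".",
          "City", "Library", ":", "5", "Park", "Rd"]
tags = ["B-ORG", "I-ORG", "B-ADDR", "I-ADDR", "I-ADDR", "O", "O", "O", "O",
        "B-ORG", "I-ORG", "O", "B-ADDR", "I-ADDR", "I-ADDR"]

names = decode_bio(tokens, tags, "ORG")
addresses = decode_bio(tokens, tags, "ADDR")
pairs = pair_nearest(names, addresses)
print(pairs)
```

In a real pipeline the `tags` list would come from the two trained sequence-labeling models, and the pairing rule would likely need page-layout cues beyond token distance.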
Keywords ★ electronic map
★ web crawler
★ information extraction
★ POI database
Table of Contents Chinese Abstract
ABSTRACT
Table of Contents
List of Figures
List of Tables
I. INTRODUCTION
1.1. General Background Information
1.2. Chapter Summary
II. RELATED WORK
2.1. Crawling
2.2. Information Extraction
2.3. Geographic Information Retrieval & POI Map Search
III. POI DATA CONSTRUCTION SYSTEM
3.1. Query-based Crawler
3.1.1. Query String
3.1.2. Improvement of Crawling Efficiency
3.2. POI Extraction Module
3.2.1. Address and POI Name Recognition
3.2.2. Address and POI Name Pairing
3.3. POI Associated Information Extraction
IV. EXPERIMENT
4.1. Efficiency of Query-based Crawler
4.1.1. Comparison of Address Patterns
4.1.2. Improvement by Proxy Server
4.1.3. Comparison of Different Crawlers
4.2. POI Pairing Accuracy
4.3. Performance Evaluation of POI Associated Information
V. CONCLUSION & FUTURE WORK
REFERENCE
Advisor 張嘉惠   Date of Approval 2015-7-29