Master's/Doctoral Thesis 103322088: Detailed Record




Name: Hao Chang (張皓)   Department: Department of Civil Engineering
Thesis title: GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources
(地理網路爬蟲:具擴充及擴展性之地理網路資源爬行架構)
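  1. The author has authorized this electronic thesis for immediate open access.
  2. Open-access electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.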

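Abstract (translated from Chinese): With advances in technology, the World-Wide Web (WWW) has developed rapidly. In the Web 1.0 era, users could only passively receive information published by a small number of organizations and administrators. In today's Web 2.0 era, however, every Web user can share data or web services online and thereby become a data provider. Because all users can now publish data or services on the Web, the concept of Big Data has gained increasing attention. Big Data is characterized by three Vs: Volume, Velocity, and Variety. Geospatial data also exhibits these 3V characteristics, and this massive amount of geospatial data is scattered across the Web (i.e., the Geospatial Web, GeoWeb), which makes data discovery difficult. Just as the Web has search engines that provide web page search, the GeoWeb also needs search engines that let users find geospatial resources quickly. Because the first requirement for building a search engine is data collection, the main objective of this study is to design an extensible and scalable web crawler framework, the GeoWeb Crawler, that proactively discovers the various sources of geospatial data on the Web. The current search targets include geospatial web services and data, such as the Sensor Observation Service (SOS), Web Map Service (WMS), Web Map Tile Service (WMTS), Web Feature Service (WFS), Web Coverage Service (WCS), Web Processing Service (WPS), Catalogue Service for the Web (CSW), and Keyhole Markup Language (KML) defined by the Open Geospatial Consortium (OGC), as well as the Shapefile format developed by ESRI. To improve crawling performance and cover a large portion of the Web, we use distributed processing to achieve horizontal scalability. In our tests, crawling with eight computers simultaneously was about 13 times faster. In addition, whereas general web crawlers are designed to crawl by following hyperlinks, the GeoWeb Crawler framework can discover geospatial resources hidden behind web services through custom-developed connectors. For 10 open-standard resource types and 3 non-open-standard resource types, this study collected 7,351 geospatial web services and 194,003 geospatial datasets, about 3.8 to 47.5 times the number collected by existing approaches. Statistics on the distribution of geospatial resources across crawling depths show that using Google search results as crawling seeds does help in finding geospatial resources, while the benefit diminishes as the crawl goes deeper. Finally, this study also built a prototype GeoWeb search engine, GeoHub. Based on these results, the proposed GeoWeb Crawler is extensible and scalable and can provide a comprehensive index of GeoWeb resources, serving as the foundation for a GeoWeb search engine.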
Abstract (English): With the advance of World-Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. As geospatial resources are published at an ever-increasing rate, "big geospatial data management" issues have started to attract attention. Among these issues, this research focuses on discovering distributed geospatial resources. Because resources are scattered across the globally distributed WWW, users have difficulty finding the resources they need. While the WWW has Web search engines that address web resource discovery, we envision that the geospatial Web (i.e., GeoWeb) also requires GeoWeb search engines so that users can find GeoWeb resources efficiently. To realize a GeoWeb search engine, one of the first steps is to proactively discover GeoWeb resources on the WWW. Hence, in this study, we propose the GeoWeb Crawler, an extensible Web crawling framework that can find various types of GeoWeb resources, such as Open Geospatial Consortium (OGC) web services, Keyhole Markup Language (KML) files, and ESRI Shapefiles. In addition, to improve the performance of the GeoWeb Crawler, we apply distributed computing in the framework so that it can easily scale horizontally. Using 8 machines, we achieved a 13-fold performance improvement in the crawling process. Furthermore, while regular web crawlers are well suited to discovering resources through hyperlinks, the GeoWeb Crawler uses customized connectors to find resources hidden behind open or proprietary web services. The results show that for 10 targeted open-standard-based resource types and 3 non-open-standard-based resource types, the GeoWeb Crawler discovered 7,351 geospatial services and 194,003 datasets, which is 3.8 to 47.5 times more than users can find with existing approaches.
Based on the crawling-level distribution of discovered resources, the results indicate that Google search provides good seeds for discovering resources efficiently. However, the deeper the crawl goes, the more effort is spent on unproductive pages. Based on the proposed solution, we built a GeoWeb search engine prototype, GeoHub. According to the experimental results, the proposed GeoWeb Crawler framework is extensible and scalable and can provide a comprehensive index of the GeoWeb.
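The abstract says that the crawler identifies OGC web services among the URLs it encounters, but this record does not describe the identification mechanism. The following is a minimal sketch of one common way to do it, assuming the standard OGC `SERVICE=...&REQUEST=GetCapabilities` key-value request. The function names `build_capabilities_url` and `looks_like_capabilities` are hypothetical, and the root-element check is a simplified heuristic, not the thesis's actual method.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
import xml.etree.ElementTree as ET

# Service identifiers named in the abstract (all OGC web services).
OGC_SERVICES = ("WMS", "WMTS", "WFS", "WCS", "WPS", "CSW", "SOS")


def build_capabilities_url(candidate_url: str, service: str) -> str:
    """Turn a candidate URL into a GetCapabilities request for `service`.

    Hypothetical helper: keeps unrelated query parameters and replaces any
    existing SERVICE/REQUEST parameters (matched case-insensitively).
    """
    if service not in OGC_SERVICES:
        raise ValueError(f"unsupported service type: {service}")
    parts = urlsplit(candidate_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ("service", "request")
    ]
    kept += [("SERVICE", service), ("REQUEST", "GetCapabilities")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def looks_like_capabilities(xml_text: str) -> bool:
    """Heuristic check: does the response parse as XML whose root element
    name ends with 'Capabilities' (e.g. WMS_Capabilities, Capabilities)?

    Assumption: a real crawler would also verify the namespace and version.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    local_name = root.tag.rsplit("}", 1)[-1]  # strip "{namespace}" prefix
    return local_name.endswith("Capabilities")
```

A crawler built this way would fetch the URL returned by `build_capabilities_url` for each service type and keep the candidate only when `looks_like_capabilities` accepts the response. The network fetch is deliberately left out of the sketch.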
Keywords: ★ Geospatial Web
★ Resource discovery
★ Web crawler
★ Open Geospatial Consortium
Table of Contents: Abstract (Chinese)
Abstract
Acknowledgements
List of Figures
List of Tables
Abbreviations
1. Introduction
1-1 Big geospatial data
1-2 Problems in geospatial resources discovery
1-3 Research objective
1-4 Content structure of this thesis
2. Related works
2-1 Existing approaches for GeoWeb resource discovery
2-2 Existing GeoWeb resource crawler solutions
3. GeoWeb resource distribution and research scope
4. Methodology
4-1 Workflow
4-2 Discovering plain-text URLs with regular expressions
4-3 Identification mechanisms for different geospatial resources
4-4 Distributed crawling process for scalability
4-5 Filtering redundant URLs
5. Result and discussion
5-1 Geospatial resources crawling results
5-2 Comparison between GeoWeb Crawler and existing approaches
5-3 Crawling level distribution of discovered resources
5-4 Performance comparison between standalone and parallel processing
6. GeoHub - a GeoWeb search engine prototype
6-1 The design of GeoHub
7. Conclusions and future work
References
Advisor: Chih-Yuan Huang (黃智遠)   Approval date: 2016-7-26
