Associated Information Extraction for Enabling Entity Search on Electronic Map

Name

蘇詠勝 (Yueng-Sheng Su)

Department

Department of Computer Science and Information Engineering

Thesis title

(Associated Information Extraction for Enabling Entity Search on Electronic Map)

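Related theses

★ A Study on Recognizing Schedule Invitation Emails and Extracting Irregular Time Expressions
★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of a Semi-Structured Web Data Extraction System
★ Mining and Applications of Non-Simple Traversal Paths
★ Improvements to Association Rule Mining on Incremental Data
★ Applying the Chi-Square Test of Independence to Associative Classification
★ Design and Study of a Chinese Information Extraction System
★ Visualization of Non-Numerical Data and Clustering That Combines Subjective and Objective Views
★ A Study of Associated Word Groups in Document Summarization
★ Cleaning Web Pages: Page Segmentation and Data Region Extraction
★ Design and Study of a Question Answering System Using Sentence Classification and Ranking
★ Efficient Mining of Compact Frequent Serial Episode Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates for Cluster Visualization
★ A Study on Automatically Generating Web Crawlers from Browsing History
★ A Study on Template and Data Analysis of Dynamic Web Pages
★ A Study on Automating Data Integration across Homogeneous Web Pages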

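This electronic thesis is authorized for immediate open access.
Full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.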

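Abstract (translated from Chinese)

It has been only about ten years since electronic maps first appeared and were opened to civilian use, yet their services have become a widespread part of daily life. Thanks to advances in information retrieval and to dynamically overlaid information markers, people no longer need to feel anxious about an unfamiliar place or address. All of this relies on the large, rich geographic databases and systems behind electronic maps. In the past, however, those databases were built mainly by hand. The web, by contrast, already holds a great many addresses together with descriptions related to them. If those addresses and their associated information can be extracted from web pages automatically and effectively, existing data can be reused to expand current geographic databases automatically.
Our system has two parts. The first is an information extraction system whose goal is to extract address-related information from web pages that can be used for retrieval. It first applies address extraction techniques from prior work to find the nodes of the page tree that contain addresses, then generates the path of each of those nodes. By comparing where the paths of the address-bearing nodes differ, it finds for each address a maximal exclusive subtree region and takes the content of that region as the information associated with that address. The second part is an information retrieval system. It segments the extracted addresses and associated information into words as a preprocessing step, filters out most low-frequency words, and uses a traditional information retrieval algorithm to rank and output results according to the query terms.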
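The path-comparison idea in the extraction step can be illustrated with a minimal sketch. It is not the thesis's Single Path or Multiple Path Algorithm, whose details the abstract does not give. It assumes each address node is identified by a hypothetical path of child indices from the root of the page tree, and it picks, for each address, the highest ancestor whose subtree holds no other address.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a node is named by the child indices
# followed from the root of the page tree, e.g. (0, 1, 2).
Path = Tuple[int, ...]


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading steps two paths share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def exclusive_subtree_roots(address_paths: List[Path]) -> Dict[Path, Path]:
    """Map each address path to the root of its largest exclusive subtree.

    The root is the first node below the deepest ancestor shared with any
    other address. With a single address the whole tree (empty path) is
    returned. If one address is nested inside another's path, the outer
    one falls back to its own node.
    """
    roots: Dict[Path, Path] = {}
    for p in address_paths:
        shared = max(
            (common_prefix_len(p, q) for q in address_paths if q != p),
            default=-1,
        )
        depth = min(shared + 1, len(p))
        roots[p] = p[:depth]
    return roots
```

For example, three addresses at `(0, 1, 0, 2)`, `(0, 1, 1, 2)`, and `(0, 1, 2, 0)` share the ancestor `(0, 1)`, so their regions are rooted at `(0, 1, 0)`, `(0, 1, 1)`, and `(0, 1, 2)`. The text under each root would then be taken as that address's associated information.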

Abstract (English)

Electronic maps play an important role in daily life. People use them to find a restaurant before having dinner with friends, or use a navigation device to search for a location. All of this is made possible by geographic information systems and information retrieval technology. Traditionally, building a geographic database takes considerable human effort and is inefficient. The goal of this research is to use the rich information on the web to construct the geographic database automatically.
Our system consists of two parts: an information extraction (IE) subsystem and an information retrieval (IR) subsystem. The first extracts the information associated with postal addresses from web pages in order to build the geographic database. Building on state-of-the-art postal address extraction, we compare the paths of the address nodes, locate where they differ, and take the largest exclusive (disjoint) subtree around each address as its associated information. The second applies information retrieval techniques to rank the associated information against users' queries.
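The retrieval subsystem can be sketched in the same spirit. The abstract only says that a traditional retrieval algorithm is used, so the TF-IDF scoring below is an assumption, as are the whitespace tokenizer (a real system for Chinese text would need a word segmenter) and the hypothetical `min_df` threshold used to drop low-frequency words.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for real word segmentation.
    return text.lower().split()


def build_index(
    records: Dict[str, str], min_df: int = 1
) -> Tuple[Dict[str, Counter], Dict[str, float]]:
    """Build term counts per address and IDF weights.

    records maps an address to its associated text. Terms that occur in
    fewer than min_df records are filtered out as low-frequency words.
    """
    tokens = {addr: tokenize(text) for addr, text in records.items()}
    df: Counter = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    n_docs = len(records)
    idf = {t: math.log(1 + n_docs / n) for t, n in df.items() if n >= min_df}
    tf = {
        addr: Counter(t for t in toks if t in idf)
        for addr, toks in tokens.items()
    }
    return tf, idf


def rank(
    query: str, tf: Dict[str, Counter], idf: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (address, score) pairs with score > 0, best match first."""
    terms = [t for t in tokenize(query) if t in idf]
    scored = []
    for addr, counts in tf.items():
        score = sum(counts[t] * idf[t] for t in terms)
        if score > 0:
            scored.append((addr, score))
    # Sort by descending score, then by address for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Calling `build_index` once over the extracted records and then `rank` per query keeps the preprocessing separate from the query-time scoring, which mirrors the two stages the abstract describes.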

Keywords (translated from Chinese)

★ Electronic Map
★ Information Extraction
★ Information Retrieval

Keywords (English)

★ Electronic Map
★ Entity Extraction
★ Data Record Extraction
★ Information Retrieval

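Table of contents

Abstract (Chinese)
Abstract
List of Figures
List of Tables
1. Introduction
1.1. Motivation
1.2. Background
1.3. Chapter Overview
2. Related Work
2.1. Entity Extraction
2.2. Data Record Extraction
2.3. Information Retrieval
3. Address-Associated Information Extraction System
3.1. Concept Description
3.2. Term Definitions
3.3. Single Path Algorithm
3.4. Multiple Path Algorithm
4. Address-Associated Information Extraction Experiments
4.1. Associated Information Extraction Experiments for English Addresses
4.2. Associated Information Extraction Experiments for Chinese Addresses
5. Address Retrieval System
5.1. Data Preprocessing
5.2. Retrieval Algorithm
5.3. Experiments and Result Analysis
6. Conclusion and Future Work
References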

Advisor

張嘉惠 (Chia-Hui Chang)

Approval date

2012-9-24
