Associated Information Extraction for Enabling Entity Search on Electronic Map

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/57782

Title:	Associated Information Extraction for Enabling Entity Search on Electronic Map
Author:	蘇詠勝 (Su, Yueng-Sheng)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Electronic Map; Information Extraction; Information Retrieval; Data Record Extraction; Entity Extraction
Date:	2012-09-22
Uploaded:	2012-11-12 14:36:50 (UTC+8)
Publisher:	National Central University
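Abstract:	Barely a decade has passed since electronic maps were introduced and opened to civilian use, yet their services have become part of daily life. Thanks to advances in information retrieval and dynamically overlaid information markers, people no longer worry about an unfamiliar place or address. All of this relies on the large and rich geographic databases and systems behind electronic maps. In the past, however, these databases were built mainly by hand. The Web, by contrast, already holds a great many addresses together with descriptive resources about them. If those addresses and their associated information can be extracted from web pages automatically and effectively, existing data can be used to expand current geographic databases automatically. Our system has two parts. The first is the information extraction system, whose goal is to extract address-related information from web pages that can be used for retrieval. It first applies address extraction techniques from earlier research to locate the address nodes in the page tree and generates the path of each node. It then compares the paths of the address-bearing nodes and, from their differences, finds the largest exclusive subtree region for each address; the content of that region is taken as the information associated with that address. The second part is the information retrieval system. It segments the extracted addresses and associated information into words as preprocessing, filters out most low-frequency words, and, using a traditional information retrieval algorithm, ranks and outputs the results according to the query terms.

Electronic maps play an important role in daily life today. People use them to find a restaurant before having dinner with friends, or use a navigation device to search for a location. All of this is attributable to geographic systems and information retrieval technology. Traditionally, constructing a geographic database requires a great deal of human effort and is inefficient. The goal of this research is to use the rich information on the Web to construct the geographic database automatically. Our system has two parts: an information extraction (IE) subsystem and an information retrieval (IR) subsystem. The first extracts the information associated with postal addresses from web pages in order to construct the geographic database. Building on state-of-the-art postal address extraction, we locate the address nodes, compare the paths of the addresses, and take the largest exclusive subtree as the associated information of each address. The second applies information retrieval technology to rank the associated information according to users' queries.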
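The subtree step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the address nodes are already known (here a hypothetical `class="addr"` attribute stands in for the address extractor), uses Python's standard `xml.etree.ElementTree` on a small well-formed fragment with made-up sample data, and the function name `largest_exclusive_subtrees` is invented for illustration. For each address node it builds the root-to-node path, compares it with the paths of the other addresses, and takes the highest ancestor that lies on no other address's path.

```python
import xml.etree.ElementTree as ET


def largest_exclusive_subtrees(root, address_nodes):
    """Pair each address node with the highest ancestor containing no other address."""
    # Child -> parent map; ElementTree elements do not store their parent.
    parent = {child: node for node in root.iter() for child in node}

    # Root-to-node path for every address node.
    paths = []
    for node in address_nodes:
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        path.reverse()
        paths.append(path)

    result = []
    for i, path in enumerate(paths):
        shared = 0  # length of the longest prefix shared with another address path
        for j, other in enumerate(paths):
            if i == j:
                continue
            k = 0
            while k < min(len(path), len(other)) and path[k] is other[k]:
                k += 1
            shared = max(shared, k)
        # First node below the shared prefix; fall back to the address node itself
        # when the address node is an ancestor of another address node.
        subtree_root = path[min(shared, len(path) - 1)]
        result.append((path[-1], subtree_root))
    return result


def subtree_text(node):
    """Join the non-empty text fragments found inside a subtree."""
    return " ".join(t.strip() for t in node.itertext() if t.strip())


# Made-up sample page; class="addr" is a stand-in for a real address extractor.
page = ET.fromstring(
    "<ul>"
    '<li><b>Cafe A</b><span class="addr">No. 1, Example Rd.</span><i>coffee</i></li>'
    '<li><b>Shop B</b><span class="addr">No. 2, Example Rd.</span><i>books</i></li>'
    "</ul>"
)
addresses = [e for e in page.iter() if e.get("class") == "addr"]
records = [subtree_text(sub) for _, sub in largest_exclusive_subtrees(page, addresses)]
```

In this sample each address resolves to its enclosing `li` element, so the name and description that sit beside the address are collected as its associated information. Real web pages are rarely well-formed XML, so a tolerant HTML parser would be needed in practice.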
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	880

All items in NCUIR are protected by their original copyright.
