Conference Information Extraction: Segmentation Base Approach (會議公告網站資訊擷取之研究)

DC field	Value	Language
DC.contributor	資訊工程學系在職專班	zh_TW
DC.creator	胡姝涵	zh_TW
DC.creator	Shu-Han Hu	en_US
dc.date.accessioned	2006-07-24T07:39:07Z
dc.date.available	2006-07-24T07:39:07Z
dc.date.issued	2006
dc.identifier.uri	http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=92532007
dc.contributor.department	資訊工程學系在職專班	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	隨著資訊科技的進步，網際網路的快速與便利使得我們漸漸以網頁來取代傳統以紙張為主的資料呈現方式，然而網頁呈現的豐富與多樣化，使得有效擷取有用的資訊成為一項重大的挑戰。資訊擷取（Information Extraction）的技術主要是將非結構化的資料，透過整理、篩選，加以整合成為結構化的資料，最後便可有效的擷取出有用的資訊。資訊擷取的設計，最直接的方法是針對各個網站利用人工撰寫資訊擷取的方式，架構出符合此網站的資訊擷取系統，但由於網站的格式隨時有可能發生變更，或是因應不同作者架構出的網站格式不同，我們都必須修改撰寫不同的資訊擷取程式，這是非常不經濟的。因此，如何利用自動化的方式因應不同的網站格式來擷取網頁資訊，是設計資訊擷取程式最大的目標。自動化的資訊擷取設計，就要仰賴機器學習（Machine Learning）的方式，如何讓電腦具有學習的能力，從以往的經驗學習到知識和擷取規則，使得電腦本身具有擷取正確資訊的能力。　　本篇論文主要針對國際性會議（International Conference）公告網站，擷取來自不同佈告者公告的國際會議資訊，包括會議名稱、會議地點、會議日期和論文接受日期。國際會議內容以純文字為主，加上會議內容的撰寫來自不同的佈告者且為公告性質的網站，內容多為佈告者以簡短的口語來表達並不具結構性，所以在資訊的整合與擷取上有一定的困難度，如何有效的擷取出正確的資訊，本篇論文運用機器學習的方式，讓電腦具有學習的能力，自動擷取來自不同佈告者公告的國際會議資訊，並且有不錯的效果。	zh_TW
dc.description.abstract	As information technology advances, web pages are rapidly replacing paper as the primary medium for presenting information. The richness and variety of web content, however, make extracting useful information a major challenge. Information extraction turns unstructured data into structured data through a series of steps such as organizing, filtering, and integrating, so that the useful information can be retrieved effectively. The most straightforward approach is to hand-write an extraction system tailored to each web site, but this is not cost-effective: site layouts can change at any time and differ from one author to another, so the extraction programs must be rewritten repeatedly. Automated extraction that adapts to different site formats is therefore the main goal. This thesis focuses on extracting conference information, namely conference names, locations, dates, and paper acceptance dates, from DB World and international conference web pages. Because announcement-style conference pages are mostly plain text, written by many different posters in brief, informal language without any common structure, integrating and extracting their content is difficult. The system, built on machine learning techniques, is shown to perform well at extracting these fields across pages from different web sites.	en_US
DC.subject	資訊擷取	zh_TW
DC.subject	機器學習	zh_TW
DC.subject	Information Extraction	en_US
DC.subject	Machine Learning	en_US
DC.title	會議公告網站資訊擷取之研究	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Conference Information Extraction: Segmentation Base Approach	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 92532007: full metadata record