A Block-Extraction Approach to Rendering News Services by Exploiting Web Content Structures

Name

Chi Huang (黃淇)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Block-Extraction Approach to Rendering News Services by Exploiting Web Content Structures
(利用網頁內容結構之區塊擷取方法以呈現新聞服務)

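Related Theses

★ Design and Implementation of a Mobile Agent Monitoring System	★ Design and Implementation of a Regular-Expression-Based Geocoding Service
★ A Continuous Integration System for Android Application Development	★ A Design-Matrix-Based Approach to Establishing Requirements Traceability
★ Design and Implementation of Peer-to-Peer Mobile Agents and Their Application to Telematics	★ Design and Implementation of Web Content Clustering
★ Design and Implementation of XUL-Based Rendering for Mobile Devices	★ OSGi-Based Service Delivery on the Android Platform
★ Design and Implementation of a Sensor-Centric Query Mechanism	★ Design and Implementation of a Web 2.0 System for Route Planning Services
★ A BPEL Engine Integrating OSGi and RESTful Services	★ Converting Web Content into OSGi Bundles Using Document Similarity
★ Measuring Coupling Between Java Classes	★ Converting Android Applications into OSGi Services
★ A Context Modeling Approach for Internet of Things Applications	★ An Event-Driven Approach from Data to Services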

Files

View thesis in the system (full text never to be released)

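Abstract (translated from the Chinese)

Most people today obtain information by browsing the Web. With the recent spread of mobile devices and improvements in their performance, browsing on handheld devices rather than desktop computers has become a new trend, and because handheld devices can be used anytime and anywhere, they are well suited to receiving news as it happens. However, their small size and limited hardware make browsing web pages inconvenient, so the way pages are browsed on a desktop cannot be carried over directly to handheld devices. In existing solutions, web developers must spend extra time designing page interfaces suited to handheld browsing.
This research aims to provide a browsing system that automatically translates news websites into information suited to browsing on handheld devices. The system builds a menu of the news website and extracts the important news in each category. Users can therefore quickly grasp all the headline news, the news content is displayed clearly and simply on the handheld device, and the reduced data volume lowers the load on transmission bandwidth.
This research is implemented within the Website Browsing System developed in our laboratory, which comprises two parts: the Website Parsing System and the Website Rendering System. The Website Parsing System is the system implemented in this research. In it, the Proxy handles the user browsing requests sent by the Website Rendering System, the Menu Constructor builds the website's menu, and the Block Extractor extracts the important news in each category. The analysis results are presented to the user by the Website Rendering System.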
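As a rough illustration of the Menu Constructor and Block Extractor steps described above, the following Python sketch builds a small DOM tree, picks out a navigation menu, and keeps text-heavy content blocks. It is only a sketch under stated assumptions: the record lists HtmlUnit among the thesis's background topics, whereas this example uses only Python's standard library, and the link-density heuristic, the thresholds, and every name here (`build_menu`, `extract_blocks`, `link_density`, the sample page) are hypothetical choices made for illustration, not the method of the thesis.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM element; children holds Node objects and text strings."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent  # climb to the nearest open element with this tag
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    """Yield node and all element descendants in document order."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from walk(child)


def text_of(node):
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def char_count(node, inside_link=False, links_only=False):
    """Count non-whitespace characters, optionally only those inside <a>."""
    inside_link = inside_link or node.tag == "a"
    total = 0
    for c in node.children:
        if isinstance(c, Node):
            total += char_count(c, inside_link, links_only)
        elif inside_link or not links_only:
            total += len("".join(c.split()))
    return total


def link_density(node):
    total = char_count(node)
    return char_count(node, links_only=True) / total if total else 0.0


def build_menu(root, min_density=0.8):
    """Assumed heuristic: the menu is the <ul>/<nav> made almost entirely of links."""
    best = []
    for node in walk(root):
        if node.tag in ("ul", "nav") and link_density(node) >= min_density:
            items = [(text_of(a), a.attrs["href"]) for a in walk(node)
                     if a.tag == "a" and "href" in a.attrs]
            if len(items) > len(best):
                best = items
    return best


def extract_blocks(root, max_density=0.3, min_chars=20):
    """Assumed heuristic: keep text-heavy containers with few links.
    Nested containers that both qualify are both returned."""
    return [text_of(n) for n in walk(root)
            if n.tag in ("div", "article", "section")
            and char_count(n) >= min_chars
            and link_density(n) <= max_density]


PAGE = """<html><body>
<ul id="nav"><li><a href="/world">World</a></li><li><a href="/sports">Sports</a></li></ul>
<div id="main"><h2>Storm hits coast</h2><p>Heavy rain flooded several towns overnight, officials said.</p></div>
<div id="ads"><a href="/ad1">Buy now</a> <a href="/ad2">Click here</a></div>
</body></html>"""

root = parse(PAGE)
menu = build_menu(root)
blocks = extract_blocks(root)
```

On the sample page, `menu` holds the two navigation entries and `blocks` holds the single news block, while the link-only advertisement container is filtered out. A real news site would need per-site tuning of the thresholds, which is why they are exposed as parameters.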

Abstract (English)

Searching for information on the Internet with handheld devices has become a trend. As the performance of handheld devices has improved, browsing has been shifting from desktop to handheld platforms. The ubiquitous nature of handheld devices, usable "any-time" and "any-place", lets users get information instantly. However, the screen size and network speed of these devices constrain the browsing experience, so the way pages are browsed on desktop devices is unsuitable for handheld devices. In existing solutions, Web authors spend considerable effort preparing multiple versions of Web pages and resources for various platforms. Our solution provides an automatic parsing mechanism: we use a block-extraction approach to rendering news services by exploiting web content structures, parsing each news website into a form suitable for handheld devices.

Keywords (Chinese)

★ Document Object Model
★ web page segmentation
★ web information extraction

Keywords (English)

★ web content extraction
★ page segmentation
★ DOM

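Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Introduction and Motivation
1.2 Objectives
1.3 Thesis Organization
Chapter 2 Background
2.1 HTML & CSS
2.2 DOM
2.3 JavaScript
2.4 HtmlUnit
2.5 Building the Website Structure
2.6 Page Segmentation
2.7 Block Identification
Chapter 3 System Analysis, Design, and Implementation
3.1 System Requirements
3.2 System Architecture
3.3 Proxy
3.4 Menu Constructor
3.4.2 Architecture Design
3.5 Page Segmentation
3.5.1 Component Analysis
3.5.2 Architecture Design
3.6 Block Identification
3.6.1 Component Analysis
3.6.2 Architecture Design
Chapter 4 Case Study
4.1 Menu Constructor Experimental Results
4.2 Block Extractor Experimental Results
4.3 System Demonstration
Chapter 5 Related Work
Chapter 6 Conclusion
6.1 Contributions
6.2 Future Work
References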

Advisor

Jonathan Lee (李允中)

Approval Date

2011-9-20
