Thesis/Dissertation 945902011 Detailed Record




Name Tzu-chen Tsai (蔡子宸)   Department Computer Science and Information Engineering
Thesis title An Automatic Semantic-Segment Detection Method in the HTML Language
(自動偵測HTML語言的語意區塊)
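  1. The access permission for this electronic thesis is: approved for immediate open access.
  2. Electronic full texts that have reached open-access status are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.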

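Abstract (Chinese) In the 21st century, Internet use has rapidly become widespread, and major vendors have released a wide variety of portable Internet-capable devices. Beyond the traditional personal computer, users now have more choices and opportunities: they can use all kinds of connected devices anytime and anywhere to obtain web information and to learn online. These portable devices are popular because they are light, compact, highly mobile, and versatile. They also have troublesome drawbacks. The small screen often causes web content to be laid out badly and to be hard to read, and network bandwidth and computing power are weaker than on a personal computer, so users spend more time waiting. The core technique of content adaptation is to analyze and decompose the original web page and then, according to conditions such as the user's context, physical condition, and device, transform the content into a tailor-made page, so that the adapted page presents the information the author intends to convey in a better way. In short, a Content Adaptation mechanism is a system that automatically generates adapted pages to make up for websites that do not provide pages suited to their users. Before the system can adapt a page to the user's conditions, however, it must first decompose the page using the correct encoding. In this thesis I therefore propose an efficient, modular method that decomposes pages and automatically detects semantic segments. The method identifies the smallest members that cannot be subdivided further and, based on semantic relatedness and structure, automatically detects semantic segments to serve as the units of adaptation. The objects within each unit must preserve semantic homogeneity, complete functionality, readability, and the hierarchy of their positional structure. Adaptation strategies such as object transformation are then applied with the semantic segment as the unit, producing pages suited to the user and addressing problems such as long download waits and the poor readability caused by constant scrolling when a page contains too much information.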
Abstract (English) The amount of information on the World Wide Web continues to grow at an astonishing speed, yet much web content is designed for large screens and powerful computing devices such as PCs and notebooks, so it does not fit small devices such as personal digital assistants. In addition, the user's personal condition and the capability of the device can affect how well the user understands the content of a web page. In this paper, we propose a mediator system to facilitate web surfing. The main purpose of this system is to adapt the original content into content suitable for the user by means of context awareness. We call this system Content Adaptation (CA). In other words, the CA system produces a web page suited to the user.
CA can be separated into two steps: content decomposition and content re-composition. Because the content decomposer needs to analyze the semantics of the HTML before the content is adapted to the user's condition, I focused on automatic content decomposition in my research. In the decomposition process, I need to use the correct code page to parse the HTML file and to consider all of its tags and information structurally. I further analyze the semantic context, architecture, arrangement, structure, and visual effect of the page and split it into small Semantic Segments (S.S.) that cannot be subdivided further. An S.S. has several important properties: it keeps complete function (functionality related), readable typesetting (readability related), presentation relationships (space and time related), and literary context (semantics related). My experimental results show that the proposed conventions for detecting semantic segments, together with the page-splitting scheme that partitions a web page into many smaller semantic segments, greatly improve users' browsing experience on the small screens of hand-held devices.
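The decomposition step described in the abstract (decode the page with the right code page, then split it into small units) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the tag set `BLOCK_TAGS`, the helper names `decode_html`, `SegmentCollector`, and `split_segments`, and the rule "text belongs to the innermost open block tag" are all assumptions made for illustration, and the sketch assumes well-nested markup. It uses only the Python standard library.

```python
import re
from html.parser import HTMLParser

# Hypothetical set of tags treated as segment boundaries (an assumption,
# not the thesis's definition of a semantic segment).
BLOCK_TAGS = {"div", "p", "table", "ul", "ol", "form", "h1", "h2", "h3"}


def decode_html(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw bytes using the charset declared in the markup, if any."""
    # Only the first bytes are scanned; non-ASCII bytes are ignored here.
    head = raw[:2048].decode("ascii", errors="ignore")
    match = re.search(r"charset=[\"']?([\w-]+)", head, re.IGNORECASE)
    encoding = match.group(1) if match else default
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:  # the declared encoding name is unknown to Python
        return raw.decode(default, errors="replace")


class SegmentCollector(HTMLParser):
    """Collect (tag, text) pairs, assigning text to the innermost block tag."""

    def __init__(self):
        super().__init__()
        self.segments = []  # finished segments, in closing order
        self._stack = []    # open block tags, each with its text parts

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append((tag, []))

    def handle_endtag(self, tag):
        # Close a block only when it matches the innermost open block.
        if self._stack and self._stack[-1][0] == tag:
            name, parts = self._stack.pop()
            text = " ".join(parts).strip()
            if text:
                self.segments.append((name, text))

    def handle_data(self, data):
        if self._stack and data.strip():
            self._stack[-1][1].append(data.strip())


def split_segments(raw: bytes):
    """Decode a page and return its text grouped by innermost block tag."""
    parser = SegmentCollector()
    parser.feed(decode_html(raw))
    parser.close()
    return parser.segments
```

Calling `split_segments` on the bytes of a page returns a list of `(tag, text)` pairs. Decoding comes first on purpose: if a Big5 page were parsed as UTF-8, the text inside every segment would be garbled no matter how well the splitting worked, which is the dependency the abstract points out.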
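Keywords (Chinese) ★ Content adaptation (內容調適)
★ Semantic segment (語意區塊)
★ Adaptation strategy (調適策略)
★ Automatic detection (自動偵測)
Keywords (English) ★ Semantic Segment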
★ Content Adaptation
★ Structure Fragment
★ Context Aware
Table of contents CHINESE ABSTRACT
ABSTRACT
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 WHAT IS THE MOTIVATION OF THIS RESEARCH?
1.2 WHAT KINDS OF PROBLEMS TO BE SOLVED?
1.3 CHARACTERISTICS AND CHALLENGES
1.4 HOW TO SOLVE THE PROBLEMS?
1.5 CONTRIBUTION OF OUR SOLUTIONS
CHAPTER 2 RELATED WORK
CHAPTER 3 METHOD AND SOLUTION
3.1 ENCODING AND CONVERSION
3.1.1 Encoding and conversion methodology
3.1.2 Encoding and conversion algorithm
3.2 CSS LOCALIZING AND PARSING HTML
3.2.1 CSS localizing and parsing HTML methodology
3.2.2 CSS localizing algorithm
3.3 STRUCTURE-FRAGMENT IDENTIFICATION
3.3.1 Structure-fragment identification methodology
3.3.1.1 Definition
3.4 SEMANTIC-SEGMENT DETECTION
3.4.1 Semantic-segment detection methodology
3.4.1.1 Definition
CHAPTER 4 SYSTEM IMPLEMENTATION
4.1 DEVELOPMENT PHILOSOPHY
4.2 SYSTEM ARCHITECTURE
4.2.1 High-level system design and analysis
4.2.2 Low-level system design and analysis
4.3 SYSTEM DEMO
4.3.1 Text Level demo
4.3.2 Tag Level demo
4.3.3 Structure Level demo
4.3.4 Semantic Level demo
4.4 EXPERIENCE LEARNED FROM THE IMPLEMENTATION
CHAPTER 5 EXPERIMENT AND DISCUSSION
5.1 BACKGROUND
5.2 QUANTITATIVE EVALUATION
5.2.1 Performance evaluation
5.2.1.1 System Execution time
5.2.1.2 Precision
5.2.2 Results and lesson learned
5.3 QUALITATIVE EVALUATION
5.3.1 Questionnaire design and survey
5.3.2 Results and lesson learned
5.3.2.1 Performance comparison with and without my approach
5.3.2.2 Performance comparison with other related approaches
CHAPTER 6 CONCLUSION AND FUTURE WORK
6.1 CONCLUSIONS
6.2 FUTURE WORK
REFERENCE
Advisor Stephen J.H. Yang (楊鎮華)   Approval date 2007-7-5

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.