A Study of an Integrated Extraction Tool for Multiple XML Documents


Name

Chang-cheng Lin (林昌正)

Department

Department of Business Administration

Thesis title

An Integrated Extraction Tool for Multi-XML Documents
(多XML文件整合萃取工具之研究)



Access rights: the author has agreed to make this electronic thesis openly available immediately.

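Abstract (translated from Chinese)

XML has been applied ever more widely in recent years. Mainstream office suites such as OpenOffice.org and Microsoft Office have adopted XML as their document storage format, and in e-commerce XML is gradually becoming an important format for exchanging data between parties. As more applications adopt XML, research on the technology has grown correspondingly active. Previous XML research, however, has not systematically addressed the integrated extraction of structure and text content across multiple documents. Given how active research on XML document data is, this topic is important and merits further study.
This study builds a tool that performs integrated data extraction on multiple XML documents at once. The extracted data comprise document structure data, document text content, and document fragments, so that future studies need not process the original documents and can instead work directly with the data the tool extracts.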

Abstract (English)

XML has become increasingly popular in recent years. Mainstream office applications such as OpenOffice.org and Microsoft Office have adopted XML as their storage format, and XML has gradually become a major format for data exchange in e-commerce. As the use of XML grows, research on XML has become more prevalent. However, no systematic approach exists for extracting structure and content from multiple documents together. Given the popularity of XML, this topic is important and worth studying.
This study builds a tool that extracts data from XML documents; the extracted data are the documents' structures, contents, and fragments. Later research then no longer needs to process the original documents and can work with the extracted data directly.
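
The three kinds of output named in the abstract (structure, text content, and fragments) can be pictured with a minimal sketch. This is not the thesis's algorithm, which this record does not describe; it is an illustration that assumes Python's standard `xml.etree.ElementTree` module. The function names `walk` and `extract`, the sample documents, and the choice to treat each direct child of the root as one fragment are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def walk(element, prefix=""):
    """Yield (path, element) for an element and all of its descendants."""
    path = f"{prefix}/{element.tag}"
    yield path, element
    for child in element:
        yield from walk(child, path)


def extract(documents):
    """Extract structure, text, and fragments from several XML strings.

    `documents` maps a (hypothetical) document name to its XML source.
    """
    structure = Counter()  # element path -> number of documents containing it
    texts = []             # (document name, element path, text)
    fragments = []         # (document name, serialized subtree)

    for name, source in documents.items():
        root = ET.fromstring(source)
        seen_paths = set()
        for path, element in walk(root):
            seen_paths.add(path)
            # Only the text directly inside the element is kept;
            # tail text after child elements is ignored in this sketch.
            text = (element.text or "").strip()
            if text:
                texts.append((name, path, text))
        structure.update(seen_paths)
        # Assumption: each direct child of the root counts as one fragment.
        for child in root:
            fragment = ET.tostring(child, encoding="unicode").strip()
            fragments.append((name, fragment))

    return structure, texts, fragments


docs = {
    "a.xml": "<book><title>XML Basics</title>"
             "<chapter><para>Intro</para></chapter></book>",
    "b.xml": "<book><title>Data Mining</title><author>Lee</author></book>",
}
structure, texts, fragments = extract(docs)
```

Here `structure` records how many of the input documents contain each element path, which gives a combined view across documents, while `texts` and `fragments` keep the document name so that each extracted item can be traced back to its source.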

Keywords (Chinese)

★ Extensible Markup Language (延伸標示語言)
★ Extraction tool (萃取工具)

Keywords (English)

★ Extraction Tool
★ XML

Table of contents

Chinese abstract
English abstract
Table of contents
List of figures
List of tables
1. Introduction
1-1 Research motivation
1-2 Research objectives
2. Literature review
2-1 Extensible Markup Language
2-1-1 Features and advantages
2-1-2 Basic contents of an XML document
2-1-3 Document format definition
2-2 XML-related research
2-2-1 XML retrieval
2-2-2 XML Data Mining
3. Research method
3-1 Structure analysis algorithm
3-2 Keyword matrix algorithm
3-3 Document segmentation
3-3-1 Automatic segmentation algorithm
3-3-2 Conditional segmentation algorithm
4. Empirical analysis
4-1 Interface of the integrated XML data extraction tool
4-2 Explanation of results
4-2-1 Structure analysis
4-2-2 Keyword matrix
4-2-3 Document segmentation
5. Conclusions and suggestions for future research
5-1 Conclusions
5-2 Suggestions for future research
References


Advisor

Ping-yu Hsu (許秉瑜)

Approval date

2008-6-19
