XML has been adopted increasingly widely in recent years. Mainstream office application suites such as OpenOffice.org and Microsoft Office have switched to XML as their document storage format, and in e-commerce XML is gradually becoming an important format for exchanging data between parties. As more applications adopt XML, research on the technology has grown correspondingly active. Previous XML research, however, has not systematically addressed the integrated extraction of structure and text content across multiple documents. Given the strong interest in research on XML document data, this is an important topic that merits further study. This study builds a tool that performs integrated data extraction on multiple XML documents at the same time. The extracted data include document structure, document text content, and document fragments. Future research will then no longer need to process the original documents and can instead work directly with the data produced by this extraction tool.
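To make the three kinds of extracted data concrete, the following is a minimal sketch of multi-document extraction, not the tool described in this study. It assumes the Python standard library module `xml.etree.ElementTree`; the function name `extract`, the slash-separated path representation of structure, and the sample documents are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract(xml_strings):
    """Collect structure, text content, and fragments from several XML documents.

    Hypothetical illustration only; the representation below is an assumption.
    """
    structure = Counter()  # element path -> occurrences across all documents
    texts = []             # (document index, element path, text)
    fragments = []         # (document index, element path, serialized subtree)

    def walk(doc_id, elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        structure[path] += 1
        # Only the text directly inside the element is recorded;
        # tail text after child elements is ignored in this sketch.
        text = (elem.text or "").strip()
        if text:
            texts.append((doc_id, path, text))
        serialized = ET.tostring(elem, encoding="unicode").strip()
        fragments.append((doc_id, path, serialized))
        for child in elem:
            walk(doc_id, child, path)

    for doc_id, xml in enumerate(xml_strings):
        walk(doc_id, ET.fromstring(xml), "")
    return structure, texts, fragments


# Two small made-up documents that share part of their structure.
docs = [
    "<order><item>pen</item><item>ink</item></order>",
    "<order><item>paper</item><note>rush</note></order>",
]
structure, texts, fragments = extract(docs)
```

Here `structure` summarizes which element paths occur across the whole collection, `texts` holds the text content tagged with its source document and path, and `fragments` holds each subtree as a standalone string, so later analysis could work from these three results without reparsing the original files.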