博碩士論文 944203029 詳細資訊




以作者查詢圖書館館藏 以作者查詢臺灣博碩士 以作者查詢全國書目 勘誤回報 、線上人數:36 、訪客IP:18.216.32.116
姓名 張楷鑫(Gush Chang)  查詢紙本館藏   畢業系所 資訊管理學系
論文名稱 以PAT tree 為基礎發展之快速搜尋索引樹
(noneA fast access indexing tree structure developed based on PAT tree)
相關論文
★ 信用卡盜刷防治簡訊規則製作之決策支援系統★ 不同檢索策略之效果比較
★ 知識分享過程之影響因子探討★ 兼具分享功能之檢索代理人系統建構與評估
★ 犯罪青少年電腦態度與學習自我效能之研究★ 使用AHP分析法在軟體度量議題之研究
★ 優化入侵規則庫★ 商務資訊擷取效率與品質促進之研究
★ 以分析層級程序法衡量銀行業導入企業應用整合系統(EAI)之關鍵因素★ 應用基因演算法於叢集電腦機房強迫對流裝置佈局最佳近似解之研究
★ The Development of a CASE Tool with Knowledge Management Functions★ 以複合名詞為基礎之文件概念建立方式
★ 利用使用者興趣檔探討形容詞所處位置對評論分類的重要性★ 透過半結構資訊及使用者回饋資訊以協助使用者過濾網頁文件搜尋結果
★ 利用feature-opinion pair建立向量空間模型以進行使用者評論分類之研究★ 探討使用者回饋之半結構化文件字詞特性於檢索文件的應用
檔案 [Endnote RIS 格式]    [Bibtex 格式]    [相關文章]   [文章引用]   [完整記錄]   [館藏目錄]   至系統瀏覽論文 ( 永不開放)
摘要(中) 在網路普及和分散式多媒體運算的時代,資料量的成長一日千里。如何快速的存取資料,是當代迫切的需求。PAT tree 資料結構提供極快速的搜尋能力。藉由它的幫助,我們可以在O(log n)的時間內,執行任意的字首搜尋,不用受限於實際資料量的大小。事實上, 查詢字彙的長度通常小於 log n,所以真正在執行搜尋時,所耗費的時間是決定於查詢字彙的長度。雖然PAT tree 是如此的快速,我們認為它可以更快。我們憑藉PAT tree 的基礎,發展了DPAT tree。它的搜尋模式是隨機存取,不受限於查詢字彙的長度。DPAT tree 是一個trie 資料結構的變形,它能支援任何的語言索引功能,它同時也是一個雜湊表。我們使用了一個編碼程序來產生雜湊值,所有的搜尋都透過此雜湊值來查詢,來達到隨機存取的目的,確保查詢花費時間不會超過O(1)。在Cranfield II 資料測試集的實驗中,平均一個查詢花費25ms,相較於PAT tree 平均所花費的1443ms,它所提供的速度快上許多。
摘要(英) In the era of the Internet and distributed multimedia computing, data increasesrapidly in astonishing rate. How to retrieve data efficiently has become an urgentissue. The PAT tree is a data structure that allows very efficient searching withpreprocessing. We can do arbitrary prefix searching in O(log n) time with PAT tree,independent of the size of the answer. In practice, the length of the query is less thanO(log n), and the searching time is proportional to the query length. To improve thesearching efficiency, we have developed a modified PAT tree called decimal PAT tree(DPAT tree) with which the search could be done in a random access manner,regardless of the length of the query. The DPAT tree structure is a variation of triestructure that can support any language indexing functionality. It also is a hash tablethat maps every sistring to where it resides. We have employed an encoding procedureas the hash function to generate the hash values. The procedure is guaranteed to createa hash table, in which search can be done in random access manner. Thus searchingcould be done in no more than O(1). The experiment tests Cranfield II documentcollection and yields 25ms response time per query in average in contrast to 1443msresponse time per query in average in PAT tree retrieval.
關鍵字(中) ★ 資訊檢索
★ 存取方式 資料結構
關鍵字(英) ★ information retrieval
★ access method
★ Patricia trie
★ data structure
論文目次 Abstract ...................................................................................................... ii
中文摘要................................................................................................... iii
Contents .................................................................................................... iv
List of Tables...............................................................................................v
List of Figures ........................................................................................... vi
1. Introduction..........................................................................................1
2. Related work ........................................................................................3
2.1. Trie.............................................................................................3
2.2. Fast Searching - PAT-Tree .........................................................3
2.3. Compact Patricia Tree (CPAT tree) and Hierarchical Compact
Patricia Tree(HCPAT tree) ...................................................................8
2.4. Chinese Keyword Extraction.....................................................8
3. Decimal PAT tree .................................................................................9
3.1. The search cost in PAT tree........................................................9
3.2. Decimal PAT tree(DPAT).........................................................10
3.3. Encoding ..................................................................................13
3.4. DPAT tree construction and search..........................................14
4. Experiment.........................................................................................16
4.1. Procedure .................................................................................16
4.2. Experimental results ................................................................18
4.3. Discussion................................................................................19
5. Conclusion........................................................................................21
Reference .........................................................................................24
參考文獻 Aoe, J. 1989, "A fast digital search algorithm using a double-array structure", Systems and Computers in Japan, vol. 20, no. 7, pp. 92 - 103.
Aoe, J., Morimoto, K., Shishibori, M. & Park, K.H. 1996, "A trie compaction algorithm for a large set of keys", IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 3, pp. 476-491.
Bayer, R. & Unterauer, K. 1977, "Prefix B-trees", ACM Transactions on Database Systems, vol. 2, no. 1, pp. 11-26.
Blumer, A., Blumer, J., Haussler, D., McConnell, R. & Ehrenfeucht, A. 1987, "Complete inverted files for efficient text retrieval and analysis", Journal of the ACM (JACM), vol. 34, no. 3, pp. 578-595.
Chien, L. 1999, "PAT-tree-based adaptive keyphrase extraction for intelligent Chinese information retrieval", Information Processing and Management, vol. 35, no. 4, pp. 501-521.
Chien, L.F. 1997, "PAT-tree-based keyword extraction for Chinese information retrieval", Proceedings of the 20th annual international ACM/SIGIR conference on Research and development in information retrieval, ACM New York, NY, USA, pp. 50-58.
Comer, D. 1979, "The ubiquitous B-tree", ACM Computing Surveys (CSUR), vol. 11, no. 2, pp. 121-137.
De Jonge, W., Tanenbaum, A.S., Van De Riet, R.P. 1987, "Two access methods using compact binary trees", IEEE Transactions on Software Engineering, vol. SE-13, no. 7, pp. 799-810.
Frakes, W.B. & Baeza-Yates, R. 1992, Information retrieval: data structures and algorithms, Prentice-Hall, Inc. Upper Saddle River, NJ, USA.
Gonnet, G.H., Baeza-Yates, R.A. & Snider, T. 1992, "New indices for text: PAT trees and PAT arrays", Prentice-Hall, Inc. Upper Saddle River, NJ, USA.
Jung, M., Shishibori, M., Tanaka, Y. & Aoe, J. 2002, "A dynamic construction algorithm for the Compact Patricia trie using the hierarchical structure", Information Processing and Management, vol. 38, no. 2, pp. 221-236.
Knott, G.D. 1971, "Expandable open addressing hash table storage and retrieval", Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, ACM New York, NY, USA, pp. 186-206.
Litwin, W. 1981, "Trie hashing", Proceedings of the 1981 ACM SIGMOD international conference on Management of data, ACM New York, NY, USA, pp. 19-29.
Litwin, W. 1980, "Linear hashing: A new tool for file and table addressing", Proc. of the International Conference on Very Large Database, VLDB Endowment, San Francisco, CA, USA, vol. 6, pp. 212–223.
Litwin, W. 1978, "Virtual hashing: A dynamically changing hashing", Proceedings of the fourth international conference on Very Large Data Bases, VLDB Endowment, West Berlin, Germany, vol. 4, pp. 517-523.
Litwin, W., Roussopoulos, N., Levy, G. & Hong, W. 1991, "Trie hashing with controlled load", IEEE Transactions on Software Engineering, vol. 17, no. 7, pp. 678-691.
Manber, U. & Myers, G. 1990, "Suffix arrays: A new method for on-line string searches", Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 319-327.
Morrison, D.R. 1968, "PATRICIA—practical algorithm to retrieve information coded in alphanumeric", Journal of the ACM (JACM), vol. 15, no. 4, pp. 514-534.
Sprugnoli, R. 1977, "Perfect hashing functions: a single probe retrieving method for static sets", Communications of the ACM, vol. 20, no. 11, pp. 841-850.
指導教授 周世傑 審核日期 2009-7-1
推文 facebook   plurk   twitter   funp   google   live   udn   HD   myshare   reddit   netvibes   friend   youpush   delicious   baidu   
網路書籤 Google bookmarks   del.icio.us   hemidemi   myshare   

若有論文相關問題,請聯絡國立中央大學圖書館推廣服務組 TEL:(03)422-7151轉57407,或E-mail聯絡  - 隱私權政策聲明