Applying Concept Extraction to the Indexing of Chinese Written Judgment Containing Several Offenses

Name

Tai-Ping Hsing (邢台平)

Department

Department of Information Management

Thesis Title

Applying Concept Extraction to the Indexing of Chinese Written Judgment Containing Several Offenses
(original title: 應用概念萃取於多罪中文判決書之索引)

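Related Theses

★ A Decision Support System for Creating SMS Rules to Prevent Credit Card Fraud	★ A Comparison of the Effectiveness of Different Retrieval Strategies
★ A Study of Factors Influencing the Knowledge-Sharing Process	★ Construction and Evaluation of a Retrieval Agent System with Sharing Functions
★ A Study of Computer Attitudes and Learning Self-Efficacy among Juvenile Offenders	★ A Study of Software Metrics Issues Using the AHP Method
★ Optimizing an Intrusion Rule Base	★ A Study on Improving the Efficiency and Quality of Business Information Retrieval
★ Using the Analytic Hierarchy Process to Assess Key Factors in the Banking Industry's Adoption of Enterprise Application Integration (EAI) Systems	★ Applying Genetic Algorithms to Find Near-Optimal Layouts of Forced-Convection Devices in Cluster Computer Rooms
★ The Development of a CASE Tool with Knowledge Management Functions	★ A Fast Search Index Tree Based on the PAT Tree
★ A Compound-Noun-Based Approach to Building Document Concepts	★ Using User Interest Profiles to Examine the Importance of Adjective Position in Review Classification
★ Helping Users Filter Web Document Search Results through Semi-Structured Information and User Feedback	★ A Study on Classifying User Reviews with a Vector Space Model Built from Feature-Opinion Pairs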

Files

Full-text access in the system: never open to the public

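Abstract (Chinese)

Research on legal information systems has been under way for decades, and searching for similar cases is an important topic in legal informatics. When legal professionals or members of the public face a criminal case, they may urgently need similar cases for reference. To find the written judgments of similar cases, they have customarily queried the judgment retrieval systems of the Judicial Yuan or of dedicated legal databases; most current judgment retrieval systems are built on full-text search and the Boolean logic model. When a judge or legal professional handles a criminal case involving several offenses, the user may supply more query keywords and combine them with Boolean operators. Precision is low, however, and too many irrelevant cases are retrieved, so the user must spend a great deal of time filtering the results manually. One way to overcome this information overload is to improve the vector representation of documents by applying concept extraction. Concept extraction derives a concept from a set of related terms. It avoids the noise produced by frequent but unimportant terms, reduces the dimensionality of the vector by converting a document from a term vector into a concept vector, and can extract specific information from the document. The purpose of this study is therefore to develop concept-extraction methods that extract the concepts of offenses from the written judgments of criminal cases, and to use the extracted offense concepts to revise the vector representation of those judgments. Based on concept-extraction techniques that use association rules, genetic algorithms, and data envelopment analysis, this study develops four concept-extraction methods, together with the subsequent case-indexing procedures built on them. To test the applicability of the proposed methods, we conducted three experiments. The first compares the judgment retrieval performance of the four concept-extraction methods. The second compares the retrieval performance of those four methods with that of three commonly used document-indexing methods. The third verifies whether the number of offenses contained in the test data set affects the retrieval performance of the four concept-extraction methods. The results of Experiments 1 and 2 show that the best of the proposed methods combines the two functions TLCEF and GAWF, and that all four proposed methods outperform the three document-indexing methods in retrieval performance. The experimental results also show that when the number of offenses in the test set is reduced from 21 to 10, retrieval performance improves significantly.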

Abstract (English)

Many legal information systems have been developed over the past few decades, and searching for similar cases is an important research issue in the legal domain. When law professionals or members of the general public encounter a case, they may want to look for similar cases for reference. They typically search a database of legal cases through current judgment retrieval systems, which are based on full-text search and the Boolean logic model. When the case is a criminal one in which the criminal activities violate several articles, the user must enter more terms in the query, and these systems often return many cases, some of which are only marginally relevant to the query. One way to relieve this information overload is to improve the vector representation of documents by applying concept extraction. Concept extraction generalizes a concept from a set of related terms while reducing the noise that comes from frequent but unimportant terms; it converts term vectors into concept vectors, which reduces the dimensionality of the vectors, and it can extract specific information from a document. This study therefore aims to develop concept-extraction methods that extract the concept of an offense from a criminal judgment, and to represent the criminal case vector by the extracted concepts instead of by word occurrences. Based on concept-extraction techniques that use association rules, genetic algorithms, and data envelopment analysis, we developed four concept-extraction methods and, for each, a corresponding case-indexing process. To test the applicability of the four proposed methods, this study conducts three experiments. The first experiment compares the retrieval performance of the four proposed methods. The second tests whether the four proposed methods outperform general indexing schemes. The third examines whether the cardinality of offense types in the test set affects the retrieval performance of the four proposed methods. The first experiment shows that the best of the four proposed methods is the one that combines the proposed functions TLCEF and GAWF. The second experiment shows that all four proposed methods outperform the general indexing schemes. The last experiment shows that when the cardinality of offense types in the test set is reduced from 21 to 10, the retrieval performance of the four methods increases significantly.
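
The step from term vectors to concept vectors can be illustrated with a minimal Python sketch. It is not the thesis's method: the TLCEF and GAWF functions are not described in this record, so the concept dictionary `CONCEPTS`, its feature terms and weights, and the helper names `concept_vector`, `cosine`, and `rank` are all hypothetical. The sketch only shows the general idea of collapsing many term dimensions into one dimension per offense concept and then ranking cases by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical offense concepts: each maps feature terms to illustrative weights.
# In the thesis these would come from feature selection and feature weighting;
# the values here are made up for demonstration.
CONCEPTS = {
    "theft": {"steal": 0.9, "property": 0.6, "burglary": 0.8},
    "fraud": {"deceive": 0.9, "forged": 0.7, "property": 0.3},
    "assault": {"injure": 0.9, "weapon": 0.5},
}


def concept_vector(tokens, concepts=CONCEPTS):
    """Map a token list to one weighted score per concept (one dimension each)."""
    counts = Counter(tokens)
    return [
        sum(weight * counts[term] for term, weight in features.items())
        for features in concepts.values()
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, cases, concepts=CONCEPTS):
    """Return case ids ordered from most to least similar to the query."""
    q = concept_vector(query_tokens, concepts)
    scored = [
        (cosine(q, concept_vector(tokens, concepts)), case_id)
        for case_id, tokens in cases.items()
    ]
    return [case_id for _, case_id in sorted(scored, key=lambda p: p[0], reverse=True)]


# Toy judgments, already tokenized (real Chinese judgments need word segmentation).
cases = {
    "A": ["steal", "property", "deceive", "forged"],  # theft and fraud
    "B": ["injure", "weapon"],                        # assault only
    "C": ["steal", "burglary"],                       # theft only
}
ranking = rank(["steal", "deceive"], cases)
```

With a query that touches both theft and fraud, the case involving both offenses ranks ahead of the theft-only case, and the unrelated assault case ranks last. A Boolean AND query over the same terms would return only exact term matches and give no ordering among them.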

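Keywords (Chinese)

★ Legal information systems
★ Legal document indexing
★ Concept extraction
★ Genetic algorithm
★ Association rules
★ Data envelopment analysis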

Keywords (English)

★ Legal information systems
★ concept extraction
★ information retrieval
★ genetic algorithm
★ legal case-indexing

Table of Contents

1. Introduction
1.1 Research background and motivation
1.2 Research goal
1.3 Research scope and limitations
1.4 Research processes
1.5 Dissertation organization
2. Literature Review
2.1 Information retrieval
2.2 Related researches on legal information system
2.3 Representation of legal document
2.4 Commercial legal information retrieval system
2.5 Concept extraction
2.5.1 Association rule
2.5.2 Latent semantic analysis
2.5.3 Limitation of existing concept-extraction methods
3. System Design and Methodology
3.1 Phase 1: Concept-extraction phase
3.1.1 Preprocessing procedure
3.1.2 Feature selection process for the concept of offense
3.1.3 Feature weighting process for the concept of offense
3.1.4 Four concept-extraction methods
3.2 Phase 2: Indexing phase
3.3 Phase 3: Simulated query phase
4. Evaluations
4.1 Comparing the retrieval performance levels of the four concept-extraction methods
4.1.1 Choice of parameter values for the four methods
4.1.2 Experimental results
4.2 Comparing the four concept-extraction methods with three general document-indexing schemes
4.2.1 Implementation details for TF-IDF scheme
4.2.2 Implementation details for ARBIS
4.2.3 Implementation details for LSA
4.2.4 Experimental results
4.3 Evaluations on the effects of reduced number of offenses
4.3.1 Details of implementation
4.3.2 Experimental results
4.4 Discussions
5. Conclusions and Future Works
5.1 Conclusions
5.2 Contributions
5.3 Future Works
References
Appendixes

Advisor

Shih-Chieh Chou (周世傑)

Approval Date

2016-1-12
