應用概念萃取於多罪中文判決書之索引;Applying Concept Extraction to the Indexing of Chinese Written Judgment Containing Several Offenses

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/69488

Title:	應用概念萃取於多罪中文判決書之索引;Applying Concept Extraction to the Indexing of Chinese Written Judgment Containing Several Offenses
Author:	Hsing, Tai-Ping (邢台平)
Contributor:	Department of Information Management
Keywords:	Legal information systems; legal document indexing; concept extraction; genetic algorithm; association rules; data envelopment analysis; information retrieval; legal case-indexing
Date:	2016-01-12
Upload time:	2016-03-17 20:45:32 (UTC+8)
Publisher:	National Central University
Abstract:	Research on legal information systems has been under way for decades, and searching for similar cases is an important problem in legal informatics. When legal professionals or members of the general public encounter a criminal case, they may urgently need similar cases for reference. They have typically searched the judgment retrieval systems of the Judicial Yuan or of specialized legal databases, most of which are built on full-text search and the Boolean-logic model. When a criminal case involves several offenses, the user tends to enter more query terms combined with Boolean operators, yet precision is low and many irrelevant or only marginally relevant cases are returned, so the user must spend considerable time filtering the results manually.

One way to relieve this information overload is to improve the vector representation of documents through concept extraction. Concept extraction generalizes a concept from a set of related terms, reduces the noise caused by frequent but unimportant terms, converts term vectors into concept vectors of lower dimension, and extracts specific information from a document. This study therefore aims to develop concept-extraction methods that extract offense concepts from criminal judgments and use the extracted concepts, rather than word occurrences, to revise the vector representation of criminal cases. Building on association rules, genetic algorithms, and data envelopment analysis, we developed four concept-extraction methods and a corresponding case-indexing process for each.

To test the applicability of the four proposed methods, this study conducts three experiments. The first compares retrieval performance among the four proposed methods. The second compares the four methods with three commonly used document-indexing schemes. The third examines whether the number of offense types in the test set affects the retrieval performance of the four methods. The first experiment shows that the best of the four methods is the one that combines the proposed TLCEF and GAWF functions. The second experiment shows that all four proposed methods outperform the general indexing schemes. The third experiment shows that when the number of offense types in the test set is reduced from 21 to 10, the retrieval performance of the four methods increases significantly.
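
As an illustration of the idea of converting term vectors into concept vectors, the following is a minimal sketch only. It does not reproduce the thesis's TLCEF or GAWF functions, which this record does not define. The term-to-concept weights, the English sample terms, and the function names `to_concept_vector` and `cosine` are all hypothetical, chosen for illustration; in the thesis such mappings are derived with association rules, a genetic algorithm, and data envelopment analysis.

```python
import math
from collections import Counter

# Hypothetical term-to-concept weights (illustrative values only; not taken
# from the thesis, which learns its mappings from judgment text).
TERM_CONCEPT_WEIGHTS = {
    "steal": {"theft": 1.0},
    "stolen": {"theft": 0.8},
    "forged": {"forgery": 1.0},
    "signature": {"forgery": 0.5},
}


def to_concept_vector(terms, weights=TERM_CONCEPT_WEIGHTS):
    """Collapse a list of terms into a lower-dimensional concept vector.

    Terms with no entry in `weights` are dropped, which is one simple way
    to suppress frequent but uninformative words.
    """
    vector = Counter()
    for term, count in Counter(terms).items():
        for concept, weight in weights.get(term, {}).items():
            vector[concept] += count * weight
    return dict(vector)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# A judgment mentioning two offenses and a query about one of them.
judgment = to_concept_vector(["steal", "stolen", "forged", "the", "court"])
query = to_concept_vector(["stolen"])
similarity = cosine(judgment, query)
```

In this sketch the judgment is indexed by two offense concepts instead of five terms, and the query is matched against it in concept space rather than by exact keyword overlap.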
Appears in collections:	[Graduate Institute of Information Management] Doctoral and Master's Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	435	View/Open

All items in NCUIR are protected by the original copyright.
