Judgment Retrieval and Statute Prediction for Legal Problems

Name

Yi-Hung Liu (劉譯閎)

Department

Department of Information Management

Thesis title

Judgment Retrieval and Statute Prediction for Legal Problems
(對於法律問題進行判例檢索和法條預測)

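Related theses

★ A Study of Business Intelligence in the Retail Industry
★ Building an Anomaly Detection System for Landline Telephone Calls
★ Applying Data Mining to the Analysis of School Grades and General Scholastic Ability Test Results: The Case of a Vocational High School Food and Beverage Management Program
★ Using Data Mining to Improve Wealth Management Effectiveness: A Case Study of a Bank
★ Evaluation and Analysis of Wafer Manufacturing Yield Models: The Case of a Domestic DRAM Fab
★ A Study of Applying Business Intelligence Analysis to Student Grades
★ Using Data Mining to Build a Prediction Model for the Academic Achievement of Upper-Grade Elementary School Students
★ Applying Data Mining to Build a Risk Assessment Model for Motorcycle Loans: The Case of Company A
★ Applying Performance Indicator Evaluation to Improve R&D Design Quality Assurance
★ Applying Machine Learning to Text Résumés and Personality Traits to Improve Hiring Quality
★ A General Framework Based on a Relational Genetic Algorithm for Solving Set Partitioning Problems with Constraint Handling
★ Generalized Knowledge Discovery in Relational Databases
★ Decision Tree Construction Considering Delays in Acquiring Attribute Values
★ A Method for Finding Preference Graphs from Sequence Data, Applied to Group Ranking Problems
★ Finding Consensus Groups with a Partitional Clustering Algorithm to Solve Group Decision-Making Problems
★ A Novel Approach to Ordered Consensus Groups Applied to Group Decision-Making Problems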

Files

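This electronic thesis is authorized for immediate open access.
Full texts that have reached open-access status are licensed only for users' personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.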

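Abstract (Chinese)

Applying text mining to legal problems has become an emerging research area in recent years. To our knowledge, although a few earlier studies focused on helping legal professionals retrieve relevant legal documents, they did not consider how difficult it is for ordinary people to describe the legal problems they encounter in legal terminology, and no study has examined predicting relevant statutes from a description of a legal problem. This dissertation addresses two research topics that exploit the characteristics of legal documents: judgment retrieval and statute prediction. For the first topic, we propose a text-mining-based method that lets laypeople search for and retrieve relevant criminal judgments using everyday vocabulary. For the second, we propose a three-phase statute prediction method that lets non-professionals describe a legal problem in everyday vocabulary and then finds the statutes relevant to it. Two main experiments validate the methods. In the first, we compare the traditional TF-IDF method with the proposed judgment retrieval method by means of a questionnaire survey. In the second, we compare four well-known retrieval methods (cosine similarity, the Pearson correlation coefficient, Spearman's correlation coefficient, and TF-IDF) with the proposed three-phase statute prediction method. The experiments, which use Chinese criminal judgments as the data set, show that both proposed methods are effective and accurate and that both outperform the traditional methods.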

Abstract (English)

Applying text mining techniques to legal problems has been an emerging research topic in recent years. A few previous studies focused on helping professionals retrieve related legal documents. To our knowledge, however, they did not take into account the difficulty the general public has in describing legal problems in professional legal terms, and they could not suggest relevant statutes from a problem statement. In this dissertation, we formulate two research topics that exploit the unique characteristics of legal documents: judgment retrieval and statute prediction. For the first topic, we design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. For the second, we present the three-phase prediction (TPP) algorithm, which enables laypeople to describe their problems in everyday vocabulary and find the statutes pertinent to their cases. Two experiments validate the proposed methods. The first compares the traditional TF-IDF method with our judgment retrieval approach through a survey. The second addresses statute prediction and compares TPP with four well-known retrieval functions: cosine similarity, the Pearson correlation coefficient, Spearman's correlation coefficient, and TF-IDF. Both experiments use Chinese criminal judgments as the data set. The results show that the proposed methods are accurate and effective and that they outperform the traditional methods.
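
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of TF-IDF weighting combined with cosine similarity ranking. It is not the thesis's judgment retrieval method or the TPP algorithm, whose details are not given in this record. The function names (`tfidf_vectors`, `cosine`, `rank`), the smoothed IDF variant, and the toy English "judgments" are all assumptions made for illustration. The thesis works with Chinese criminal judgments, which would also require word segmentation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({term: weight} per tokenized document, idf table).

    Assumption: idf = ln(N / df) + 1, one common smoothed variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy corpus standing in for judgment texts.
judgments = [
    "defendant took wallet from victim by force".split(),
    "defendant drove car after drinking alcohol".split(),
    "defendant forged signature on bank cheque".split(),
]
# A layperson's description in everyday vocabulary.
query = "someone took my wallet by force".split()
ranking = rank(query, judgments)
print(ranking)  # index 0, the only judgment sharing query terms, comes first
```

A baseline of this kind matches only on shared vocabulary, which is the limitation the abstract highlights: a layperson's wording may share few terms with the legal language of the judgments.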

Keywords

★ Text mining (文件探勘)
★ Statute (法條)
★ Criminal judgment (刑事判例)
★ Vector space model (向量空間模型)
★ Normalized Google Distance (標準化谷歌距離)
★ Support vector machines (支援向量機)

Thesis Contents

Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Considering the judgment aspect of legal problems
1.2. Considering the statute aspect of legal problems
1.3. Organization of the Dissertation
Chapter 2. Literature Review
2.1. Background
2.2. An overview of text mining
2.3. Applications of text mining
2.4. Related academic research on text mining in the legal domain
Chapter 3. Retrieving associated judgments for legal problems
3.1. Definitions
3.2. The Judgment Retrieval Approach
3.2.1. Phase 1: Training set generation
3.2.2. Phase 2: Query
3.3. Experimental Study
3.3.1. Data Collection
3.3.2. Details of implementation
3.3.3. Experimental results and evaluation
3.4. Summary
Chapter 4. Predicting relevant statutes for legal problems
4.1. Differences between legal documents and normal documents
4.2. The Three-Phase Prediction Approach
4.2.1. Phase 1: Select the top k1 statutes
4.2.2. Phase 2: Select the top k2 statutes
4.2.3. Phase 3: Select the final predicted statutes
4.3. Experimental Study
4.3.1. Testbed
4.3.2. Details of implementation
4.3.3. Experimental results and evaluation
4.3.3.1. Find the optimal combination
4.3.3.2. Comparison
4.4. Summary
Chapter 5. Discussions and Limitations
5.1. Findings
5.2. Limitations
Chapter 6. Conclusions and Future Works
References
Appendix

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2014-11-24
