Classified Term Frequency-Inverse Document Frequency Technique Applied to School Regulations

Name

范芳瑄 (Fang-Syuan Fan)

Department

Department of Computer Science and Information Engineering, In-service Master's Program

Thesis Title

Classified Term Frequency-Inverse Document Frequency Technique Applied to School Regulations
(應用於校內法規之分類化文字探勘與檢索技術)

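Related Theses

★ Applying Self-Organizing Map and Back-Propagation Networks to Mining Potential Users from a Telecommunications Database
★ Enterprise Email Classification Based on Social Network Features
★ Temporal Behavior Analysis of Mobile Network Users
★ A Study on Mining Multi-Level Influence Propagation in Social Networks
★ An Integrated Information Management and Analysis System Based on Peer-to-Peer Technology
★ A Study on Data Locality Strategies for Different Big-Data Astronomy Applications on a Distributed Cloud Platform
★ A Study on Applying Data Warehousing Techniques to Discover Knowledge in Peer-to-Peer Network Environments
★ Self-Derived Mining from Transaction Databases with Multi-Level FP-trees
★ A Study on Building a Passive Storage-Capacity Migration Policy into a Lifecycle Management System
★ A Study on Applying Service Mining to Discover Composite Services
★ Improving Intrusion Detection Systems Using Frequent Event Sequences in Weighted Suffix Trees
★ Efficient Processing of Continuous Aggregate Queries on Data Warehouses
★ Intrusion Detection Systems: Using Function-Based System Call Sequences
★ Efficient Multi-Dimensional and Multi-Level Association Rule Mining on Data Cubes
★ Course Recommendation Based on Social Associations and Weights in Online Learning
★ Identifying Inactive Users in Social Networking Services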

Files

Full text viewable in the system from 2024-7-31

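Abstract (Chinese)

This study combines text mining and retrieval (TF-IDF) with facets (相性) and applies it to National Central University's internal regulations and the off-campus regulations they extend to, building the system on a cloud platform to classify the regulations.
TF-IDF alone yields only a single quantitative measure and cannot offer diverse choices. By combining facets with cosine similarity, hierarchical clustering, and related techniques, one regulation can produce different results under different facets, and the resulting classification gives users multiple options for locating the relevant regulations.

Keywords: text mining, text mining and retrieval (TF-IDF), similarity analysis, hierarchical clustering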
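The facet (相性) based comparison described above can be illustrated with a minimal Python sketch. It assumes the regulations are already segmented into tokens and that each facet is represented by a term list; the documents, facet term lists, and function names below are hypothetical. TF is taken as a term's count divided by document length and IDF as log(N/df), one common variant; the thesis may weight terms differently.

```python
import math
from collections import Counter

# Hypothetical pre-segmented regulations (real input would be segmented Chinese text).
DOCS = {
    "reg_a": ["apply", "student", "scholarship", "review", "student"],
    "reg_b": ["student", "scholarship", "subsidy", "apply"],
    "reg_c": ["teacher", "promotion", "review", "committee"],
}

# Hypothetical facet vocabularies: only terms in a facet's list count toward that facet.
FACETS = {
    "applicable_subjects": {"student", "teacher"},
    "review_mechanism": {"review", "committee", "apply"},
}


def tf_idf(docs, vocab=None):
    """Return {doc_id: {term: weight}} with tf = count / doc length and idf = log(N / df).

    When vocab is given, only terms in vocab are kept (facet restriction);
    document frequency is still counted over the whole corpus.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # each document counts once per term
    weights = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in counts.items()
            if vocab is None or term in vocab
        }
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # The same pair of regulations is compared once per facet.
    for facet, vocab in FACETS.items():
        w = tf_idf(DOCS, vocab)
        print(facet, round(cosine(w["reg_a"], w["reg_b"]), 3))
```

With these toy inputs, reg_a and reg_b are identical on the applicable-subjects facet (cosine 1.0) but only partly similar on the review-mechanism facet (about 0.707), which is the kind of facet-dependent result the abstract describes.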

Abstract (English)

This study combines the Term Frequency-Inverse Document Frequency (TF-IDF) technique with facets and applies it to the regulations of National Central University and related off-campus regulations, building the system on a cloud platform for regulation classification.
On its own, TF-IDF provides only one quantitative measure and cannot present users with diverse choices. By combining facets with Cosine Similarity, Hierarchical Clustering, and other techniques, the same regulation can produce different results under different facets. The classification therefore offers a range of choices that helps users find the relevant regulations.

Keywords: text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering
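The hierarchical clustering step can be sketched in the same spirit. The code below is a minimal bottom-up (agglomerative) clustering over cosine similarity, using average linkage and a similarity threshold to decide when to stop merging; both the linkage rule and the threshold are assumptions for illustration, since the abstract does not specify them, and the toy TF-IDF vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_link(vectors, a, b):
    """Mean pairwise cosine similarity between the members of clusters a and b."""
    sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def agglomerative(vectors, threshold):
    """Start from singleton clusters and repeatedly merge the most similar pair
    until no pair's average-link similarity reaches the threshold."""
    clusters = [[doc_id] for doc_id in vectors]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(vectors, clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical TF-IDF vectors for three regulations under one facet.
VECTORS = {
    "reg_a": {"student": 0.4, "scholarship": 0.3},
    "reg_b": {"student": 0.3, "scholarship": 0.4},
    "reg_c": {"teacher": 0.5, "promotion": 0.2},
}

if __name__ == "__main__":
    print(agglomerative(VECTORS, threshold=0.5))
```

A lower threshold yields fewer, larger groups; recording every merge instead of stopping early would give the full dendrogram.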

Keywords (Chinese)

★ Text mining
★ Text mining and retrieval (TF-IDF)
★ Similarity analysis
★ Hierarchical clustering

Keywords (English)

★ text mining
★ TF-IDF
★ Cosine Similarity
★ Hierarchical Clustering

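Table of Contents

Chinese Abstract
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Text Mining and Retrieval (TF-IDF)
2.1.1 Term Frequency (TF)
2.1.2 Inverse Document Frequency (IDF)
2.1.3 Summary
2.2 Cosine Similarity
2.3 Cluster Analysis
Chapter 3 System Design
3.1 System Flow and Architecture
3.1.1 Data Construction
3.1.2 Text Processing
3.1.3 Regulation Facet Categorization
3.1.4 Text Mining and Retrieval
3.1.5 Similarity Analysis
3.1.6 Hierarchical Clustering
3.2 Research Subjects
Chapter 4 Research Methods
4.1 Data Collection
4.2 Text Preprocessing
4.2.1 Stop Words
4.2.2 Synonym Replacement
4.2.3 Segmentation with a Custom Dictionary
4.3 Facet Definition
4.4 Text Mining and Retrieval (TF-IDF)
4.4.1 Term Frequency (TF)
4.4.2 Inverse Document Frequency (IDF)
4.4.3 Results
4.5 Similarity Computation
4.6 Hierarchical Clustering
Chapter 5 Cloud Platform Analysis and Design Flow
5.1 Development Environment
5.2 Custom Facets
5.3 Importing Basic Data
5.4 Custom Dictionary
5.5 Text Segmentation
5.6 Computing TF×IDF
5.7 Regulation Similarity Comparison
Chapter 6 Empirical Analysis and Results
6.1 Facet Term Statistics
6.2 Distribution Results for Each Facet
Chapter 7 Conclusions
7.1 Conclusions
7.2 Difficulties Encountered
7.3 Future Work
References
Appendix 1 Regulation Details
Appendix 2 Restriction Condition Term Details
Appendix 3 Benefit and Right Term Details
Appendix 4 Legal Basis Term Details
Appendix 5 Applicable Subject Term Details
Appendix 6 Review Mechanism Term Details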
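The preprocessing steps listed under 4.2 (stop words, synonym replacement, and segmentation with a custom dictionary) can be sketched as follows. This is an illustration only: forward maximum matching stands in for whatever segmenter the thesis actually uses, and the dictionary, stop-word list, and synonym map are hypothetical.

```python
# Hypothetical word lists; the thesis's own custom dictionary and lists are not reproduced here.
CUSTOM_DICT = {"學生", "獎學金", "申請", "審核", "委員會"}
STOP_WORDS = {"之", "的", "及"}
SYNONYMS = {"審查": "審核"}  # map variant wording onto one canonical term


def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word (up to max_len characters), falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text):
    """Segment, normalise synonyms, then drop stop words."""
    # Synonym keys are added to the dictionary so variants are segmented as whole words.
    tokens = segment(text, CUSTOM_DICT | set(SYNONYMS))
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("學生之獎學金申請及審查"))
```

Segmenting before mapping synonyms and removing stop words keeps multi-character dictionary terms intact; the call above returns the tokens 學生, 獎學金, 申請, 審核.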

Advisor

蔡孟峰

Approval Date

2019-7-23
