Thesis 945202015: Detailed Record




Name: Shu-Hsun Lo (羅淑薰)   Department: Computer Science and Information Engineering
Thesis title: A Neural Tree with Partial Incremental Learning Capability and Its Application in Spam Filtering
(具部份漸進學習能力之類神經網路樹及其於垃圾郵件過濾器之應用)
Related theses
★ A Q-learning-based swarm intelligence algorithm and its applications
★ Development of a rehabilitation system for children with developmental delay
★ Comparing teacher assessment and peer assessment from the perspective of cognitive styles: from English writing to game making
★ A diabetic nephropathy prediction model based on laboratory test values
★ Design of a fuzzy-neural-network-based classifier for remote-sensing images
★ A hybrid clustering algorithm
★ Development of assistive devices for people with disabilities
★ A study of fingerprint classifiers
★ A study of backlit-image compensation and color quantization
★ Application of neural networks to the selection of business income tax audit cases
★ A new online learning system and its application to tax audit case selection
★ An eye-tracking system and its applications to human-machine interfaces
★ A study of data visualization combining swarm intelligence and self-organizing maps
★ Development of a pupil-tracking system and its human-machine interface applications for people with disabilities
★ An online learning neuro-fuzzy system based on artificial immune systems and its applications
★ Application of genetic algorithms to speech descrambling
  1. The author has agreed to make this electronic thesis available to the public immediately.
  2. Electronic full texts whose release date has been reached are licensed only for personal, non-profit retrieval, reading, and printing for purposes of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or transmit this work without authorization.

Abstract (Chinese) E-mail's convenience and low cost have made it one of the most common and widely used services on the Internet. However, because the SMTP protocol is rudimentary, verifying little about the sender and placing few restrictions on sending, it is easily disrupted and abused. Floods of junk mail, e-mail bombs, and mail-borne viruses cause users great distress, and the bulk delivery of messages without users' consent has let the spam problem grow out of control. Such behavior not only wastes network bandwidth and storage space, it also conceals information-security risks, including viruses and data leakage, lowers productivity and work efficiency, and raises management costs.
Anti-spam measures have evolved continually, from blacklist comparison, content filtering, and IP-address blocking to the latest intelligent defense engines, yet they can rarely eliminate spam completely. This thesis proposes a quadratic-neuron-based neural tree (QUANT), which combines the advantages of decision trees and neural networks. Its quadratic junctions can uncover relationships among data in high-dimensional space, so the tree can effectively preserve the characteristics of old data while simultaneously absorbing new spam variants, achieving partial incremental learning. A mail-filtering system built on it not only blocks existing spam effectively but also adapts to the challenges of new types of mail.
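For contrast with the learning-based approach, the traditional blacklist-comparison and content-filtering techniques mentioned above amount to fixed rule checks. A minimal sketch, in which the sender addresses and keywords are entirely made up for illustration:

```python
# Illustrative rule data; the thesis does not specify these lists.
BLACKLISTED_SENDERS = {"promo@bulkmail.example"}
SPAM_KEYWORDS = {"viagra", "free money", "click here"}

def is_spam(sender: str, body: str) -> bool:
    """Naive traditional filter: flag mail whose sender is blacklisted
    or whose body contains a known spam keyword."""
    if sender.lower() in BLACKLISTED_SENDERS:
        return True
    body_lower = body.lower()
    return any(kw in body_lower for kw in SPAM_KEYWORDS)

print(is_spam("promo@bulkmail.example", "hello"))        # True (blacklist hit)
print(is_spam("friend@example.org", "Click HERE now!"))  # True (keyword hit)
print(is_spam("friend@example.org", "lunch tomorrow?"))  # False
```

Such static rules illustrate the weakness the thesis targets: they cannot adapt when spammers change senders or rephrase keywords, which is what motivates a classifier that keeps learning.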
Abstract (English) People have been struggling with spam for more than ten years. E-mail's ubiquitous, no-cost ease of use encourages “bombing,” “flaming,” and other forms of abuse. E-mail messages that bear embedded or attached viruses, or ill-behaved or malevolent executables, can wreak havoc on computers. The standard techniques for filtering spam are blacklisting, IP tracing, content filtering, and so on. The trouble is that none of these traditional techniques works particularly well. In this thesis, a new approach to constructing a neural tree with partial incremental learning capability is presented.
The proposed neural tree, called a quadratic-neuron-based neural tree (QUANT), is a tree-structured neural network composed of neurons with quadratic neural-type junctions for pattern classification. The proposed QUANT integrates the advantages of decision trees and neural networks. Via a batch-mode training algorithm, the QUANT grows a neural tree containing quadratic neurons in its nodes. These quadratic neurons recursively partition the feature space into hyper-ellipsoidal sub-regions. The QUANT has partial incremental learning capability, so it does not need to construct a new neural tree from scratch whenever new training data are introduced to a trained QUANT.
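The hyper-ellipsoidal partitioning above can be sketched as follows. This is a minimal illustration, assuming a quadratic junction of the form "weighted squared distance to a centre compared against a threshold"; the names `QuadraticNeuron`, `center`, and `weights` are illustrative, not the thesis's exact formulation.

```python
import numpy as np

class QuadraticNeuron:
    """A neuron with a quadratic junction: it fires when the input falls
    inside a hyper-ellipsoid centred at `center`, with one axis weight
    per feature dimension."""

    def __init__(self, center, weights, threshold=1.0):
        self.center = np.asarray(center, dtype=float)
        self.weights = np.asarray(weights, dtype=float)  # per-axis weights
        self.threshold = threshold

    def activation(self, x):
        # Weighted squared distance from the centre: small values lie
        # deep inside the ellipsoid, large values lie outside it.
        d = np.asarray(x, dtype=float) - self.center
        return float(np.sum(self.weights * d * d))

    def inside(self, x):
        return self.activation(x) <= self.threshold

# An ellipsoid centred at the origin, twice as wide along x as along y.
neuron = QuadraticNeuron(center=[0.0, 0.0], weights=[0.25, 1.0])
print(neuron.inside([1.0, 0.5]))   # True:  0.25*1 + 1*0.25 = 0.5 <= 1
print(neuron.inside([3.0, 0.0]))   # False: 0.25*9 = 2.25 > 1
```

In a tree, each internal node would hold such a neuron, and samples falling inside its region are passed down to that node's subtree, giving the recursive partition the abstract describes.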
To demonstrate the performance of the proposed QUANT, a spam filter built on it was designed and tested. The spam filter is able to learn new types of spam while retaining the characteristics of previously seen mail, so it can both block existing spam and adapt itself to new variants.
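The partial incremental learning idea, keeping already-trained regions intact and growing new ones only for samples the model cannot yet explain, can be sketched as follows. This is a deliberately simplified illustration: spherical regions stand in for the thesis's hyper-ellipsoids, a flat list of leaves stands in for the full tree, and all class and function names are hypothetical.

```python
import numpy as np

class Leaf:
    """A leaf holds a class label and the region (centre + radius)
    that captured its training samples."""
    def __init__(self, center, radius, label):
        self.center = np.asarray(center, dtype=float)
        self.radius = float(radius)
        self.label = label

class IncrementalClassifier:
    """Sketch of partial incremental learning: a new sample that no
    existing region explains gets a *new* region, while old regions
    are preserved -- nothing is rebuilt from scratch."""
    def __init__(self):
        self.leaves = []

    def classify(self, x):
        # Assign x to the nearest region that covers it, if any.
        x = np.asarray(x, dtype=float)
        best = None
        for leaf in self.leaves:
            dist = np.linalg.norm(x - leaf.center)
            if dist <= leaf.radius and (best is None or dist < best[0]):
                best = (dist, leaf.label)
        return best[1] if best else None

    def learn(self, x, label, radius=1.0):
        # Incremental step: only add a region when the sample is not
        # already classified correctly; existing knowledge is untouched.
        if self.classify(x) != label:
            self.leaves.append(Leaf(x, radius, label))

clf = IncrementalClassifier()
clf.learn([0.0, 0.0], "ham")
clf.learn([5.0, 5.0], "spam")        # a new spam variant: one new leaf only
print(clf.classify([0.2, -0.1]))     # "ham"
print(clf.classify([5.1, 4.9]))      # "spam"
```

The point of the sketch is the `learn` method: accommodating the new spam cluster touches none of the previously learned "ham" region, mirroring the abstract's claim that a trained QUANT need not be reconstructed when new data arrive.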
Keywords (Chinese) ★ neural tree
★ decision tree
★ incremental learning
★ pattern recognition
Keywords (English) ★ incremental learning
★ decision tree
★ neural tree
★ pattern recognition
Thesis outline Chinese abstract
English abstract
Acknowledgments
Table of contents
List of figures
List of tables
1. Introduction
1-1 Motivation
1-2 Objectives
1-3 Organization of the thesis
2. Related work
2-1 Spam
2-1-1 Definition and origin
2-1-2 E-mail structure, encoding, and delivery
2-1-3 Anti-spam techniques
2-2 A survey of classifier algorithms
3. The quadratic-neuron-based neural tree
3-1 Quadratic junctions
3-2 Hierarchical architecture
3-3 Hierarchical incremental learning
4. Design of the spam filter
4-1 Architecture of the spam filter
4-2 Mail-header analysis
4-2-1 Feature extraction and encoding
4-3 System algorithms
4-3-1 QUANT
4-3-2 Tree-construction procedure
4-3-3 Incremental learning phase
4-4 Mail-body analysis
4-4-1 Body preprocessing
4-4-2 Building the blacklist
4-4-3 Detecting image counts and other content meant to induce clicks
4-4-4 Special symbols
5. Experimental design and analysis of results
5-1 Experiment description
5-2 Experimental design
5-2-1 Datasets
5-2-2 Evaluation methodology
5-3 Experimental results
5-3-1 QUANT on two-dimensional data
5-3-2 SpamAssassin 20030228
5-3-3 Trec_2006_eng
5-3-4 Trec_2006_chi
5-3-5 Parameter tuning
5-4 Analysis of experimental results
6. Conclusions and future work
6-1 Conclusions
6-2 Future work
References
References [1] S. Abe and R. Thawonmas, “A Fuzzy Classifier with Ellipsoidal Regions,” IEEE Trans. on Fuzzy Systems, vol. 5, pp. 358-368, Aug. 1997.
[2] S. Abe, “Fuzzy Function Approximators with Ellipsoidal Regions,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 29, pp. 654-661, Aug. 1999.
[3] S. Abe, “Dynamic Cluster Generation for a Fuzzy Classifier with Ellipsoidal Regions,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 28, pp. 869-876, Dec. 1998.
[4] R. Andrews, J. Diederich, and A. B. Tickle, “Survey and critique of techniques for extracting rules from trained artificial neural networks,” Knowledge-Based Syst., vol. 8, no. 6, pp. 373-383, Dec. 1995.
[5] M. Ben, “An Adaptive Approach to Spam Filtering on a New Corpus,” in Proc. of Third Conference on Email and Anti-Spam, California, July 27-28, 2006, pp. 6-13.
[6] S. Behnke and N. B. Karayiannis, “Competitive neural trees for pattern recognition,” IEEE Trans. on Neural Networks, vol. 9, pp. 1352-1369, 1998.
[7] J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press, 1981.
[8] E. Blanzieri and A. Bryl, “A Survey of Anti-Spam Techniques,” Technical report, 2006.
[9] L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, Classification and Regression Trees. Wadsworth, Inc., 1984.
[10] G. A. Carpenter and S. Grossberg, “A massively parallel architecture for a self-organizing neural pattern recognition machine,” Computer Vision, Graphics, and Image Processing, vol. 37, pp. 54-115, 1987.
[11] J. Clark, I. Koprinska, and J. Poon, “A Neural Network Based Approach to Automated E-mail Classification,” in Proc. of the IEEE/WIC international conference on Web Intelligence, Oct. 2003, pp. 702-705.
[12] J. Clark, I. Koprinska, and J. Poon, “LINGER -A Smart Personal Assistant for E-Mail Classification,” in Proc. of the 13th Intern. Conference on Artificial Neural Networks, Istanbul, Turkey, June 26-29, 2003, pp.274-277.
[13] M. W. Craven and J. W. Shavlik, “Extracting tree structured representations of trained networks,” in Advances in Neural Information Processing Systems 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. London, U.K.: Morgan Kaufmann/MIT Press, 1996.
[14] N. Cristianini and J. Shawe-Taylor, Support Vector Machines. Cambridge University Press, 2000.
[15] N. DeClaris and M. C. Su, “A novel class of neural networks with quadratic junctions, ” IEEE Int. Conf. on Systems, Man, and Cybernetics, Oct. 1991, vol. 3, pp. 1557-1562.
[16] N. DeClaris and M. C. Su, “Introduction to the theory and application of neural networks with quadratic junctions, ” IEEE Int. Conf. on Systems, Man, and Cybernetics, Oct. 1992, vol. 2, pp. 1320-1325.
[17] L. Fang, A. Jennings, W. X. Wen, K. Q.-Q. Li, and T. Li, “Unsupervised learning for neural trees,” in Proc. Int. Joint Conf. on Neural Networks, Seattle, WA, July 8-12, 1991, vol. 3, pp. 2709-2715.
[18] H. Guo and S. B. Gelfand, “Classification trees with neural network feature extraction,” IEEE Trans. on Neural Networks, vol. 3, pp. 923-933, Nov. 1991.
[19] D. Kalles and T. Morris, “Efficient incremental induction of decision trees,” Machine Learning, vol. 24, pp. 231-242, Sep. 1996.
[20] M. W. Kurynski, “The optimal strategy of a tree classifier,” Pattern Recognition, vol. 16, pp. 81-87, 1983.
[21] C. Lai and M. C. Tsai, “An Empirical Performance Comparison of Machine Learning Methods for Spam E-mail Categorization,” in Proc. of the 2004 Hybrid Intelligent System, Dec. 2004, pp. 44-48.
[22] R. S. Michalski, “A theory and methodology of inductive learning,” Machine learning: An Artificial Intelligence Approach, vol. 20, no.2, pp. 111-116, 1983.
[23] J. R. Quinlan, “Induction of decision trees, ” Machine Learning, vol. 1, pp. 81-106, 1986.
[24] J. R. Quinlan, “Simplifying decision trees,” International Journal of Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[25] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1993.
[26] S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Trans. on Neural Networks, vol. 2, pp. 285-293, 1991.
[27] G. Salton, C. Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” Information Processing and Management, vol. 24, pp. 513-523, 1988.
[28] T. D. Sanger, “A tree-structured adaptive network for function approximation in high-dimensional spaces,” IEEE Trans. on Neural Networks, vol. 2, pp. 285-293, March 1991.
[29] A. Sankar and R. J. Mammone, “Speaker independent vowel recognition using neural tree networks,” in Proc. of the 1991 International Joint Conference on Neural Networks, Seattle, July 1991, vol. 2, pp. 809-814.
[30] A. Sankar and R. J. Mammone, “Growing and pruning neural tree networks,” IEEE Trans. on Computers, vol. 42, pp. 809-814, 1993.
[31] A. Sankar and R. J. Mammone, “Optimal pruning of neural tree networks for improved generalization,” in Proc. of the 1991 International Joint Conference on Neural Networks, Seattle, July 1991, vol. 2, pp. 219-224.
[32] J. C. Schlimmer and D. Fisher, “A Case Study of Incremental Concept Induction,” in Proc. of the 5th National Conference on Artificial Intelligence, 1986, pp. 496-501.
[33] G. P. J. Schmitz, C. Aldrich, and F. S. Gouws, “ANN-DT: an algorithm for extraction of decision trees from artificial neural networks, ” IEEE Trans. on Neural Networks, vol. 10, pp. 1392-1401, Nov. 1999.
[34] S. Z. Selim and M. A. Ismail, “K-means-type algorithms: a generalized convergence theorem and characterization of local optimality,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 6, pp. 81-87, 1984.
[35] K. Sethi, “Neural implementation of tree classifiers,” IEEE Trans. on System, Man, Cybernetics, vol. 25, pp. 1243-1249, Aug. 1995.
[36] I. K. Sethi, “Decision tree performance enhancement using an artificial neural network implementation,” Artificial Neural Networks and Statistical Pattern Recognition: Old and New Connections (I. K. Sethi and Anil K. Jain, eds.), Machine Intelligence and Pattern Recognition, North-Holland, 1991, pp. 71-88.
[37] K. Sethi, “Entropy nets: From decision trees to neural networks,” in Proc. of the IEEE, Oct. 1990, vol. 78, no. 10, pp. 1605-1613.
[38] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz, “A Bayesian approach to filtering junk e-mail,” in AAAI’98 Workshop on Learning for Text Categorization, Madison, WI, July 27, 1998, pp. 55-62.
[39] P. K. Simpson, “Fuzzy min-max neural networks - Part1: Classification,” IEEE Trans. on Neural Networks, vol. 3, pp. 776-786, May 1992.
[40] M. C. Su and T. K. Liu, “Application of neural networks using quadratic junctions in cluster analysis,” Neurocomputing, vol. 37, pp. 165-175, 2001.
[41] G. Towell and J. W. Shavlik, “Extracting refined rules from knowledge based neural networks,” Machine Learning, vol. 13, pp. 71-101, 1993.
[42] F. Uebele, S. Abe, and M.-S. Lan, “A neural network-based fuzzy classifier,” IEEE Trans. on Systems, Man, Cybernetics, vol. 25, pp. 353-361, Feb. 1995.
[43] P. E. Utgoff, “Incremental induction of decision trees,” Machine Learning, vol. 4, pp. 161-186, Nov. 1989.
[44] X. L. Wang and I. Cloete, “Learning to Classify Email: A Survey,” in Proc. of Fourth International Conference on Machine Learning and Cybernetics, Aug. 2005, vol.9, pp. 18-21.
[45] M. Woitaszek and M. Shaaban, “Identifying Junk Electronic Mail in Microsoft Outlook with a Support Vector Machine,” in Proc. of the 2003 Symposium on Applications and the Internet, Orlando, FL, USA, Jan. 27-31, 2003, pp. 166-169.
[46] C. Zhan, X. L. Lu, M. S. Hou, and X. Zhou, “A LVQ-based neural network anti-spam email approach,” SIGOPS Operating Systems Review, vol. 39, pp.34-39, 2005.
[47] L. Zhang, J. Zhu, and T. Yao, “An Evaluation of Statistical Spam Filtering Techniques,” ACM Trans. on Asian Language Information Processing, vol. 3, pp. 243-269, Dec. 2004.
[48] K. Lucke. STOP SPAM. [Online]. Available: http://www.stopspam.org/email/headers.html
[49] Spam Track Corpus. 2006 TREC Public Spam Corpora. [Online]. Available: http://plg.uwaterloo.ca/~gvcormac/treccorpus06/
[50] SpamAssassin. 20030228_Publiccorpus. [Online]. Available: http://spamassassin.apache.org/publiccorpus/
[51] 王雅慈, 曾黎明, 游象甫, and 陳奕明, “Filtering of advertising e-mail and extraction of reply messages,” TANET 2005, National Chung Hsing University, Taichung, Oct. 2005 (in Chinese).
[52] 周守廉, “A discussion of measures for controlling junk e-mail,” TANET 2004, pp. 579-584, National Taitung University, Taitung, Oct. 2004 (in Chinese).
[53] 蘇木春 and 張孝德, Eds., Machine Learning: Neural Networks, Fuzzy Systems, and Genetic Algorithms, Chuan Hwa Book Co., Taipei, 1997 (in Chinese).
Advisor: Mu-Chun Su (蘇木春)   Date of approval: 2007-7-4

For questions about this thesis, please contact the Outreach Services Division of the National Central University Library, TEL: (03)422-7151 ext. 57407.