Incremental Enterprise Email Classification Based on Cosine and Fuzzy Similarity Approaches

Name

周家宇 (CHIA-YU CHOU)

Department

Department of Computer Science and Information Engineering

Thesis Title

Incremental Enterprise Email Classification Based on Cosine and Fuzzy Similarity Approaches

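Related Theses

★ Applying Self-Organizing Map and Back-Propagation Networks to Mine Potential Users from Telecommunication Databases
★ Enterprise Email Classification Based on Social Network Features
★ Temporal Behavior Analysis of Mobile Network Users
★ A Study of Mining Multi-Level Influence Propagation in Social Networks
★ An Integrated Information Management and Analysis System Based on Peer-to-Peer Technology
★ A Study of Data Locality Strategies for Different Big-Data Astronomy Applications on Distributed Cloud Platforms
★ A Study of Applying Data Warehousing Techniques to Explore Knowledge in Peer-to-Peer Network Environments
★ Self-Derived Mining from Transaction Databases with a Multi-Level FP-tree
★ A Study of Constructing Passive Storage-Capacity Migration Policies in Lifecycle Management Systems
★ A Study of Applying Service Mining to Discover Composite Services
★ Improving Intrusion Detection Systems Using Frequent Event Sequences in Weighted Suffix Trees
★ Efficiently Processing Continuous Aggregate Queries on Data Warehouses
★ Intrusion Detection Systems: Using Function-Based System Call Sequences
★ Efficient Multi-Dimensional and Multi-Level Association Rule Mining on Data Cubes
★ Course Recommendation Based on Social Relations and Weights in E-Learning
★ Finding Inactive Users in Social Network Services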

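This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-commercial searching, reading, and printing for academic research purposes.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.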

Abstract (Chinese)

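Owing to the growth and convenience of today's Internet, email usage has risen sharply. Many enterprises regard email as an important channel for communicating with customers and internal employees, so controlling the enterprise email system has become correspondingly important for companies. However, it is unavoidable that many employees use the corporate mail system to send personal messages. As a result, personal emails not only consume the mail system's bandwidth and degrade its performance, but may even cause important business emails to be delayed or fail to be sent, causing commercial losses for the company. Moreover, as privacy awareness grows, the goal of this study is to classify emails as personal or business without monitoring their contents, thereby improving the company's business efficiency.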
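To achieve this goal, this study uses only the email header rather than the message content; although this may lower classification accuracy, it protects employees' privacy. Using the extracted header data, enterprise emails are classified with cosine and fuzzy similarity approaches. More importantly, the incremental system proposed in this study avoids processing the huge volume of accumulated email, and it also accounts for changes over time in the company's internal staff and customer base.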
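The abstract states that only header fields feed the cosine and fuzzy similarity classifiers, and the outline (Sections 3-3-1 and 3-3-2) computes similarity on recipients and subjects. A minimal Python sketch of that idea follows, under several assumptions not taken from the thesis: each email becomes a bag of tokens (recipient addresses or subject words), each class (business, personal) is a summed count profile built from labeled emails, membership degrees are counts scaled by the largest count, and the fuzzy measure is the min/max (fuzzy Jaccard) ratio. All function names are hypothetical, and a tie, including zero overlap with both classes, is reported as unclassifiable.

```python
import math
import re
from collections import Counter
from email.parser import HeaderParser
from email.utils import getaddresses


def header_tokens(raw_message):
    """Extract recipient addresses and subject words from the header only."""
    msg = HeaderParser().parsestr(raw_message)  # the body is never parsed
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr.lower() for _, addr in getaddresses(fields) if addr]
    words = re.findall(r"\w+", msg.get("Subject", "").lower())
    return recipients, words


def build_profile(token_lists):
    """Hypothetical class profile: summed token counts of labeled emails."""
    profile = Counter()
    for tokens in token_lists:
        profile.update(tokens)
    return profile


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def memberships(counts):
    """Assumed membership function: each count divided by the largest count."""
    if not counts:
        return {}
    top = max(counts.values())
    return {t: c / top for t, c in counts.items()}


def fuzzy_similarity(a, b):
    """Fuzzy Jaccard: sum of min memberships over sum of max memberships."""
    ma, mb = memberships(a), memberships(b)
    terms = set(ma) | set(mb)
    num = sum(min(ma.get(t, 0.0), mb.get(t, 0.0)) for t in terms)
    den = sum(max(ma.get(t, 0.0), mb.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def classify(tokens, business, personal, similarity):
    """Label an email by whichever class profile it is more similar to."""
    email = Counter(tokens)
    s_bus = similarity(email, business)
    s_per = similarity(email, personal)
    if s_bus == s_per:
        return "unclassifiable"  # tie, including no overlap with either class
    return "business" if s_bus > s_per else "personal"
```

For example, with a business profile built from `["quote", "order", "client"]` and `["order", "invoice"]` and a personal profile built from `["dinner", "weekend"]` and `["birthday", "dinner"]`, `classify(["order", "client"], business, personal, cosine_similarity)` returns `"business"`. Using `HeaderParser` leaves the message body unparsed, which mirrors the privacy rationale above.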

Abstract (English)

Nowadays, email usage has increased as the Internet has become more widespread. Many enterprises regard email as an essential business channel for contacting customers and employees, so managing the email system has become even more important for an enterprise. However, it is unavoidable that many employees send personal emails through the enterprise email system. This harms the email system, because bandwidth is consumed for personal purposes. Worse, it may delay or prevent the sending of important business emails, which may hurt the enterprise's business interests. Moreover, the public has become increasingly concerned about privacy. The goal of this thesis is therefore to classify enterprise emails as either business or personal, without monitoring their contents, in order to improve the enterprise's business interests.
To achieve this goal, only email headers are used; message contents are not. Although this may lower classification accuracy, it protects employees' privacy. Enterprise emails are classified from the extracted header data using cosine similarity and fuzzy similarity approaches. More importantly, the proposed incremental system avoids processing the huge volume of accumulated emails, and it also accounts for changes in an enterprise's internal staff and customers over time.
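The incremental idea, bounding stored history while letting class profiles follow staff and customer turnover, can be illustrated with a sliding window over time periods. This is an assumed mechanism for illustration only: the abstract does not specify the thesis's incremental strategy (Section 3-4 in the outline), and the class name `IncrementalProfile` and its `window` parameter are hypothetical.

```python
from collections import Counter, deque


class IncrementalProfile:
    """Hypothetical class profile built only from the most recent periods.

    Each period (e.g. a month) keeps its own token counts. When a new period
    starts and the window is full, the oldest period is discarded, so tokens
    tied to departed staff or former customers fade out and storage stays bounded.
    """

    def __init__(self, window):
        self.periods = deque([Counter()], maxlen=window)
        self.total = Counter()  # running sum over all periods in the window

    def add(self, tokens):
        """Fold one newly labeled email into the current period."""
        counts = Counter(tokens)
        self.periods[-1].update(counts)
        self.total.update(counts)

    def new_period(self):
        """Start a new period, evicting the oldest one if the window is full."""
        if len(self.periods) == self.periods.maxlen:
            self.total.subtract(self.periods[0])
            self.total = +self.total  # drop tokens whose count fell to zero
        self.periods.append(Counter())  # deque evicts the oldest period itself

    def profile(self):
        """Aggregated token counts over the current window (a copy)."""
        return Counter(self.total)
```

A smaller `window` forgets departed recipients faster but leaves less history for rare tokens. Because the evicted period is subtracted from a running total, `profile()` never rescans previously stored emails.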

Keywords (Chinese)

★ fuzzy similarity
★ cosine similarity
★ email classification

Keywords (English)

★ fuzzy similarity
★ E-mail classification
★ cosine similarity

Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Thesis Organization
2. Related Work
2-1 Data Mining
2-2 Research on Classification Techniques
2-2-1. Bayesian Classification
2-2-2. Decision Tree
2-2-3. Cosine Similarity
2-2-4. Fuzzy Theory
2-2-5. Support Vector Machines (SVM)
2-3 Related Work on Email Classification
2-4 Evaluation Methods
3. System Design and Methods
3-1 System Architecture
3-2 Problem Definition
3-3 System Classification Method
3-3-1. Email Recipient Similarity
3-3-2. Email Subject Similarity
3-3-3. Email Classification
3-4 Incremental Strategy of the System
3-4-1. Recipient Component
3-4-2. Subject Component
3-4-3. Unclassifiable Emails and Emails with Inconsistent Recipient and Subject Judgments
4. Experimental Procedure and Methods
4-1 Data Collection and Preprocessing
4-1-1. Data Sources
4-1-2. Extraction of Email Header Data
4-1-3. Construction of Business and Personal Email Groups
4-2 Experimental Settings and Description
4-2-1. System Classification
4-2-2. Incremental Strategy of the System
4-3 Performance Evaluation Methods
4-3-1. Recall Rate
4-3-2. Precision Rate
4-3-3. Accuracy
4-3-4. False Positive Rate
4-3-5. F-score
5. Experimental Results and Analysis
5-1 Analysis of Unclassifiable Emails
5-2 Accuracy Analysis
5-3 Recall Rate Analysis
5-4 Precision Rate Analysis
5-5 False Positive Rate Analysis
5-6 F-score Analysis
5-7 Execution Time Analysis
6. Conclusion
7. References
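Section 4-3 of the outline lists recall, precision, accuracy, false positive rate, and F-score. Their standard definitions from confusion-matrix counts are sketched below. Which class counts as positive, how unclassifiable emails are counted, and the use of F1 (the balanced F-score) are assumptions, since this record does not state them; the function name `evaluate` is hypothetical.

```python
def evaluate(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumption: one class (e.g. personal email) is treated as positive, and
    F-score is F1, the harmonic mean of precision and recall.
    """

    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty denominators

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "f_score": ratio(2 * precision * recall, precision + recall),
    }
```

With hypothetical counts, `evaluate(40, 10, 45, 5)` gives recall 40/45 ≈ 0.889, precision 0.8, accuracy 0.85, false positive rate 10/55 ≈ 0.182, and F1 80/95 ≈ 0.842.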

Advisor

蔡孟峰 (Meng-Feng Tsai)

Approval Date

2012-8-7

推文