Thesis 975302016 Detailed Record




Author 鄭協龍 (Sie-Long Jheng)   Department Executive Master Program, Department of Computer Science and Information Engineering
Title 基於社群網路特徵之企業電子郵件分類
(Enterprise Email Classification Based on Social Network Features)
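Related theses
★ Applying Self-Organizing Map and Backpropagation Networks to Mine Potential Customers from a Telecommunications Database
★ Temporal Behavior Analysis of Mobile Network Users
★ A Study on Mining Multi-Level Influence Propagation in Social Networks
★ An Integrated Information Management and Analysis System Based on Peer-to-Peer Technology
★ A Study of Data Locality Strategies for Different Large-Scale Astronomy Applications on Distributed Cloud Platforms
★ A Study on Applying Data Warehousing Techniques to Discover Knowledge in Peer-to-Peer Network Environments
★ Mining Multi-Level FP-trees from Transaction Databases by Self-Derivation
★ A Study on Constructing a Passive Storage-Capacity Migration Policy for Lifecycle Management Systems
★ A Study on Applying Service Mining to Discover Composite Services
★ Improving Intrusion Detection Systems Using Frequent Event Sequences in Weighted Suffix Trees
★ Efficient Processing of Continuous Aggregate Queries on Data Warehouses
★ Intrusion Detection Systems: Using Function-Based System Call Sequences
★ Efficient Mining of Multi-Dimensional and Multi-Level Association Rules on Data Cubes
★ Course Recommendation Based on Social Relations and Weights in E-Learning
★ Identifying Inactive Users in Social Network Services
★ Finding Sequences with Similar Variations in Astronomical Observation Records Using Hierarchical Weighted Suffix Trees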
Files Full text is not available through the system (never to be released)
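Abstract (Chinese) As multimedia and network technologies have become widespread, emails now often carry large multimedia attachments. Delivering large volumes of such emails through an enterprise email system, however, can easily degrade the service quality of the entire network. Moreover, in the absence of any restrictions, many enterprises find that network resources are being used for personal purposes. Business email traffic consequently suffers undesirable delays, which harms the enterprise. Employees competing to use the email service for personal matters has therefore become a problem that many enterprises must address. Clearly, enterprises should manage the email service so that business emails take priority over personal use. Managing outgoing email by classifying enterprise emails as official or private requires an effective method, and developing that method is the goal of this study. To reach the accuracy that such a classification method requires, this study evaluates as wide a range of relevant methods and information as possible. On the other hand, monitoring the detailed content of emails not only reduces mail delivery performance but may also infringe privacy rights protected by law. Balancing accurate classification against privacy protection therefore becomes a challenge. With these challenges in mind, this study builds an email classification method based on social features rather than on email content. To the best of our knowledge, this is the first study to address the above problem. The study extracts social features from emails and converts them into input vectors for a support vector machine (SVM) classifier. Preliminary results show that the method is highly accurate. Compared with other content-based email classifiers, the study shows that exploring social features is a promising direction for solving similar prioritization problems.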
Abstract (English) With the popularity of multimedia and network technologies, emails now often carry large multimedia attachments. However, delivering large volumes of multimedia data over an enterprise email system can easily degrade the quality of the overall network service. Moreover, without some form of restriction, many enterprises have found that network resources are occupied by personal use. Business communication over email thus suffers undesirable delays, which causes damage to the business. Competition for the email service has therefore become an issue that many enterprises have to deal with. Clearly, enterprises should manage the email service so that business emails have priority over personal use. This management requires an effective methodology for classifying enterprise emails into official and private emails, and developing that method is the goal of this work. To achieve the desired accuracy, one would normally expect such a method to examine as much information as possible. On the other hand, monitoring the details of email contents can not only degrade the performance of the method but may also violate privacy rights that many legal systems now protect. Balancing accurate classification against the protection of privacy rights is the central challenge of this problem. With these challenges in mind, we develop an email classification method based on social features rather than on the email contents. To the best of our knowledge, this is the first study to address the aforementioned problems. We extract social features from emails and use them as the input vector of a support vector machine (SVM) classifier. Preliminary results show that our methodology can classify emails with high accuracy. Compared with content-based email features, our work shows that exploring social features is a promising direction for solving similar prioritization problems.
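The pipeline the abstract describes (derive social features from an email, then feed them to an SVM) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the addresses, the two features (the share of recipients in a hypothetical official community and in a hypothetical private community), the toy training set, and the use of a plain linear SVM trained by subgradient descent on the hinge loss in place of whatever SVM implementation and kernel the author actually used.

```python
# Hypothetical address sets standing in for the official and private communities.
OFFICIAL = {"boss@corp.example", "hr@corp.example", "sales@partner.example"}
PRIVATE = {"friend@mail.example", "family@mail.example"}


def social_features(recipients):
    """Return [share of recipients in OFFICIAL, share of recipients in PRIVATE]."""
    n = max(len(recipients), 1)  # avoid division by zero for an empty list
    return [
        sum(r in OFFICIAL for r in recipients) / n,
        sum(r in PRIVATE for r in recipients) / n,
    ]


def train_linear_svm(xs, ys, lam=0.01, eta=0.1, epochs=200):
    """Fit w, b by cyclic subgradient steps on the L2-regularised hinge loss.

    Labels must be +1 (official) or -1 (private). This is a simplified
    stand-in for a full SVM solver, not the thesis's implementation.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only examples inside the margin move w and b.
            if margin < 1:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def classify(recipients, w, b):
    """Label an email 'official' or 'private' from its recipient list alone."""
    x = social_features(recipients)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "official" if score >= 0 else "private"


# Toy labelled emails: (recipient list, +1 for official / -1 for private).
TRAINING = [
    (["boss@corp.example", "hr@corp.example"], 1),
    (["sales@partner.example"], 1),
    (["boss@corp.example", "sales@partner.example", "friend@mail.example"], 1),
    (["friend@mail.example"], -1),
    (["friend@mail.example", "family@mail.example"], -1),
    (["family@mail.example", "hr@corp.example", "friend@mail.example"], -1),
]

xs = [social_features(recipients) for recipients, _ in TRAINING]
ys = [label for _, label in TRAINING]
W, B = train_linear_svm(xs, ys)

print(classify(["hr@corp.example"], W, B))
print(classify(["family@mail.example"], W, B))
```

The classifier never reads a subject line or body; it sees only who the email is addressed to, which is the privacy argument the abstract makes for social features over content features.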
Keywords (Chinese) ★ 企業電子郵件分類 (Enterprise Email Classification)
★ 社群網路 (Social Network)
★ 機器學習 (Machine Learning)
Keywords (English) ★ Social Network
★ Enterprise Email Classification
★ Machine Learning
Table of contents Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Research Method
1-4 Thesis Organization
2. Related Work
2-1 Data Mining Methods
2-2 Classification Techniques
2-2-1 TF-IDF
2-2-2 Cosine Similarity
2-2-3 Decision Tree
2-2-4 Support Vector Machines (SVM)
2-2-5 K-Nearest Neighbor Classifier (K-NN)
2-2-6 Naïve Bayesian Classification
2-3 Applications of Data Mining to Email Classification
2-4 Evaluation Methods
2-4-1 k-fold Cross-Validation
2-4-2 Evaluation Metric
3. System Design
3-1 System Architecture
3-2 Problem Definition
3-3 Features
3-3-1 Social-based Feature
3-3-2 Content-based Feature
4. Experimental Procedure and Methods
4-1 Data Collection and Preprocessing
4-1-1 Building the Keyword Lexicon
4-1-2 Building the Official/Private Communities
4-2 Classification Method
4-3 Feature Groups
4-4 Experimental Performance Evaluation
4-4-1 Recall Rate
4-4-2 Precision Rate
4-4-3 Accuracy Rate
4-4-4 False Positive Rate
4-4-5 F-score
5. Experimental Results
5-1 SVM Results on Filtered Data
5-2 Classification Performance Analysis on Filtered Data
5-2-1 Accuracy Analysis
5-2-2 False Positive (FP) Analysis
5-2-3 Precision Rate Analysis
5-2-4 F-measure Analysis
5-3 SVM Results on Unfiltered Data
5-4 Classification Performance Analysis on Unfiltered Data
5-4-1 Accuracy Analysis
5-4-2 False Positive (FP) Analysis
5-4-3 Precision Rate Analysis
5-4-4 F-measure Analysis
6. Conclusion
7. References
Advisor 蔡孟峰 (Meng-Feng Tsai)   Approval date 2011-7-12

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.