Thesis/Dissertation 964201054: Detailed Record




Author: Ying-Jie Liao (廖盈傑)   Department: Business Administration
Title: An Efficient Algorithm for E-mail Authorship Verification
(高效率e-mail作者驗證演算法之研究)
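Full-text access
  1. The author has authorized this electronic thesis for immediate open access.
  2. Open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.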

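Abstract (Chinese, translated) E-mail is now a primary medium for exchanging messages. Along with its convenience, however, it has given rise to many security problems, and Internet crimes such as account hijacking and data theft occur constantly. A method that can effectively determine whether the source of an e-mail is trustworthy is therefore urgently needed.
Authorship Identification is a method that proposes the most likely author of a text based on the text's writing-style features. Applied to e-mail, it can determine from the style features of a suspicious message whether the message was written by the original author. Few studies address e-mail authorship identification, however. The methods proposed so far all suffer from low efficiency, and some work only in specific situations and are therefore poorly applicable.
The UserProtector algorithm proposed in this research aims at both high efficiency and high applicability. Other methods must scan the entire e-mail, whereas UserProtector scans only the subject line, which makes it efficient. In addition, it extracts style features with a method derived from Character n-grams, so style features can be extracted effectively in a wide range of situations, which makes it broadly applicable.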
Abstract (English) E-mail is now the main medium people use to exchange messages. Although e-mail is convenient, it also raises many security problems: Internet crimes such as account hijacking and data theft are becoming more frequent. A method that can efficiently verify the source of an e-mail is therefore urgently needed.
Authorship Identification uses the style features of a text to propose its most likely writer. Applied to e-mail, it can judge from the style features of a suspicious message whether the message was written by the original writer. Few studies, however, focus on identifying e-mail writers. The existing methods all share the major drawback of low efficiency, and some of them can be used only in specific circumstances, so they also suffer from low applicability.
To achieve both high efficiency and high applicability, this research proposes an algorithm called UserProtector. Whereas other methods need to scan the entire content of an e-mail, UserProtector scans only the e-mail subject, which gives it the advantage of high efficiency. Furthermore, because it extracts style features with a method derived from Character n-grams, style features can be extracted effectively in all kinds of circumstances, which gives it the advantage of high applicability.
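The abstract names two ingredients: scanning only the subject line, and style features derived from character n-grams. The following Python fragment is a minimal sketch of that general idea, not the UserProtector algorithm itself, whose details are not given in this record. The function names, the n-gram lengths (2 to 4), the `top_k` profile size, the overlap-based similarity score, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n_values=(2, 3, 4)):
    """Count character n-grams of several lengths in one subject line."""
    text = text.lower()
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def build_profile(subjects, n_values=(2, 3, 4), top_k=200):
    """Keep the top_k most frequent n-grams over an author's known subjects.

    Assumption: a plain frequency cut-off stands in for whatever feature
    selection and weighting the thesis actually uses.
    """
    total = Counter()
    for subject in subjects:
        total.update(char_ngrams(subject, n_values))
    return {gram for gram, _ in total.most_common(top_k)}


def similarity(subject, profile, n_values=(2, 3, 4)):
    """Fraction of the subject's distinct n-grams that occur in the profile."""
    grams = set(char_ngrams(subject, n_values))
    if not grams:
        return 0.0
    return len(grams & profile) / len(grams)


def looks_like_author(subject, profile, threshold=0.5):
    """Accept the subject if its similarity reaches a hypothetical threshold."""
    return similarity(subject, profile) >= threshold


# Hypothetical training subjects for one author.
known_subjects = [
    "Weekly report for project Alpha",
    "Weekly report: budget update",
    "Re: Weekly report for project Beta",
]
profile = build_profile(known_subjects)

print(looks_like_author("Weekly report for project Gamma", profile))
print(looks_like_author("URGENT!!! claim your prize now", profile))
```

Because only the short subject line is processed, the cost per message stays small regardless of body length, which is the efficiency argument the abstract makes. Character n-grams also need no tokenizer or language-specific dictionary, which is one plausible reason they suit varied circumstances.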
Keywords ★ e-mail
★ data mining
★ Authorship Identification
★ n-grams
Table of Contents
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
2. Literature Review
2.1 Authorship Identification
2.2 Character n-grams
2.3 E-mail Authorship Identification
2.4 Summary
3. Algorithm
3.1 The UserProtector Algorithm
3.1.1 Training method
3.1.2 Real time fraud detection method
3.2 Extracting multiple n-grams (lines 1-6)
3.3 Filtering common idioms (lines 15-34)
3.4 Obtaining the style features set SF_i (lines 7-14)
3.5 Determining the weight of each n-gram (lines 35-44)
3.6 Obtaining the threshold (lines 45-57)
3.7 Real time fraud detection method (lines 58-69)
4. Empirical Analysis
4.1 Experimental Design
4.1.1 Experiment 1
4.1.2 Experiment 2
4.2 Experimental Results and Analysis
4.2.1 Experiment 1
4.2.1.1 Accuracy of identifying the original author (1-α)
4.2.1.2 Accuracy of identifying suspicious e-mails (1-β)
4.2.1.3 Relationship between α and β
4.2.2 Experiment 2
5. Conclusions and Suggestions for Future Research
5.1 Conclusions
5.2 Suggestions for Future Research
References
Advisor: Ping-Yu Hsu (許秉瑜)   Approval date: 2009-7-9