The Implementation and Method of a Code Plagiarism Checker based on Abstract Syntax Tree and Encoding


Author

吳尉誠 (Wei-Cheng Wu)

Department

Department of Computer Science and Information Engineering

Thesis Title

The Implementation and Method of a Code Plagiarism Checker based on Abstract Syntax Tree and Encoding
(Original title: 基於抽象語法樹和編碼的程式碼抄襲檢測器之實作與方法)

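Related Theses

★ A C Language Extension for Conditional Event-Driven Programming
★ Fingerprint Liveness Detection Based on Wavelet Transform with Aggregated LPQ and LBP Features
★ An MLOps System Applying Automated Testing to Machine Learning Pipelines in Heterogeneous Environments
★ Design of a Board Game for Learning Programming with Visual Thinking Tools and Programs as Single Steps
★ Static Analysis and Implementation for TOCTOU Vulnerabilities
★ A Domain-Specific Language for Drawing Wind Power Control Logic
★ Design and Implementation of Expressing Relations Between Mathematical Formulas with Bidirectional Structures in Java
★ A Code Transformation Tool Supporting Modular Rule Construction
★ A Static Type Checker for pandas DataFrame Based on Substitute Semantics
★ Design and Implementation of Automated Time Complexity Analysis: Evaluating the Power Consumption of Embedded Systems at the Software Level
★ Implementation and Analysis of a Domain-Specific Language for Seismic Tomography
★ Reducing the Number of EEG Channels for Fatigue Detection with Feature Selection
★ A Mechanism Applying Paper-Based Computation and Digitization to Programming Learning to Visualize Procedural Thinking
★ Array Shape Error Detection Based on Abstract Syntax Trees
★ A Mechanism for Learning Recursive Programs by Acquiring Functional Programming Thinking Through Role Division in Cooperative Learning
★ An AST-Based Ownership System with Deep Copy and Flexible Aliasing for Solving the Representation Exposure Problem in Java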

Full text: viewable in the thesis system from 2031-8-1 onward

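Abstract (translated from the Chinese)

Code plagiarism detection is important for programming courses. Current comparison techniques are mainly attribute-based, structure-based, or hybrid. This study uses a comparison method based on abstract syntax trees and encoding. We assign custom encoding symbols to the nodes of the abstract syntax tree, assign a distinct bracket encoding symbol to the brackets of each kind of code block (for example, functions and loops), and apply conditional handling to each kind of plagiarism behavior. After this conditional handling, the encoding produced from the original code can be identical to the encoding produced from the plagiarized code, which makes it possible to detect similar types, similar behaviors, and swapped code positions effectively. Finally, the algorithm used in this study computes a similarity value, from which users can judge how likely it is that plagiarism occurred between two programs. As long as the correspondence between a given plagiarism behavior and the code is known, the method in this study can detect that behavior. We call this tool PASTE (Plagiarism checker by Abstract Syntax Tree and Encoding).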
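The encoding step described above can be sketched as follows. This is a minimal illustration, not PASTE's implementation: the record does not state which language PASTE analyzes or which symbols it uses, so the sketch assumes Python source parsed with the standard `ast` module, and the tables `NODE_SYMBOLS` and `BRACKETS` and the functions `encode` and `encode_source` are hypothetical. Identifiers produce no symbol, and comments and blank lines never reach the AST, so renaming variables or inserting comments leaves the encoding unchanged. Mapping both `for` and `while` to one symbol stands in, as an assumption, for one conditional-encoding rule.

```python
import ast

# Hypothetical symbol table: AST node type -> encoding symbol.
# Node types not listed (names, constants, operators) emit nothing,
# so renaming identifiers does not change the encoding.
NODE_SYMBOLS = {
    ast.FunctionDef: "F",
    ast.For: "L",        # assumed conditional rule: for and while share
    ast.While: "L",      # one symbol so a loop-type rewrite looks alike
    ast.If: "I",
    ast.Assign: "A",
    ast.AugAssign: "A",
    ast.Return: "R",
    ast.Call: "C",
    ast.BinOp: "B",
    ast.Compare: "P",
}

# Hypothetical bracket pair for each kind of code block.
BRACKETS = {"F": ("(", ")"), "L": ("[", "]"), "I": ("{", "}")}


def encode(node: ast.AST) -> str:
    """Pre-order walk emitting one symbol per known node and wrapping
    the children of block nodes in that block's bracket pair."""
    sym = NODE_SYMBOLS.get(type(node), "")
    inner = "".join(encode(child) for child in ast.iter_child_nodes(node))
    if sym in BRACKETS:
        open_b, close_b = BRACKETS[sym]
        return sym + open_b + inner + close_b
    return sym + inner


def encode_source(src: str) -> str:
    """Parse source text (comments and blank lines are dropped) and encode it."""
    return encode(ast.parse(src))
```

For example, a function that sums a list with `for x in xs` and a copy with renamed variables, an extra comment, and a blank line both encode to `F(AL[A]R)`.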

Abstract (English)

Code plagiarism detection is very important for programming assignments. Current matching techniques are mainly attribute-based, structure-based, or hybrid. In this thesis, we encode the nodes of the abstract syntax tree (AST): we define customized encoding symbols for the AST nodes and a different bracket encoding symbol for each kind of code block, such as functions and loops. In addition, we apply conditional encoding to handle various code plagiarism behaviors. After conditional encoding, the encoding produced from the source code can be exactly the same as the encoding produced from the plagiarized code. With this method, we can effectively detect similar types, similar behaviors, and swapped code positions. Finally, the algorithm proposed in this thesis computes a similarity value, from which users can judge the likelihood of plagiarism. As long as the relationship between a given plagiarism behavior and the source code is known, our method can detect that behavior. We name this tool PASTE (Plagiarism checker by Abstract Syntax Tree and Encoding).
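The similarity computation over two encoded strings can likewise be sketched. The record names only a longest-common-substring step and an unnamed algorithm, so this sketch assumes a greedy scheme: repeatedly take the longest common substring made of not-yet-matched symbols, mark it in both strings, stop once the longest remaining match is shorter than a hypothetical threshold `min_len`, and report `2 * covered / (len(a) + len(b))`. The functions `longest_common_substring` and `similarity` and the threshold are illustrative assumptions, not PASTE's published algorithm.

```python
def longest_common_substring(a: str, b: str,
                             used_a: list[bool], used_b: list[bool]) -> tuple[int, int, int]:
    """Return (length, start_a, start_b) of the longest common substring
    that uses only unmarked positions, by dynamic programming."""
    best = (0, 0, 0)
    # prev[j]: length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and not used_a[i - 1] and not used_b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def similarity(a: str, b: str, min_len: int = 2) -> float:
    """Greedily tile a and b with non-overlapping common substrings and
    return the fraction of symbols covered, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    min_len = max(1, min_len)  # guarantees progress on every iteration
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    covered = 0
    while True:
        length, i, j = longest_common_substring(a, b, used_a, used_b)
        if length < min_len:
            break
        for k in range(length):
            used_a[i + k] = True
            used_b[j + k] = True
        covered += length
    return 2 * covered / (len(a) + len(b))
```

Because each match is found regardless of where it sits, swapping two code segments does not lower the score: `similarity("ABCDEF", "DEFABC")` returns `1.0`, while `similarity("ABCD", "ABXY")` returns `0.5`.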

Keywords

★ plagiarism detection
★ abstract syntax tree
★ code comparison

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Code plagiarism behaviors
1.2 Early code plagiarism detection methods
1.3 Existing code plagiarism detection methods
1.4 String similarity algorithms
2. Motivation
2.1 Example
2.2 Analysis of existing tools
2.2.1 MOSS
2.2.2 JPlag
2.2.3 SPDS
2.3 Summary of problems
3. Method
3.1 Data preprocessing
3.1.1 Abstract syntax tree
3.1.2 Encoding
3.2 Similarity computation
3.2.1 Longest common substring
3.2.2 Algorithm
4. Implementation
4.1 Conditional encoding of specific nodes
4.1.1 Similar-type detection
4.1.2 Similar-behavior detection
4.1.3 if-else order-swap detection
4.2 Implementation limitations
5. Evaluation
5.1 Test results for multiple plagiarism disguises
5.2 Analysis and explanation
5.2.1 Changing variable types
5.2.2 Renaming variables
5.2.3 Swapping code positions
5.2.4 Changing loop types
5.2.5 Changing function types
5.2.6 Renaming functions
5.2.7 Deleting original comments
5.2.8 Inserting comments
5.2.9 Inserting blank lines
5.2.10 Inserting variables
5.2.11 Inserting unrelated code
5.3 False positive experiments and results
5.4 Evaluation summary
6. Related Work
6.1 CCFinder: a multilinguistic token-based code clone detection system for large scale source code
6.2 Plagiarism in Programming Assignments
7. Future Work
8. Conclusion
References

Advisor

莊永裕 (Yung-Yu Zhuang)

Approval Date

2021-7-21
