基於抽象語法樹的陣列形狀錯誤偵測

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：6

、訪客IP：13.58.121.131

姓名

高健文(Chien-Wen Kao) 查詢紙本館藏

畢業系所

資訊工程學系

論文名稱

基於抽象語法樹的陣列形狀錯誤偵測
(Array Shape Error Detection Based on Abstract Syntax Tree)

相關論文

★ 條件判斷式事件驅動程式設計之C語言擴充	★ 基于小波变换的指纹活度检测,具有聚集 LPQ 和 LBP 特征
★ 應用自動化測試於異質環境機器學習管道之 MLOps 系統	★ 設計具有可視化思維工具和程式作為單一步的輔助學習程式之棋盤式遊戲
★ TOCTOU 漏洞的靜態分析與實作	★ 用於繪製風力發電控制邏輯之特定領域語言
★ 在Java程式語言中以雙向結構表達數學公式間關聯之設計與實作	★ 支援模組化規則製作之程式碼轉換工具
★ 基於替代語意的 pandas DataFrame 靜態型別檢查器	★ 自動化時間複雜度分析的設計與實作–從軟體層面評估嵌入式系統的功率消耗
★ 以震波層析成像為應用之特定領域語言實作與分析	★ 用特徵選擇減少疲勞偵測腦電圖通道數
★ 一個應用紙本運算與數位化於程式設計學習使程序性思維可視化的機制	★ 從合作學習角色分工獲得函式程式設計思維學習遞迴程式的機制
★ 基於抽象語法樹的深度複製及彈性別名之所有權系統解決 Java 表示暴露問題	★ 基於 Python 型別提示檢查不可變性

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

至系統瀏覽論文 (2027-8-1以後開放)

摘要(中)

隨著深度學習相關領域的熱門，程式碼中的陣列形狀錯誤也越來越
常見，陣列形狀錯誤是一種在進行陣列相關操作時由於形狀不匹配或是
存取了形狀外的位址所造成的錯誤，在Python 中，由於其程式語言的
特性，使的陣列形狀錯誤只能在程式碼執行時才檢查出來。
在本研究中，我們提出一種基於抽象樹分析的靜態分析器，並利用
抽象直譯器的概念僅針對程式碼中的陣列形狀資訊作處理以檢查形狀錯
誤。透過靜態檢查器不只可以更快的發現錯誤並修正而且相比於直接執
行程式碼也能更加節省資源。
在現有研究Pytropos 中對於無法單純由程式碼中得知形狀訊息的外
部資料集使用型別提示(Type Hint) 的方式讓檢查工具獲得足夠的資訊
來協助檢查，而我們提出另一種做法針對CSV 檔可以快速有效的取得外
部資料集形狀資訊，並在錯誤報告中提供詳細的陣列形狀資訊讓使用者
可以更方便的追蹤錯誤以利除錯，我們將此工具命名為ShapeChecker。

摘要(英)

With the popularity of deep learning related fields, array shape errors
in code are becoming more and more common. Array shape error is an error
caused by mismatching shapes or accessing addresses outside the shape
during array-related operations. In Python, due to the characteristics of
its programming language, the array shape error can only be detected when
the code is executed.
In this study, we propose a static analyzer based on abstract tree
analysis, and use the concept of abstract interpreter to process only the
array shape information in the code to check for shape errors. Not only
can bugs be found and fixed faster through static checkers, but they can
also save more resources than running the code directly.
In the existing research Pytropos, the type-hint method is used for
external data sets whose shape information cannot be obtained simply
from the code, so that the inspection tool can obtain enough information
to assist the inspection. And we propose another method to quickly
and effectively obtain the shape information of external data sets for CSV
files. And provide detailed array shape information in the error report to
make it easier for users to track errors for debugging. We named this tool
ShapeChecker.

關鍵字(中)

★ 抽象語法樹
★ 靜態分析
★ 形狀錯誤

關鍵字(英)

論文目次

摘要v
Abstract vi
誌謝 viii
目錄 ix
一、緒論 1
1.1 陣列形狀錯誤 1
1.2 深度學習與NumPy 1
1.3 動態型別與靜態型 2
1.4 靜態檢查與動態檢查 3
1.4.1 動態檢查 3
1.4.2 靜態檢查 4
1.5 抽象語法樹與抽象解釋 4
1.6 論文架構 7
二、動機
2.1 NumPy 中關於陣列的操作範例 8
2.2 現存研究的解決方案 11
三、提案 14
3.1 概觀 14
3.2 外部資料取得 15
3.2.1 CSV 檔 15
3.2.2 形狀計算處理程序 16
3.3 檢查器 17
3.3.1 陣列形狀計算 20
3.3.2 形狀表追蹤形狀變化 21
3.3.3 形狀錯誤檢查範例 22
3.4 作法限制 23
3.4.1 陣列內資料處理 23
3.4.2 函式處理 23
3.4.3 其他型別處理 24
四、實作25
4.1 實作架構 25
4.2 形狀表 25
4.3 AST.Call 節點處理 26
4.3.1 外部資料取得 26
4.3.2 陣列形狀計算 26
4.4 Slice() 函式形狀計算 31
4.5 實作延伸性討論 34
五、評估35
5.1 ShapeChecker 檢查使用範例 35
5.1.1 第一種形狀錯誤檢查 35
5.1.2 第二種形狀錯誤檢查 36
5.1.3 連續陣列操作形狀錯誤檢查 36
5.1.4 ShapeChecker 使用與否差異比較 37
5.2 函式功能面評估 38
5.3 與Pytropos 比較 39
5.4 讀取外部資料效能分析 42
5.4.1 與其他函式庫函式比較 42
5.4.2 處理程序策略比較 45
六、相關研究48
6.1 Python 上的靜態檢查工具 48
6.2 Python 上其他函式庫的形狀錯誤檢查 49
6.3 越界訪問 50
6.3.1 Valgrind 50
6.3.2 Purify 51
6.3.3 Index checker 52
6.4 其他語言的形狀錯誤檢查 53
七、總結 54
參考文獻 56
附錄A 測試程式碼 60
A.1 函式形狀計算測試程式碼 60
A.1.1 函式形狀錯誤測試程式碼 63

參考文獻

[1] C. R. Harris, K. J. Millman, S. J. van der Walt, et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020. doi: 10.1038/s41586-020-2649-2. [Online]. Available: https://doi.org/10.1038/s41586-020-2649-2.
[2] M. Abadi, A. Agarwal, P. Barham, et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
[3] A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems, vol. 32,
2019.
[4] J. Bergstra, F. Bastien, O. Breuleux, et al., “Theano: Deep learning on gpus with python,” in NIPS 2011, BigLearning Workshop, Granada, Spain, Citeseer, vol. 3,
2011, pp. 1–48.
[5] Y. Jia, E. Shelhamer, J. Donahue, et al., “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on
Multimedia, 2014, pp. 675–678.
[6] T. Chen, “Typesafe abstractions for tensor operations (short paper),” in Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, 2017, pp. 45–50.
[7] E. Meijer and P. Drayton, “Static typing where possible, dynamic typing when needed: The end of the cold war between programming languages,” Citeseer, 2004.
[8] L. Tratt and R. Wuyts, “Guest editors’ introduction: Dynamically typed languages,” IEEE software, vol. 24, no. 5, pp. 28–30, 2007.
[9] F. Ortin, J. B. G. Perez-Schofield, and J. M. Redondo, “Towards a static type checker for python,” in European Conference on Object-Oriented Programming (ECOOP), Scripts to Programs Workshop, STOP, vol. 15, 2015, pp. 1–2.
[10] J. Siek and W. Taha, “Gradual typing for objects,” in European Conference on Object-Oriented Programming, Springer, 2007, pp. 2–27.
[11] G. Bierman, M. Abadi, and M. Torgersen, “Understanding typescript,” in European Conference on Object-Oriented Programming, Springer, 2014, pp. 257–281.
[12] M. Kazerounian, N. Vazou, A. Bourgerie, J. S. Foster, and E. Torlak, “Refinement types for ruby,” in International Conference on Verification, Model Checking, and Abstract Interpretation, Springer, 2018, pp. 269–290.
[13] Mypy. [Online]. Available: https://mypy.readthedocs.io/en/stable/ (visited on 05/03/2022).
[14] Cpython-grammer. [Online]. Available: https://github.com/python/cpython/ blob/3.9/Grammar/python.gram/ (visited on 07/03/2022).
[15] P. Cousot and N. Halbwachs, “Automatic discovery of linear restraints among variables of a program,” in Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1978, pp. 84–96.
[16] P. Cousot, “Types as abstract interpretations,” in Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, 1997,pp. 316–331.
[17] Numpy. [Online]. Available: https://numpy.org/ (visited on 05/11/2022).
[18] E. A. Cruz Camacho, “Static analysis of python programs using abstract interpretation: An application to tensor shape analysis,” Ingenierı́a de Sistemas, 2019.
[19] Pep-484. [Online]. Available:
https://peps.python.org/pep-0484/ (visited on 05/03/2022).
[20] A. Fromherz, A. Ouadjaout, and A. Miné, “Static value analysis of python programs by abstract interpretation,” in NASA Formal Methods Symposium, Springer, 2018,pp. 185–202.
[21] Y. Zhuang and M.-Y. Lu, “Enabling type checking on columns in data frame libraries by abstract interpretation,” IEEE Access, vol. 10, pp. 14 418–14 428, 2022.
[22] J. Mitlöhner, S. Neumaier, J. Umbrich, and A. Polleres, “Characteristics of open data csv files,” in 2016 2nd International Conference on Open and Big Data (OBD),IEEE, 2016, pp. 72–79.
[23] 蔡欣倢and 莊永裕, “基於抽象語法樹剖析與網路爬蟲的字串參數錯誤偵測靜態分析檢查器,” The 17th Taiwan Conference on Software Engineering (第17 屆台灣軟體工程研討會, TCSE 2021), Jul. 2021.
[24] T. pandas development team, Pandas-dev/pandas: Pandas, version latest, Feb.2022. doi: 10.5281/zenodo.6408044. [Online]. Available: https://doi.org/10.5281/zenodo.6408044.
[25] W. McKinney, “Data Structures for Statistical Computing in Python,” in Proceedings of the 9th Python in Science Conference, S. van der Walt and J. Millman, Eds.,2010, pp. 56–61. doi: 10.25080/Majora-92bf1922-00a.
[26] H. Gulabovska and Z. Porkoláb, “Survey on static analysis tools of python programs.,” in SQAMIA, 2019.
[27] Pep-8. [Online]. Available: https://peps.python.org/pep- 0008/ (visited on 06/28/2022).
[28] Bandit. [Online]. Available: https : / / github . com / PyCQA / bandit (visited on 06/28/2022).
[29] Black. [Online]. Available: https://github.com/psf/black (visited on 06/28/2022).
[30] Pylint. [Online]. Available: https://pylint.pycqa.org/en/latest/ (visited on 05/15/2022).
[31] Pyflakes. [Online]. Available: https : / / launchpad . net / pyflakes (visited on 06/28/2022).
[32] Y. Zhang, Y. Chen, S.-C. Cheung, Y. Xiong, and L. Zhang, “An empirical study on tensorflow program bugs,” in Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2018, pp. 129–140.
[33] J. Dolby, A. Shinnar, A. Allain, and J. Reinen, “Ariadne: Analysis for machine learning programs,” in Proceedings of the 2Nd ACM SIGPLAN International Workshop
on Machine Learning and Programming Languages, 2018, pp. 1–10.
[34] M. Sridharan, S. Chandra, J. Dolby, S. J. Fink, and E. Yahav, “Alias analysis for object-oriented programs,” in Aliasing in Object-Oriented Programming. Types, Analysis and Verification, Springer, 2013, pp. 196–232.
[35] S. Lagouvardos, J. Dolby, N. Grech, A. Antoniadis, and Y. Smaragdakis, “Static analysis of shape in tensorflow programs,” in 34th European Conference on Object-Oriented Programming (ECOOP 2020), Schloss Dagstuhl-Leibniz-Zentrum für Informatik,2020.
[36] M. Bravenboer and Y. Smaragdakis, “Strictly declarative specification of sophisticated points-to analyses,” in Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, 2009, pp. 243–262.
[37] H. Jordan, B. Scholz, and P. Subotić, “Soufflé: On synthesis of program analyzers,”in International Conference on Computer Aided Verification, Springer, 2016,pp. 422–430.
[38] S. Verma and Z. Su, “Shapeflow: Dynamic shape interpreter for tensorflow,” arXivpreprint arXiv:2011.13452, 2020.
[39] D. Wu, B. Shen, Y. Chen, H. Jiang, and L. Qiao,“Tensfa: Detecting and repairing tensor shape faults in deep learning systems,” in 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), IEEE, 2021, pp. 11–21.
[40] P. Cousot and R. Cousot, “Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints,” in Proceedings
of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages,1977, pp. 238–252.
[41] Valgrind. [Online]. Available: https://valgrind.org/ (visited on 06/23/2022).
[42] N. Nethercote and J. Seward, “Valgrind: A framework for heavyweight dynamic binary instrumentation,” ACM Sigplan notices, vol. 42, no. 6, pp. 89–100, 2007.
[43] N. Nethercote and J. Fitzhardinge, “Bounds-checking entire programs without recompiling,” SPACE, 2004.
[44] R. Hastings and B. Joyce, “Purify: A tool for detecting memory leaks and access errors in c and c++ programs,” in Proceedings of the Winter 1992 USENIX Conference,
pp. 125–138.
[45] J. Santino, “Enforcing correct array indexes with a type system,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software
Engineering, 2016, pp. 1142–1144.
[46] M. Kellogg, V. Dort, S. Millstein, and M. D. Ernst, “Lightweight verification of array indexing,” in Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2018, pp. 3–14.
[47] M. M. Papi, M. Ali, T. L. Correa Jr, J. H. Perkins, and M. D. Ernst, “Practical pluggable types for java,” in Proceedings of the 2008 international symposium on Software testing and analysis, 2008, pp. 201–212.
[48] Tensorflow-haskell-deptyped. [Online]. Available: https : / / github . com / helq /tensorflow-haskell-deptyped (visited on 06/28/2022).

指導教授

莊永裕(Yung-Yu Zhuang)

審核日期

2022-8-9

推文