解決指令下沉阻擋商餘合併之最佳化演算法改良

以作者查詢圖書館館藏

、以作者查詢臺灣博碩士

、以作者查詢全國書目

、勘誤回報

、線上人數：31

、訪客IP：3.133.148.119

姓名

林庭葦(Ting-Wei Lin) 查詢紙本館藏

畢業系所

資訊工程學系

論文名稱

解決指令下沉阻擋商餘合併之最佳化演算法改良
(An Algorithm to Prevent Division-Modulo Combine from Being Blocked by Instruction Sink)

相關論文

★ 條件判斷式事件驅動程式設計之C語言擴充	★ 基于小波变换的指纹活度检测,具有聚集 LPQ 和 LBP 特征
★ 應用自動化測試於異質環境機器學習管道之 MLOps 系統	★ 設計具有可視化思維工具和程式作為單一步的輔助學習程式之棋盤式遊戲
★ TOCTOU 漏洞的靜態分析與實作	★ 用於繪製風力發電控制邏輯之特定領域語言
★ 在Java程式語言中以雙向結構表達數學公式間關聯之設計與實作	★ 支援模組化規則製作之程式碼轉換工具
★ 基於替代語意的 pandas DataFrame 靜態型別檢查器	★ 自動化時間複雜度分析的設計與實作–從軟體層面評估嵌入式系統的功率消耗
★ 以震波層析成像為應用之特定領域語言實作與分析	★ 用特徵選擇減少疲勞偵測腦電圖通道數
★ 一個應用紙本運算與數位化於程式設計學習使程序性思維可視化的機制	★ 基於抽象語法樹的陣列形狀錯誤偵測
★ 從合作學習角色分工獲得函式程式設計思維學習遞迴程式的機制	★ 基於抽象語法樹的深度複製及彈性別名之所有權系統解決 Java 表示暴露問題

檔案

[Endnote RIS 格式]

[Bibtex 格式]

[相關文章]

[文章引用]

[完整記錄]

[館藏目錄]

至系統瀏覽論文 (2032-8-1以後開放)

摘要(中)

編譯器除了將程式碼由高階語言轉換成組合語言或者是機器語言外，還會根據不同的目的對程式碼進行最佳化以提升程式碼的品質。然而，由於最佳化的演算法很多，使得最佳化的順序會影響程式碼品質，有時甚至會使程式碼品質更差，如指令下沉使得取餘數、取商數分開到不同基本程式塊，導致商餘合併無法套用，這樣的阻擋問題會使程式碼的效能變差。由於指令下沉為機器不相依的最佳化，商於合併則為機器相依的最佳化，應用於後端，因此兩者難以對調其順序以避開該衝突。我們提出一個演算法，以解決該阻擋問題。我們將該演算法介入並實作於 LLVM 最佳化器上的指令下沉內，以求解決所有會受影響的後端。最後根據可行性、相容性、針對性三個面向進行完整的評估，評估結果表示確實能夠有效的解決該阻擋問題且不會與其他最佳化演算法產生衝突而最佳化失敗，也能夠只針對有問題的指令進行處理而不會影響到正常的其他指令。

摘要(英)

Compiler compiles high-level language like C language to assembly language even machine code to let programmers don′t need to concern about what machine actually do and how to write machine code while developing. In addition, it can optimize the code base on different purpose. However, according to compiler phase-ordering problem, the orders of optimizations will impact the quality of programs. In some cases, the sequence of optimizations will make the program worse. For example, Instruction Sink separates division and modulo to different basic blocks and this act will make Division-Modulo Combine failed to be applied. Because the former is a machine-independent optimization and the latter is a machine-dependent optimization, it′s hard to reorder them to avoid the problem. We propose an algorithm to solve this blocking issue. We implement our algorithm in Instruction Sink in LLVM Optimizer to keep this problem from all impacted backend. In the end, we evaluate the feasibility, the compatibility and the pertinence. The result shows that our proposal can solve this problem and won′t conflict with other optimizations to let program worse. It also shows that our algorithm only targets the instruction with blocking issue and won′t change the other normal instructions.

關鍵字(中)

★ 編譯器
★ 程式碼最佳化
★ 相序問題
★ 程式碼移動
★ LLVM
★ 機器相依最佳化
★ 機器不相依最佳化

關鍵字(英)

★ compiler
★ code optimization
★ phase-ordering
★ code motion
★ LLVM
★ machine-dependent optimization
★ machine-independent optimization

論文目次

摘要 ............................................................i
Abstract ............................................................ii
目錄 ............................................................iv
一、緒論 ............................................................1
1.1 編譯器的架構 ............................................................ 1
1.2 編譯器最佳化的目的 ................................................... 2
1.2.1 針對能量耗損最佳化 .......................................... 3
1.2.2 針對容量最佳化 ................................................ 3
1.2.3 針對效能最佳化 ................................................ 3
1.3 作用於不同階段的編譯器最佳化 .................................... 4
1.3.1 機器不相依的最佳化演算法 ................................. 4
1.3.2 機器相依的最佳化演算法 .................................... 5
1.4 編譯器的相序問題 ...................................................... 6
1.5 指令下沉阻擋商餘合併 ................................................ 6
1.6 論文架構 .................................................................. 7
二、研究動機 ............................................................8
2.1 機器不相依最佳化——指令下沉..................................... 8
2.1.1 變數的定義與使用 ............................................. 8
2.1.2 指令下沉最佳化演算法 ....................................... 9
2.2 機器相依最佳化——商餘合併........................................ 10
2.3 指令下沉阻擋商餘合併致使最佳化失敗 ........................... 11
2.4 小結 ........................................................................ 13
三、提案 ............................................................15
3.1 產生問題的條件 ......................................................... 16
3.2 指令下沉的步驟 ......................................................... 17
3.3 SHANE .................................................................... 19
3.3.1 演算法作用於指令下沉最佳化內 ........................... 19
3.3.2 演算法流程 ...................................................... 20
3.3.3 本演算法的效果 ................................................ 23
四、實作 ............................................................25
4.1 實作環境 .................................................................. 25
4.2 演算法 SHANE 虛擬碼 ................................................ 26
4.2.1 判斷硬體資訊以及欲搜尋的指令 ........................... 27
4.2.2 尋找 Partner ................................................... 27
4.2.3 判斷 Partner 是否會被下沉................................. 28
4.3 處理特殊情況 ............................................................ 28
4.3.1 運算元經過整數位元數的串接與裁剪 ..................... 28
4.3.2 除數為常數 ...................................................... 29
五、評估 ............................................................31
5.1 可行性的評估 ............................................................ 31
5.2 相容性的評估 ............................................................ 33
5.2.1 評估結果 ......................................................... 33
5.2.2 LLVMO3 中的最佳化–DivRemPairs...................... 34
5.2.3 比較可行性與相容性 .......................................... 36
5.3 針對性的評估 ............................................................ 38
5.4 benchmarks 評估 ........................................................ 38
六、相關研究 ............................................................41
6.1 提前編譯器的架構 ...................................................... 41
6.1.1 GCC–GNU CompilerCollection ............................ 42
6.1.2 GCC 與 LLVM 編譯器架構的差異......................... 43
6.1.3 開源軟體的授權條款 .......................................... 43
6.2 及時編譯器 ............................................................... 44
6.3 使用機器學習進行最佳化的編譯器 ................................. 45
6.3.1 機器學習編譯器的缺點 ....................................... 47
6.3.2 本提案對機器學習編譯器的好處 ........................... 48
6.4 高階語言的內在函數 ................................................... 48
七、總結 ............................................................50
7.1 未來展望 .................................................................. 51
7.1.1 LLVM 內在函數應用於本提案 .............................. 51
7.1.2 本問題是否存在於有獨立取餘數指令的硬體 ............ 51
八、參考文獻 ............................................................53

參考文獻

[1]
A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: principles, techniques,
& tools. Pearson Education India, 2007.

[2]
U. Ammann, “On code generation in a pascal compiler,” Software: Practice and Experience, vol. 7, no. 3, pp. 391–423, 1977.

[3]
R. Bornat, Understanding and writing compilers: a do-it-yourself guide. Macmillan International Higher Education, 1979.

[4]
C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in International Symposium on Code Generation and Optimization, 2004. CGO 2004. , IEEE, 2004, pp. 75–86.

[5]
G. C. Necula, “Translation validation for an optimizing compiler,” in Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, 2000, pp. 83–94.

[6]
V. Delaluz, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “Energy-oriented compiler optimizations for partitioned memory architectures,” in Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems, 2000, pp. 138–147.

[7]
H. Ge and J. Liu, “Keeping smartphones cool with gallium phase change material,”
Journal of heat transfer, vol. 135, no. 5, 2013.

[8]
C. Lee, J. K. Lee, and T. Hwang, “Compiler optimization on instruction scheduling for low power,” in Proceedings 13th International Symposium on System Synthesis, IEEE, 2000, pp. 55–60.

[9]
K. D. Cooper and N. McIntosh, “Enhanced code compression for embedded risc processors,” ACM SIGPLAN Notices, vol. 34, no. 5, pp. 139–149, 1999.

[10]
C. Lefurgy, P. Bird, I.-C. Chen, and T. Mudge, “Improving code density using compression techniques,” in Proceedings of 30th Annual International Symposium on Microarchitecture, IEEE, 1997, pp. 194–203.

[11]
S. K. Debray, W. Evans, R. Muth, and B. De Sutter, “Compiler techniques for code compaction,” ACM Transactions on Programming languages and Systems (TOPLAS), vol. 22, no. 2, pp. 378–415, 2000.

53
[12]
Y. Zhuang and S. Chiba, “Better abstraction for efficient code in hpc programs,”
2015.

[13]
D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Computing Surveys (CSUR), vol. 26, no. 4, pp. 345–
420, 1994.

[14]
T. Kisuki, P. M. Knijnenburg, and M. F. O’Boyle, “Combined selection of tile sizes and unroll factors using iterative compilation,” in Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.

PR00622), IEEE, 2000, pp. 237–246.

[15]
S. Muchnick et al. , Advanced compiler design implementation. Morgan kaufmann, 1997.

[16]
J. Knoop, O. Rüthing, and B. Steffen, “Partial dead code elimination,” ACM Sig-plan Notices, vol. 29, no. 6, pp. 147–158, 1994.

[17]
D. Ebner, F. Brandner, B. Scholz, A. Krall, P. Wiedermann, and A. Kadlec, “Gen-eralized instruction selection using ssa-graphs,” in Proceedings of the 2008 ACM
SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems, 2008, pp. 31–40.

[18]
A. Haj-Ali, Q. J. Huang, J. Xiang, et al. , “Autophase: Juggling hls phase order-ings in random forests with deep reinforcement learning,” Proceedings of Machine Learning and Systems, vol. 2, pp. 70–81, 2020.

[19]
Y. B. Asher, G. Haber, and E. Stein, “A study of conflicting pairs of compiler optimizations,” in 2017 IEEE 11th International Symposium on Embedded Multicore/
Many-core Systems-on-Chip (MCSoC), IEEE, 2017, pp. 52–58.

[20]
U. Garciarena and R. Santana, “Evolutionary optimization of compiler flag selection by learning and exploiting flags interactions,” in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, 2016, pp. 1159–1166.

[21]
S.-A.-A. Touati and D. Barthou, “On the decidability of phase ordering problem in optimizing compilation,” in Proceedings of the 3rd Conference on Computing Frontiers, 2006, pp. 147–156.

[22]
G. Kane and J. Heinrich, MIPS RISC architectures. Prentice-Hall, Inc., 1992.

[23]
W. M. McKeeman, “Peephole optimization,” Communications of the ACM, vol. 8, no. 7, pp. 443–444, 1965.

[24]
T. Granlund and P. L. Montgomery, “Division by invariant integers using multi-plication,” in Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, 1994, pp. 61–72.

[25]
M. Beeler et al. , Item 120 in m. beeler, r. w. gosper, and r. schroeppel, hakmem, 1972.

54
[26]
L. contributors, LLVM ”test-suite” repository, https://github.com/llvm/llvm-
test-suite, 2022.

[27]
E. M. B. Consortium et al. , Coremark: An eembc benchmark, 2018.

[28]
R. P. Weicker, “Dhrystone: A synthetic systems programming benchmark,” Communications of the ACM, vol. 27, no. 10, pp. 1013–1030, 1984.

[29]
R. Longbottom, Whetstone benchmark history and results, 2005.

[30]
R. Stallman et al. , Using and porting GNU CC. Free Software Foundation, 1998, vol. 675.

[31]
C. Lattner, “LLVM and Clang: Next generation compiler technology,” in The BSD
conference, vol. 5, 2008.

[32]
D. Novillo, “GCC an architectural overview, current status, and future directions,”
in Proceedings of the Linux Symposium, vol. 2, 2006, p. 185.

[33]
A. Engelfriet, “Choosing an open source license,” IEEE software, vol. 27, no. 1, pp. 48–49, 2009.

[34]
A. H. Ashouri, W. Killian, J. Cavazos, G. Palermo, and C. Silvano, “A survey on compiler autotuning using machine learning,” ACM Computing Surveys (CSUR), vol. 51, no. 5, pp. 1–42, 2018.

[35]
J. Aycock, “A brief history of just-in-time,” ACM Computing Surveys (CSUR), vol. 35, no. 2, pp. 97–113, 2003.

[36]
A. W. Wade, P. A. Kulkarni, and M. R. Jantz, “AOT vs. JIT: impact of profile data on code quality,” in Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2017, pp. 1–10.

[37]
C. Liem, Retargetable Compilers for Embedded Core Processors: Methods and Experiences in Industrial Applications. Springer Science & Business Media, 2013.

[38]
F. Agakov, E. Bonilla, J. Cavazos, et al. , “Using machine learning to focus iterative optimization,” in International Symposium on Code Generation and Optimization (CGO’06), IEEE, 2006, 11–pp.

[39]
E. Park, J. Cavazos, L.-N. Pouchet, C. Bastoul, A. Cohen, and P. Sadayappan,
“Predictive modeling in a polyhedral optimization space,” International journal of parallel programming, vol. 41, no. 5, pp. 704–750, 2013.

[40]
A. H. Ashouri, A. Bignoli, G. Palermo, and C. Silvano, “Predictive modeling method-ology for compiler phase-ordering,” in Proceedings of the 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 5th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms, 2016, pp. 7–12.

[41]
L. Almagor, K. D. Cooper, A. Grosul, et al. , “Finding effective compilation sequences,” ACM SIGPLAN Notices, vol. 39, no. 7, pp. 231–239, 2004.

55
[42]
K. D. Cooper, A. Grosul, T. J. Harvey, et al. , “Acme: Adaptive compilation made efficient,” ACM SIGPLAN Notices, vol. 40, no. 7, pp. 69–77, 2005.

[43]
A. H. Ashouri, A. Bignoli, G. Palermo, C. Silvano, S. Kulkarni, and J. Cavazos,
“Micomp: Mitigating the compiler phase-ordering problem using optimization sub-sequences and machine learning,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 14, no. 3, pp. 1–28, 2017.

[44]
A. H. Ashouri, G. Mariani, G. Palermo, E. Park, J. Cavazos, and C. Silvano,
“Cobayn: Compiler autotuning framework using bayesian networks,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 13, no. 2, pp. 1–25, 2016.

[45]
R. A. Johnson, D. W. Wichern, et al. , Applied multivariate statistical analysis.

Pearson London, UK: 2014, vol. 6.

[46]
M. Rigger, S. Marr, S. Kell, D. Leopoldseder, and H. Mössenböck, “An analysis of x86-64 inline assembly in c programs,” in Proceedings of the 14th ACM
SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2018, pp. 84–99.

[47]
D. Batten, S. Jinturkar, J. Glossner, M. Schulte, and P. D’Arcy, “A new approach to dsp intrinsic functions,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, IEEE, 2000, 10–pp.

[48]
K. Asanović and D. A. Patterson, “Instruction sets should be free: The case for risc-v,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/
EECS-2014-146, 2014.

指導教授

莊永裕(YungYu Zhuang)

審核日期

2022-7-2

推文