Master's and Doctoral Theses: Detailed Record for 110526008




Author  Bo-Xue Huang (黃柏學)   Department  Computer Science and Information Engineering
Thesis Title  基於領域知識的強化式學習 TCP 壅塞控制方法
(Reinforcement Learning TCP Congestion Control Method Based on Domain Knowledge)
Related Theses
★ A Study of Stable QoS Routing Mechanisms in Wireless Mobile Ad Hoc Networks
★ A Network Management System Using Multiple Mobile Agents
★ A Cooperative Network Defense System Using Mobile Agents
★ A Study of QoS Routing under Uncertain Link-State Information
★ Improving Path Setup Performance in Optical Burst Switching via Traffic Observation
★ A Study of Applying Sensor Networks and Game Theory to Comfort-Oriented Air Conditioning
★ A Search-Tree-Based Routing Algorithm for Wireless Sensor Networks
★ A Lightweight Positioning System for Mobile Devices Based on Wireless Sensor Networks
★ A Multimedia Guided Toy Vehicle
★ A Guided Toy Vehicle Based on a Smart Floor
★ A Mobile Social Network Service Management System for Families of Children with Developmental Delays
★ A Location-Aware Wearable Mobile Advertising System
★ Adaptive Vehicular Broadcasting
★ A Vehicle Collision Avoidance Mechanism with Early Warning on Vehicular Networks
★ A Cooperative Traffic Information Dissemination Mechanism on Wireless Vehicular Networks to Alleviate Vehicle Congestion
★ Adaptive Virtual Traffic Signals Using Vehicular Networks to Alleviate Congestion in Smart Cities
File  Electronic full text: browsable via the thesis system only (access status: never open to the public)
Abstract (Chinese)  With the continuous development of network technology, congestion control in the Transmission Control Protocol (TCP) has become an important topic in network performance optimization. In recent years, Reinforcement Learning (RL) has been widely applied to TCP congestion control: by interacting with the environment and exploring an optimal policy, it performs remarkably well on resource allocation problems. However, because of the black-box nature of deep learning, purely RL-based solutions may behave unexpectedly when facing situations they have never encountered before. Moreover, invoking the neural network too frequently consumes a large amount of computing resources and burdens network equipment. RL-based solutions must therefore account for both computational cost and the handling of unfamiliar environments.

This thesis proposes a reinforcement learning congestion control mechanism for the TCP sender, named "Reinforcement Learning method for Congestion Control based on Shifting Gears (RCSG)", which improves RL's ability to cope with unfamiliar situations and reduces how often the neural network is invoked. The mechanism is driven primarily by a congestion control algorithm, with the neural network in a supporting role, thereby reducing the consumption of computing resources. In addition, a specially designed preprocessing flow reduces the neural network's unexpected behavior in unfamiliar situations.
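As a rough illustration of the shifting-gears idea described above, the sketch below lets a classic congestion control algorithm update the window on every ACK while the RL policy is consulted at most once per minimum action interval. All names (GearShiftingController, apply_gear, act) and the interval value are hypothetical placeholders, not the Transmit Module and AI Module actually specified in Chapter 3 of the thesis.

# Hypothetical sketch only; not the thesis's actual implementation.
# The classic CC algorithm drives cwnd on every ACK, and the RL policy
# ("shifting gears") is consulted at most once per minimum action interval.
import time

class GearShiftingController:
    def __init__(self, cc_algorithm, rl_agent, min_action_interval=0.1):
        self.cc = cc_algorithm                    # e.g., a Reno- or CUBIC-style object (assumed interface)
        self.agent = rl_agent                     # trained policy, e.g., an A3C network (assumed interface)
        self.min_action_interval = min_action_interval   # seconds between RL invocations
        self._last_rl_call = float("-inf")

    def on_ack(self, state):
        """Per-ACK update; returns the congestion window to use next."""
        cwnd = self.cc.update(state)              # the classic algorithm always runs
        now = time.monotonic()
        if now - self._last_rl_call >= self.min_action_interval:
            self._last_rl_call = now
            gear = self.agent.act(state)          # infrequent neural-network call
            cwnd = self.cc.apply_gear(gear)       # RL adjusts the baseline behaviour
        return cwnd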

Experimental results show that in a stable network environment the RCSG mechanism reduces the queuing delay of the Reno and CUBIC algorithms by up to 76.60% and 58.52% respectively, and in an unreliable network environment it improves their throughput by up to 62.31% and 24.59% respectively. Even when encountering situations never seen during training, it maintains stable operation. The mechanism can also flexibly adjust the AI's control frequency: when more precise control is needed, the control frequency can be raised, trading higher computational cost for better performance; conversely, when computing resources are limited, the control frequency can be lowered, giving up part of the reinforcement learning control capability in exchange for lower resource consumption.
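To make the frequency/cost trade-off concrete, the toy calculation below assumes, purely for illustration, that the classic algorithm reacts to every ACK at some fixed rate; the ACK rate and candidate intervals are made-up numbers, not settings from the thesis. It shows how widening the minimum action interval caps neural-network invocations without reducing how often the classic algorithm acts.

# Illustrative arithmetic only; the ACK rate and interval values are assumptions.
ack_rate_per_sec = 1000                     # assumed per-ACK update rate of a busy flow

for interval_ms in (10, 50, 200):           # candidate minimum action intervals
    nn_calls_per_sec = min(ack_rate_per_sec, 1000 / interval_ms)
    print(f"interval={interval_ms:>3} ms -> at most {nn_calls_per_sec:4.0f} RL calls/s, "
          f"while the classic algorithm still handles {ack_rate_per_sec} ACKs/s")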
Abstract (English)  With the continuous development of network technology, TCP congestion control has become an important issue in network performance optimization. In recent years, Reinforcement Learning (RL), which learns an optimal policy by interacting with its environment, has been widely applied to TCP congestion control. However, due to the black-box nature of deep learning, pure reinforcement learning methods may exhibit unexpected behavior when facing unseen situations. In addition, frequent invocation of neural networks consumes a large amount of computing resources and places a burden on network equipment. Therefore, reinforcement learning solutions must take both computing resource allocation and strategies for unseen situations into account.

This thesis proposes an RL congestion control method called "Reinforcement Learning method for Congestion Control based on Shifting Gears (RCSG)" to improve the TCP sender's ability to cope with unseen situations and to reduce how often the neural network is invoked. The method is driven mainly by classic congestion control algorithms, supported by a neural network, which reduces the consumption of computational resources. In addition, a specially designed preprocessing flow reduces the neural network's unexpected behavior in unseen situations.
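One common way to realize such a preprocessing step is to clip each input feature to the range seen during training and then normalize it, so that measurements from an unseen environment cannot push the network far outside the input region it was trained on. The sketch below is a generic example of that idea under assumed feature names and ranges; the thesis's actual Preprocess Model (Section 3.3.5) may differ.

# Generic clip-and-normalize preprocessing; the features and ranges are
# placeholders, not the thesis's actual preprocess model.
import numpy as np

def preprocess_state(raw_state, feature_ranges):
    """Clip each feature into its training-time range, then scale to [0, 1]."""
    lo = np.array([r[0] for r in feature_ranges], dtype=float)
    hi = np.array([r[1] for r in feature_ranges], dtype=float)
    clipped = np.clip(np.asarray(raw_state, dtype=float), lo, hi)
    return (clipped - lo) / (hi - lo)

# Example features (assumed): throughput (Mbps), RTT (ms), loss rate
ranges = [(0.0, 100.0), (0.0, 500.0), (0.0, 1.0)]
print(preprocess_state([250.0, 80.0, 0.02], ranges))  # throughput is clipped to 100 Mbps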

The experimental results show that in a stable network environment RCSG reduces the queuing delay of the Reno and CUBIC algorithms by up to 76.60% and 58.52% respectively, and in an unreliable network environment it increases their throughput by up to 62.31% and 24.59% respectively. Importantly, RCSG maintains stable performance even when encountering unseen scenarios. Furthermore, RCSG allows the AI's control frequency to be adjusted flexibly: users can choose a higher control frequency for more precise control at a higher computational cost, or a lower control frequency to conserve computing resources while sacrificing some of the reinforcement learning control capability.
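The reported gains target higher throughput and lower queuing delay, which is also the usual shape of the reward signal in RL congestion control work such as [7] and [35]. The snippet below shows that generic shape only; the terms and weights are placeholders and do not correspond to the reward function actually defined in Section 3.3.4 of the thesis.

# Generic RL congestion-control reward shape; weights are placeholders and
# are not taken from the thesis.
def reward(throughput_mbps, queuing_delay_ms, loss_rate,
           w_tput=1.0, w_delay=0.01, w_loss=10.0):
    """Reward throughput, penalize queuing delay and packet loss."""
    return w_tput * throughput_mbps - w_delay * queuing_delay_ms - w_loss * loss_rate

print(reward(throughput_mbps=80.0, queuing_delay_ms=40.0, loss_rate=0.01))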
Keywords (Chinese)  ★ Deep Learning (深度學習)
★ Reinforcement Learning (強化式學習)
★ Congestion Control (壅塞控制)
★ Computer Networks (計算機網路)
Keywords (English)  ★ Deep Learning
★ Reinforcement Learning
★ Congestion Control
★ Computer Network
Table of Contents  Abstract (Chinese) i
Abstract ii
Acknowledgments iii
Table of Contents iv
List of Figures vii
List of Tables x
Chapter 1  Introduction 1
1.1 Overview 1
1.2 Motivation 2
1.3 Objectives 3
1.4 Thesis Organization 4
Chapter 2  Background and Related Work 5
2.1 Congestion Control 5
2.2 Reinforcement Learning 7
2.3 Domain Knowledge 9
2.4 Related Work 11
Chapter 3  Methodology 15
3.1 System Architecture and Design 15
3.2 Transmit Module Operation 19
3.2.1 Slow Start Phase 20
3.2.2 Congestion Avoidance Phase 21
3.2.3 Gear Switching Mechanism 22
3.3 AI Module Operation 24
3.3.1 Minimum Action Interval 26
3.3.2 Memory Model 27
3.3.3 State Space 28
3.3.4 Reward Function 29
3.3.5 Preprocess Model 30
3.3.6 Network Model 32
3.3.7 Controller 32
3.3.8 Optimization Target 34
3.4 System Environment 35
Chapter 4  Experiments and Discussion 36
4.1 Training Setup 37
4.1.1 ns3-gym 37
4.1.2 Asynchronous Advantage Actor-Critic 38
4.1.3 Long Short-Term Memory 38
4.1.4 Model Architecture 39
4.2 Scenario 1: Stable Connection Environment 41
4.2.1 Experiment 1: Congestion Control with RCSG and Reno 43
4.2.2 Experiment 2: Congestion Control with RCSG and CUBIC 48
4.2.3 Experiment 3: Congestion Control with RCSG under Different Control Intervals 52
4.3 Scenario 2: Unreliable Connection Environment 56
4.3.1 Experiment 4: Congestion Control with RCSG and Reno 58
4.3.2 Experiment 5: Congestion Control with RCSG and CUBIC 62
4.3.3 Experiment 6: Congestion Control with RCSG under Different Control Intervals 66
4.4 Scenario 3-1: Stable Connection in an Unseen Environment 70
4.4.1 Experiment 7: Congestion Control with RCSG and Reno 72
4.4.2 Experiment 8: Congestion Control with RCSG and CUBIC 76
4.4.3 Impact of Different MTUs on RCSG Control 80
4.5 Scenario 3-2: Unreliable Connection in an Unseen Environment 81
4.5.1 Experiment 9: Congestion Control with RCSG and Reno 83
4.5.2 Experiment 10: Congestion Control with RCSG and CUBIC 87
4.6 Comparison of Different Reward Functions and Action Spaces 91
Chapter 5  Conclusions and Future Work 98
5.1 Conclusions 98
5.2 Limitations 99
5.3 Future Work 100
References 102
Appendix 108
References  [1] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, and others, "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, 2017.
[2] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, and others, "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," arXiv preprint arXiv:1712.01815, 2017.
[3] Y. Li, "Deep reinforcement learning: An overview," arXiv preprint arXiv:1701.07274, 2017.
[4] R. Nian, J. Liu, and B. Huang, "A review on reinforcement learning: Introduction and applications in industrial process control," Computers & Chemical Engineering, vol. 139, p. 106886, 2020.
[5] M. Van Otterlo and M. Wiering, "Reinforcement learning and Markov decision processes," in Reinforcement learning: State-of-the-art, pp. 3-42, Springer, 2012.
[6] A. Agarwal, S. M. Kakade, J. D. Lee, and G. Mahajan, "Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes," arXiv preprint arXiv:1908.00261, 2019.
[7] W. Li, F. Zhou, K. R. Chowdhury, and W. Meleis, "QTCP: Adaptive congestion control with reinforcement learning," IEEE Transactions on Network Science and Engineering, vol. 6, no. 3, pp. 445-458, 2018.
[8] Z. Xu, J. Tang, C. Yin, Y. Wang, and G. Xue, "Experience-driven congestion control: When multi-path TCP meets deep reinforcement learning," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1325-1336, 2019.
[9] C. Tessler, Y. Shpigelman, G. Dalal, A. Mandelbaum, D. Haritan Kazakov, B. Fuhrer, G. Chechik, and S. Mannor, "Reinforcement learning for datacenter congestion control," ACM SIGMETRICS Performance Evaluation Review, vol. 49, no. 2, pp. 43-46, 2022.
[10] F. Ruffy, M. Przystupa, and I. Beschastnikh, "Iroko: A framework to prototype reinforcement learning for data center traffic control," arXiv preprint arXiv:1812.09975, 2018.
[11] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519, 2017.
[12] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature machine intelligence, vol. 1, no. 5, pp. 206-215, 2019.
[13] E. Giunchiglia, M. C. Stoian, and T. Lukasiewicz, "Deep learning with logical constraints," arXiv preprint arXiv:2205.00523, 2022.
[14] B. Hu and J. Li, "An adaptive hierarchical energy management strategy for hybrid electric vehicles combining heuristic domain knowledge and data-driven deep reinforcement learning," IEEE Transactions on Transportation Electrification, vol. 8, no. 3, pp. 3275-3288, 2021.
[15] J. Postel, "Transmission control protocol," RFC 793, 1981.
[16] M. Allman, V. Paxson, and E. Blanton, "TCP congestion control," RFC 5681, 2009.
[17] "TCP slow start," accessed May 18, 2023. [Online]. Available: https://developer.mozilla.org/en-US/docs/Glossary/TCP_slow_start
[18] S. Sah, "Machine learning: a review of learning types," Preprints, 2020.
[19] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, 1999.
[20] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[21] H. Hasselt, "Double Q-learning," in Advances in neural information processing systems, vol. 23, 2010.
[22] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[23] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[24] S. Fujimoto, D. Meger, and D. Precup, "Off-policy deep reinforcement learning without exploration," in International Conference on Machine Learning, pp. 2052-2062, 2019.
[25] C. Gelada and M. G. Bellemare, "Off-policy deep reinforcement learning by bootstrapping the covariate shift," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 3647-3655, 2019.
[26] "The idea behind Actor-Critics and how A2C and A3C improve them," accessed May 18, 2023. [Online]. Available: https://theaisummer.com/Actor_critics/#back-to-a2c
[27] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[28] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53-65, 2018.
[29] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," arXiv preprint arXiv:1603.06318, 2016.
[30] T. Dash, S. Chitlangia, A. Ahuja, and A. Srinivasan, "A review of some techniques for inclusion of domain-knowledge into deep neural networks," Scientific Reports, vol. 12, no. 1, p. 1040, 2022.
[31] K. Kurata, G. Hasegawa, and M. Murata, "Fairness comparisons between TCP Reno and TCP Vegas for future deployment of TCP Vegas," in Proceedings of INET, vol. 2000, no. 2.2, p. 2, 2000.
[32] "TCP Reno and Congestion Management," accessed May 19, 2023. [Online]. Available: https://intronetworks.cs.luc.edu/current/html/reno.html
[33] S. Ha, I. Rhee, and L. Xu, "CUBIC: a new TCP-friendly high-speed TCP variant," ACM SIGOPS Operating Systems Review, vol. 42, no. 5, pp. 64-74, 2008.
[34] I. Rhee, L. Xu, S. Ha, A. Zimmermann, L. Eggert, and R. Scheffenegger, "CUBIC for fast long-distance networks," RFC 8312, 2018.
[35] N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar, "A deep reinforcement learning perspective on internet congestion control," in International Conference on Machine Learning, pp. 3050-3059, 2019.
[36] S. Abbasloo, C.-Y. Yen, and H. J. Chao, "Classic meets modern: A pragmatic learning-based congestion control for the internet," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 632-647, 2020.
[37] S. Patro and K. K. Sahu, "Normalization: A preprocessing stage," arXiv preprint arXiv:1503.06462, 2015.
[38] Y. Zheng, H. Chen, Q. Duan, L. Lin, Y. Shao, W. Wang, X. Wang, and Y. Xu, "Leveraging domain knowledge for robust deep reinforcement learning in networking," in IEEE INFOCOM 2021-IEEE Conference on Computer Communications, pp. 1-10, 2021.
[39] I. M. Ozcelik and C. Ersoy, "ALVS: Adaptive Live Video Streaming using deep reinforcement learning," Journal of Network and Computer Applications, vol. 205, p. 103451, 2022.
[40] H. Mao, R. Netravali, and M. Alizadeh, "Neural adaptive video streaming with Pensieve," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pp. 197-210, 2017.
[41] H. Mao, S. B. Venkatakrishnan, M. Schwarzkopf, and M. Alizadeh, "Variance reduction for reinforcement learning in input-driven environments," arXiv preprint arXiv:1807.02264, 2018.
[42] V. Sivakumar, O. Delalleau, T. Rocktäschel, A. H. Miller, H. Küttler, N. Nardelli, M. Rabbat, J. Pineau, and S. Riedel, "Mvfst-rl: An asynchronous RL framework for congestion control with delayed actions," arXiv preprint arXiv:1910.04054, 2019.
[43] G. Chen, "A gentle tutorial of recurrent neural network with error backpropagation," arXiv preprint arXiv:1610.02583, 2016.
[44] "ikostrikov/pytorch-a3c - GitHub," accessed May 26, 2023. [Online]. Available: https://github.com/ikostrikov/pytorch-a3c
[45] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS Autodiff Workshop, 2017.
[46] P. Gawłowicz and A. Zubow, "ns3-gym: Extending OpenAI Gym for networking research," arXiv preprint arXiv:1810.03943, 2018.
[47] "tkn-tub/ns3-gym - GitHub," accessed June 03, 2023. [Online]. Available: https://github.com/tkn-tub/ns3-gym
[48] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, 2016.
[49] G. F. Riley and T. R. Henderson, "The ns-3 network simulator," in Modeling and Tools for Network Simulation, pp. 15-34, Springer, 2010.
[50] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous methods for deep reinforcement learning," in International Conference on Machine Learning, pp. 1928-1937, 2016.
[51] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[52] Y. Yu, X. Si, C. Hu, and J. Zhang, "A review of recurrent neural networks: LSTM cells and network architectures," Neural Computation, vol. 31, no. 7, pp. 1235-1270, 2019.
Advisor  Li-Der Chou (周立德)   Review Date  2023-08-14