Master's/Doctoral Thesis 105553028 — Detailed Record
Author: 劉政威 (Zheng-Wei Liu)    Department: Executive Master Program, Department of Communication Engineering
Thesis Title: 適用於深度增強式學習之瀑布式排程方法
(Waterfall Model for Deep Reinforcement Learning Based Scheduling)
Related Theses
★ Low-Distortion Physical Circuit Layout Confidentiality Based on Mosaic Properties
★ Seamless Handover from Wireless LAN to Mobile Networks under Multipath TCP
★ Budget-Constrained Heterogeneous Sub-band Allocation in Cognitive Networks
★ Performance Evaluation of Downlink QoS Scheduling in Multi-Antenna Transmission Environments
★ Integrated Congestion and Path Control under Multipath TCP
★ Opportunistic Scheduling for Multicast over Wireless Networks
★ Low-Complexity Proportional-Fair Scheduling Design for Multi-User MIMO Systems
★ UE and MIMO Mode Selection in LTE Heterogeneous Networks Using Hybrid Antenna Allocation
★ Heterogeneous Spectrum Allocation Based on Budget-Limited Bidding Auctions
★ Scheduling-Based Grouping for MTC Device ID Sharing Scenarios
★ Efficient Two-Way Vertical Handover with Multipath TCP
★ Congestion and Scheduling Control with Out-of-Order Transmission under Multipath TCP
★ Group Handover Mechanism for Gateway Relocation in Mobile Networks
★ Auction-Based Mobile Data Offloading with Small Cells
★ Channel Prediction and Proportional-Fair Scheduling Design in High-Speed Rail Environments
★ Hybrid IoT Traffic Generator for Mobile Network Performance Evaluation
  1. Access to this electronic thesis: immediate open access authorized.
  2. The open-access electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast without authorization.

Abstract (Chinese) Fourth-generation communication systems can already satisfy the multimedia application demands of mobile devices. Through the scheduling service provided by the base station, user equipment obtains its required data packets on the downlink, gaining better application service; the algorithm that allocates channel resources and schedules the user group is therefore critical. This thesis implements a mobile communication scheduling learning platform and proposes a design based on the Deep Deterministic Policy Gradient model. Adopting the waterfall-model concept, the scheduling algorithm flow is decomposed into three sequential stages: sorting and selection, resource evaluation, and channel allocation. Through stage micro-algorithms, the platform learns to select a waterfall scheduling method that achieves higher data throughput per unit time and satisfies more user demands under the current communication environment. The platform is built from six modular components: base station and channel resources, reinforcement learning neural network, user equipment attributes, application service types, environment information and reward function, and stage micro-algorithms with dependency injection. Inversion of control and dependency injection reduce the platform's software coupling, making the stage micro-algorithms and the six modular components easy to maintain.
Abstract (English) Fourth-generation communication systems already meet the multimedia application needs of mobile devices. Through the scheduling service provided by the base station, user equipment can obtain the data packets it requires on the downlink of the communication system and thereby receive better application service, so the algorithm that allocates channel resources and schedules the user group is critical. This thesis implements a mobile communication scheduling learning platform and proposes a design based on the Deep Deterministic Policy Gradient model. Following the waterfall-model concept, the scheduling algorithm flow is decomposed into three stages: sorting and selection, resource evaluation, and channel allocation; stage micro-algorithms learn a waterfall scheduling method that yields more data throughput per unit time and meets more user needs in the current communication environment. The platform is composed of six modular components: base station and channel resources, reinforcement learning neural network, user equipment attributes, application service types, environment information and reward function, and stage micro-algorithms with dependency injection. Using inversion of control and dependency injection reduces the platform's software coupling, making the stage micro-algorithms and the six modular components easy to maintain.
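The abstract's three-stage waterfall decomposition, with each stage supplied as a swappable micro-algorithm via dependency injection, can be sketched as follows. This is a minimal illustration only: all class names, method signatures, and the particular stage policies (channel-quality ordering, greedy grants, contiguous block assignment) are hypothetical and not taken from the thesis.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class UserEquipment:
    ue_id: int
    channel_quality: float  # e.g., a CQI-like indicator (illustrative)
    demand: int             # resource blocks requested this scheduling interval


class StageMicroAlgorithm(ABC):
    """A pluggable micro-algorithm for one waterfall stage."""
    @abstractmethod
    def run(self, data, resources):
        ...


class SortSelect(StageMicroAlgorithm):
    """Stage 1 (sorting and selection): order candidate UEs,
    here simply by descending channel quality."""
    def run(self, ues, resources):
        return sorted(ues, key=lambda ue: ue.channel_quality, reverse=True)


class ResourceEvaluate(StageMicroAlgorithm):
    """Stage 2 (resource evaluation): decide how many resource
    blocks each UE is granted, greedily in the given order."""
    def run(self, ues, resources):
        grants, remaining = {}, resources
        for ue in ues:
            grant = min(ue.demand, remaining)
            grants[ue.ue_id] = grant
            remaining -= grant
        return grants


class ChannelAllocate(StageMicroAlgorithm):
    """Stage 3 (channel allocation): map granted amounts to
    concrete channel/block indices (contiguous assignment)."""
    def run(self, grants, resources):
        allocation, nxt = {}, 0
        for ue_id, amount in grants.items():
            allocation[ue_id] = list(range(nxt, nxt + amount))
            nxt += amount
        return allocation


class WaterfallScheduler:
    """The stages are injected rather than hard-coded, so each
    micro-algorithm can be replaced independently -- the
    dependency-injection idea described in the abstract."""
    def __init__(self, sorter, evaluator, allocator):
        self.sorter = sorter
        self.evaluator = evaluator
        self.allocator = allocator

    def schedule(self, ues, total_resources):
        ordered = self.sorter.run(ues, total_resources)        # stage 1
        grants = self.evaluator.run(ordered, total_resources)  # stage 2
        return self.allocator.run(grants, total_resources)     # stage 3


if __name__ == "__main__":
    ues = [UserEquipment(1, 0.9, 3), UserEquipment(2, 0.5, 4)]
    sched = WaterfallScheduler(SortSelect(), ResourceEvaluate(), ChannelAllocate())
    print(sched.schedule(ues, 5))  # UE 1 gets blocks 0-2, UE 2 the remaining 3-4
```

Because each stage sits behind the same interface, a learning agent (such as the DDPG model the thesis proposes) could select among alternative implementations per stage without touching the scheduler itself.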
Keywords (Chinese) ★ 排程 (Scheduling)
★ 強化學習 (Reinforcement Learning)
Keywords (English) ★ Scheduling
★ Reinforcement Learning
Table of Contents
Acknowledgments
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Preface
1.2 Motivation
1.3 Contributions
1.4 Thesis Organization
2. Related Work and Technologies
2.1 Reinforcement Learning
2.2 Actor-Critic
2.3 Deep Deterministic Policy Gradient
2.4 Inversion of Control
2.5 Dependency Injection
3. Waterfall Scheduling Method
3.1 Waterfall Model
3.2 Stage Micro-Algorithms
3.2.1 Sorting and Selection Stage
3.2.2 Resource Evaluation Stage
3.2.3 Channel Allocation Stage
4. Mobile Communication Scheduling Learning Platform Architecture
4.1 Base Station and Channel Resources
4.2 Reinforcement Learning Neural Network
4.3 User Equipment Attributes
4.4 Application Service Types
4.5 Environment Information and Reward Function
4.5.1 Environment Information
4.5.2 Reward Function
4.6 Stage Micro-Algorithms and Dependency Injection
5. Experimental Procedure
5.1 Micro-Algorithm Selection Problem
5.2 Micro-Algorithm Pruning Scheme
5.2.1 Sorting and Selection Stage
5.2.2 Resource Evaluation Stage
5.2.3 Channel Allocation Stage
5.3 Eliminating the Selection Problem
6. Conclusion
6.1 Conclusion
6.2 Future Work
References
References
[1] L. Wang, L. Jiao, T. He, J. Li, and M. Mühlhäuser. Service entity placement for social virtual reality applications in edge computing. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pages 468–476, 2018.
[2] 3GPP TS 23.501. System Architecture for the 5G System. Technical report.
[3] S.-C. Tseng, Z.-W. Liu, Y.-C. Chou, and C.-W. Huang. Radio resource scheduling for 5G NR via deep deterministic policy gradient. In IEEE International Conference on Communications Workshops (ICC WS), 2019.
[4] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12, 1999.
[5] V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. Advances in Neural Information Processing Systems 12, 1999.
[6] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. Deterministic policy gradient algorithms. International Conference on Machine Learning, 2014.
[7] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. International Conference on Learning Representations, February 2016.
[8] M. Fowler. Inversion of Control Containers and the Dependency Injection pattern. https://martinfowler.com/articles/injection.html, 2004. [Online; accessed 23-January-2004].
[9] A. E. Abbas. Constructing multiattribute utility functions for decision analysis. In Risk and Optimization in an Uncertain World, pages 62–98. INFORMS, 2010.
Advisor: 黃志煒 (Chih-Wei Huang)    Date of Approval: 2019-7-31
