Name: 侯博允 (HOU PO YUN) Department: Communication Engineering
Thesis title: 行動邊緣計算環境下基於深度強化學習和契約激勵的任務卸載與資源分配機制
(A Novel Mechanism Based on Deep Reinforcement Learning and Contract Incentives for Task Offloading and Resource Allocation in Mobile Edge Computing Environments)
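Abstract (Chinese): With the rapid development of the Internet of Things (IoT), the number of computation-sensitive end devices has increased significantly. By offloading these …
… contract incentive mechanism. Unlike traditional methods, DRL can operate without prior knowledge of the environment's details.
DRL learns and adapts to design incentive mechanisms, so that participants in dynamic and uncertain environments are effectively motivated to com…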
Abstract (English): With the rapid development of the Internet of Things (IoT), the number of computation-sensitive end devices has increased significantly. By offloading the computational tasks of these devices to edge servers, edge computing reduces task latency and relieves the computational burden on cloud servers. However, offloading tasks indiscriminately can use edge server resources inefficiently, which increases latency and raises computational costs. Designing task offloading and resource allocation strategies that optimize latency and energy consumption is therefore a key research focus and challenge. Without an appropriate incentive mechanism, edge servers may be unwilling to share their resources, so providing suitable rewards is crucial. Traditional incentive mechanisms, such as auction theory and Stackelberg games, rely on frequent information exchange, which leads to high signaling costs. Because of the risk of privacy leaks, mobile users may also be reluctant to disclose private information, which results in information asymmetry between cloud platforms and edge servers. Previous research often assumed that cloud platforms have complete information about edge servers, which is not the case in practice. This thesis proposes a contract incentive mechanism based on deep reinforcement learning (DRL). Unlike traditional methods, DRL can operate without prior knowledge of the environment's details. It learns and adapts the incentive mechanism so that participants are effectively motivated to complete tasks in dynamic and uncertain environments, maximizing the utility of the cloud platform. The contributions of this thesis are: (1) proposing the joint resource allocation and computation offloading incentive problem under information asymmetry; (2) systematically analyzing the necessary and sufficient conditions for optimal contracts; (3) formulating the contract incentive problem as a Markov decision process under incomplete information; and (4) designing a deep deterministic policy gradient (DDPG) method to obtain computation resource and incentive reward strategies in high-dimensional action and state spaces.
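The feasibility conditions mentioned in the abstract can be illustrated with a small sketch. In standard contract theory, a contract menu is usually called feasible when it satisfies individual rationality (every type gets non-negative utility from its own item) and incentive compatibility (no type prefers an item designed for another type). The code below checks both conditions for a menu of (resource, reward) pairs. The linear utility `theta * reward - unit_cost * resource`, the names `utility` and `is_feasible`, and the numeric values are all hypothetical and chosen only for illustration; they are not the utility functions or parameters defined in the thesis.

```python
from typing import List, Tuple

# A contract item is a (resource, reward) pair. This representation is an
# assumption for illustration, not the thesis's own notation.
Contract = Tuple[float, float]


def utility(theta: float, item: Contract, unit_cost: float = 1.0) -> float:
    """Hypothetical participant utility: valuation of the reward minus resource cost."""
    resource, reward = item
    return theta * reward - unit_cost * resource


def is_feasible(types: List[float], menu: List[Contract],
                unit_cost: float = 1.0, tol: float = 1e-9) -> bool:
    """Return True if menu[k], intended for types[k], satisfies IR and IC for all k."""
    for k, theta in enumerate(types):
        own = utility(theta, menu[k], unit_cost)
        # Individual rationality: accepting the intended item must not hurt.
        if own < -tol:
            return False
        # Incentive compatibility: no other item may be strictly better.
        for j, other in enumerate(menu):
            if j != k and utility(theta, other, unit_cost) > own + tol:
                return False
    return True


types = [1.0, 2.0]                    # hypothetical willingness types
good_menu = [(1.0, 1.0), (3.0, 2.0)]  # satisfies IR and IC under the assumed utility
bad_menu = [(1.0, 1.0), (3.0, 1.5)]   # type 2.0 would prefer the first item

print(is_feasible(types, good_menu))  # True
print(is_feasible(types, bad_menu))   # False
```

A check like this could serve as a sanity test on the contracts produced by a learned policy, since a DRL agent is not guaranteed to output a menu that satisfies both conditions.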
Keywords (Chinese): ★ 激勵機制 (incentive mechanism)
★ 賽局 (game theory)
★ 契約理論 (contract theory)
★ 強化學習 (reinforcement learning)
Keywords (English): ★ Incentive mechanism
★ Game theory
★ Contract theory
★ Reinforcement learning
Table of contents:
Chinese Abstract . . . i
Abstract . . . ii
1 Introduction . . . 1
2 Background and Related Work . . . 4
2.1 Computation Offloading and Resource Allocation . . . 4
2.1.1 Computation Offloading . . . 5
2.1.2 Resource Allocation . . . 6
2.2 Contract Theory . . . 7
2.2.1 Contract Theory for Computation Offloading . . . 8
2.2.2 Contract Theory for Resource Allocation . . . 10
2.3 Reinforcement Learning . . . 12
2.3.1 Background on Combining Contract Theory with Reinforcement Learning . . . 14
3 Methodology . . . 15
3.1 System Architecture . . . 17
3.1.1 Transmission Model . . . 18
3.1.2 Computation Energy Consumption Model . . . 18
3.2 Contract Model . . . 19
3.2.1 Contract Model Based on Task Offloading . . . 20
3.2.2 Principal's Utility Function . . . 21
3.2.3 Participant's Utility Function . . . 22
3.2.4 Feasibility Conditions . . . 23
3.2.5 Formulation of the Contract-Theoretic Joint Offloading Problem . . . 24
3.2.6 Properties of Feasible Contracts . . . 25
3.2.7 Formulation of the Optimal Contract Problem . . . 30
3.3 Summary . . . 31
3.4 Deep Reinforcement Learning . . . 33
3.4.1 Reinforcement Learning Model . . . 34
4 Experiments and Result Analysis . . . 40
4.1 Experimental Baselines . . . 41
4.2 Experimental Environment . . . 42
4.2.1 Experimental Parameters . . . 43
4.2.2 Model Parameters . . . 44
4.3 Effect of Hyperparameter Tuning . . . 45
4.3.1 Effect of the Learning Rate on DDPG . . . 45
4.3.2 Effect of the Decay Rate on DDPG . . . 47
4.4 Experimental Results . . . 48
4.4.1 Utility and Cost Comparison under Different Exchange Rates . . . 49
4.4.2 Utility Comparison under a Normal Willingness Distribution . . . 51
4.4.3 Utility Comparison under a Uniform Willingness Distribution . . . 53
4.4.4 Utility Comparison under a Bimodal Willingness Distribution . . . 56
5 Conclusion and Future Work . . . 58
Advisor: 胡誌麟 (Chih-Lin Hu) Approval date: 2024-8-20
