Datacenter traffic optimization has been a popular research topic for years. Traditional approaches to this problem are mainly rule-based, crafted from datacenter operators' experience and knowledge of the network environment. However, traffic in a modern datacenter tends to be increasingly complicated and dynamic, which may cause traditional methods to fail. With the rapid development of deep reinforcement learning, a number of studies have demonstrated the feasibility of applying it to traffic control. In this research, we propose a multi-agent reinforcement learning framework for datacenter traffic control, with a simulation environment designed around commonly used topologies. Using a utility function of the kind frequently employed in network optimization as the reward function, our agents learn an optimal traffic control policy by maximizing the reward with deep neural networks. Additionally, to improve the agents' exploration efficiency in the environment, noise is introduced to perturb the parameters of each agent's policy network. Our experimental results show two things: 1) the performance of our framework does not degrade when agents are implemented with a simple network architecture, and 2) the proposed framework performs nearly as well as popular traffic control schemes, without the assumptions those schemes require.
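To make the two core ingredients of the framework concrete, the sketch below illustrates them in PyTorch under stated assumptions: the reward is taken to be the alpha-fair utility, a standard family in network optimization (the abstract does not specify which utility is used), and exploration is done by adding zero-mean Gaussian noise to the policy parameters, one common form of parameter-space noise. All identifiers (PolicyNet, utility_reward, perturb_parameters, sigma) are hypothetical, not from the original work.

```python
# A minimal sketch, not the authors' actual implementation:
# (1) a utility-based reward, here ASSUMED to be the alpha-fair utility,
# (2) exploration by perturbing policy-network parameters with Gaussian noise.
import copy

import torch
import torch.nn as nn

UTILITY_ALPHA = 1.0  # alpha = 1 reduces the alpha-fair utility to log(x)


def utility_reward(throughputs: torch.Tensor, alpha: float = UTILITY_ALPHA) -> torch.Tensor:
    """Alpha-fair utility summed over flows (alpha=1 -> proportional fairness).

    Assumed reward; throughputs must be positive.
    """
    if alpha == 1.0:
        return torch.log(throughputs).sum()
    return (throughputs.pow(1.0 - alpha) / (1.0 - alpha)).sum()


class PolicyNet(nn.Module):
    """Deliberately small architecture, echoing the abstract's finding that a
    simple network does not degrade performance."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def perturb_parameters(policy: nn.Module, sigma: float = 0.05) -> nn.Module:
    """Parameter-space exploration: return a copy of the policy whose weights
    are perturbed by zero-mean Gaussian noise with std `sigma` (assumed)."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for param in noisy.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return noisy


if __name__ == "__main__":
    policy = PolicyNet(obs_dim=8, act_dim=4)
    noisy_policy = perturb_parameters(policy)  # act with the perturbed copy
    obs = torch.rand(8)
    action = noisy_policy(obs)
    throughputs = torch.rand(4) + 0.1  # fake positive per-flow throughputs
    reward = utility_reward(throughputs)
    print(action, reward)
```

Perturbing a copy of the network, rather than the live parameters, keeps the learned policy intact while the noisy copy drives exploration; this mirrors how parameter-space noise is typically applied, though the exact scheme used by the authors is not stated in the abstract.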