NCU Institutional Repository (中大機構典藏) — theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/97281


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/97281


    Title: Using Deep Reinforcement Learning to Optimize Transport Routes for Battery Containers with Time Window Constraints
    Authors: 林純亨;Lin, Chun-Heng
    Contributors: Graduate Institute of Industrial Management
    Keywords: sustainable development; renewable energy; containerized battery storage; reinforcement learning; vehicle routing optimization
    Date: 2025-07-12
    Issue Date: 2025-10-17 11:05:04 (UTC+8)
    Publisher: National Central University
    Abstract: The increasing integration of renewable energy sources has heightened the need for flexible and efficient energy distribution systems. Mobile container-based battery storage offers a promising solution for bridging supply-demand gaps in decentralized networks, as it can be flexibly deployed at the demand side to provide decentralized dispatch capability. However, optimizing delivery under constraints such as vehicle range, battery capacity, heterogeneous demand, and time windows presents a highly nonlinear, multi-constraint scheduling problem that traditional heuristic methods struggle to handle with the required scalability and real-time responsiveness. This study develops a deep reinforcement learning (DRL)-based approach using the Proximal Policy Optimization (PPO) algorithm to solve the time-window-constrained delivery routing problem for mobile battery units. A custom simulation environment is constructed to model multi-trip decisions involving recharging, demand coverage, and route planning under real-world constraints; a single agent must weigh electricity-demand periods, battery-resource allocation, and recharging timing while dynamically learning multi-trip delivery decisions. Experiments were conducted across three problem scales, evaluating model performance on total energy supplied, travel distance, on-time delivery count, and routing structure. Results demonstrate that the PPO agent consistently learns effective, structured, and interpretable delivery strategies within a limited number of training episodes. In small-scale scenarios, full demand coverage and zero delay are achieved. In medium- and large-scale problems, the model adapts to resource limits by prioritizing deliveries, managing split routes and recharging stops, and maintaining high time-window compliance. Overall, this research demonstrates the effectiveness of DRL in solving constrained energy delivery problems and provides a scalable scheduling framework for future applications in mobile energy systems and intelligent storage dispatch.
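    The abstract describes training a PPO agent in a custom simulation of multi-trip, time-window-constrained battery delivery. The thesis's own environment is not reproduced on this page; the following is only an illustrative sketch of what the state, multi-trip reload action, and time-window reward logic of such an environment might look like. The class name `BatteryRoutingEnv`, the reward terms, and all parameters are assumptions for illustration, not the author's implementation:

    ```python
    import math
    from dataclasses import dataclass

    @dataclass
    class Site:
        x: float
        y: float
        demand: float          # energy requested (e.g. container-loads)
        window: tuple          # (earliest, latest) acceptable service time

    class BatteryRoutingEnv:
        """Toy single-vehicle, multi-trip delivery environment (illustrative only)."""

        def __init__(self, sites, depot=(0.0, 0.0), capacity=2.0, speed=1.0):
            self.sites = sites
            self.depot = depot
            self.capacity = capacity   # containers carried per trip
            self.speed = speed
            self.reset()

        def reset(self):
            self.pos = self.depot
            self.time = 0.0
            self.load = self.capacity
            self.served = [False] * len(self.sites)
            self.distance = 0.0
            return self._state()

        def _state(self):
            return (self.pos, self.time, self.load, tuple(self.served))

        def _travel(self, dest):
            d = math.dist(self.pos, dest)
            self.distance += d
            self.time += d / self.speed
            self.pos = dest

        def step(self, action):
            """action: index of the site to serve next, or -1 to return and reload."""
            if action == -1:                      # multi-trip: go back to depot, reload
                self._travel(self.depot)
                self.load = self.capacity
                return self._state(), 0.0, all(self.served)
            site = self.sites[action]
            self._travel((site.x, site.y))
            early, late = site.window
            self.time = max(self.time, early)     # wait if arriving before the window opens
            reward = 0.0
            if not self.served[action] and self.load >= site.demand:
                self.load -= site.demand
                self.served[action] = True
                on_time = self.time <= late
                # reward energy supplied, with a bonus/penalty for time-window compliance
                reward = site.demand + (1.0 if on_time else -1.0)
            return self._state(), reward, all(self.served)
    ```

    An actual PPO training loop (for example via a library such as Stable-Baselines3) would wrap an environment like this behind the standard Gymnasium interface; the sketch only illustrates the recharging action, capacity bookkeeping, and time-window reward shaping that the abstract alludes to.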
    Appears in Collections:[Graduate Institute of Industrial Management] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (HTML, 0 KB)


    All items in NCUIR are protected by copyright, with all rights reserved.
