A Hybrid Intervention Reinforcement Learning for Autonomous Driving under Imperfect Mentor Conditions


Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98292

Title:	A Hybrid Intervention Reinforcement Learning for Autonomous Driving under Imperfect Mentor Conditions
Author:	龎皓崙;Pang-haolun
Contributor:	Department of Information Management
Keywords:	Reinforcement Learning;Human-in-the-Loop;Autonomous Driving;Knowledge Distillation;Multi-source Learning;Safety-critical System
Date:	2025-07-21
Upload time:	2025-10-17 12:35:39 (UTC+8)
Publisher:	National Central University
Abstract:	As autonomous driving technology advances toward practical deployment, making reinforcement learning efficient while keeping it safe remains a core challenge for the industry. Traditional reinforcement learning methods rely on extensive trial-and-error exploration, which is too risky for safety-critical applications and has poor sample efficiency. Existing human-in-the-loop approaches assume that a perfect mentor is available, overlooking the fact that expert resources in practice are scarce and of uneven quality. To address these limitations, this study proposes the Hybrid Intervention Reinforcement Learning (HIRL) framework, which integrates three heterogeneous knowledge sources: a learning agent, a pre-trained AI mentor model, and a human navigator. HIRL replaces the traditional action-similarity test with a value-function-based intervention mechanism and adds an uncertainty-aware strategy with dual criteria (mean difference and variance threshold), so that the learning agent can develop superior decision-making capabilities even under imperfect guidance. In a comprehensive experimental evaluation in the MetaDrive simulation environment, HIRL achieves an 85.6% success rate across 50 test scenarios, a 14% improvement over the best baseline, SAC; average safety violations per episode fall to 0.68, 19% lower than traditional methods; and training converges in only 467 episodes, a 38% efficiency gain. Notably, under medium-to-low-quality mentors (success rate 78-85%), the learning agent's final performance still reaches 80-90%, confirming the complementary effect of integrating multiple knowledge sources. An analysis of how intervention patterns evolve shows the human intervention rate falling gradually from 50% early in training to 5%, reflecting the system's progressive development of autonomy.
This research provides an empirical foundation for deploying intelligent systems in resource-constrained environments. It offers a reference for industrial applications that must balance efficiency and safety, such as logistics route optimization and smart factory robot control, and provides theoretical guidance for the design of next-generation human-AI collaborative systems.
Appears in Collections:	[Graduate Institute of Information Management] Master's and Doctoral Theses
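
The abstract names a value-function-based intervention mechanism with two criteria, mean difference and variance threshold, but this record does not give the actual rule. The following is a minimal Python sketch of one way such a gate could work, not the thesis's implementation. Everything in it is an assumption made for illustration: the function name `choose_controller`, the use of an ensemble of value estimates, the routing order (human on high variance, mentor on a large mean gap, agent otherwise), and both threshold values.

```python
from statistics import fmean, pvariance
from typing import Sequence

# Hypothetical thresholds; the record does not report the values used in the thesis.
MEAN_GAP_THRESHOLD = 0.5
VARIANCE_THRESHOLD = 1.0


def choose_controller(
    q_agent: Sequence[float],
    q_mentor: Sequence[float],
    mean_gap_threshold: float = MEAN_GAP_THRESHOLD,
    variance_threshold: float = VARIANCE_THRESHOLD,
) -> str:
    """Return which source acts this step: "agent", "mentor", or "human".

    q_agent and q_mentor hold one value estimate per ensemble member for the
    action proposed by the agent and by the AI mentor in the same state.
    """
    if len(q_agent) == 0 or len(q_agent) != len(q_mentor):
        raise ValueError("need two non-empty estimate lists of equal length")

    # Per-member value advantage of the mentor's action over the agent's.
    gaps = [m - a for a, m in zip(q_agent, q_mentor)]
    mean_gap = fmean(gaps)
    spread = pvariance(gaps)

    # Criterion 1 (variance threshold): the estimates disagree too much to
    # trust either automated source, so defer to the human (assumed routing).
    if spread > variance_threshold:
        return "human"
    # Criterion 2 (mean difference): the mentor's action is valued clearly
    # higher than the agent's, so let the mentor intervene.
    if mean_gap > mean_gap_threshold:
        return "mentor"
    # Otherwise the agent keeps control.
    return "agent"
```

Comparing values rather than raw actions means the mentor intervenes only when its action is expected to pay off more, so an agent action that differs from the mentor's but is valued about the same is left alone.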

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	39

All items in NCUIR are protected by original copyright.
