The accelerating threat of antimicrobial resistance has driven the development of computational methods for antimicrobial peptide (AMP) design. While deep learning-based generative frameworks such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Latent Diffusion Models (LDMs) have demonstrated success in de novo AMP generation, they face persistent challenges in balancing antimicrobial activity, hemolytic potential, and structural stability. In this study, we propose a reinforcement learning (RL) framework that leverages AlphaZero's self-play Monte Carlo Tree Search (MCTS) to systematically explore the peptide design space. By integrating multi-objective surrogate models as fitness functions, covering antimicrobial potency, hemolysis risk, and predicted structural stability, our method enables simultaneous optimization of key properties. The MCTS framework balances exploration and exploitation, efficiently generating diverse, novel AMP candidates while reducing computational cost. Experimental results show that our approach not only improves the quality and diversity of generated AMPs compared with state-of-the-art methods, but also ensures the structural and biological relevance of the designed sequences. This work demonstrates the potential of self-play RL as a flexible and effective tool for multi-objective peptide optimization, providing new avenues for combating antibiotic resistance.
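As a rough illustration of the search procedure described above, the sketch below runs a plain-UCT Monte Carlo Tree Search that builds a peptide one amino acid per tree level and scores complete sequences with a scalarized multi-objective fitness. This is a minimal sketch, not the paper's implementation: the three surrogate functions (`potency`, `hemolysis_risk`, `stability`) are placeholder heuristics standing in for trained predictive models, the sequence length and iteration budget are arbitrary, and the learned policy/value network that AlphaZero uses to guide selection is omitted in favor of uniform expansion.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 12  # illustrative short peptide length

# Placeholder surrogates. A real system would call trained predictors
# for antimicrobial potency, hemolysis risk, and structural stability.
def potency(seq):         # hypothetical proxy: fraction of cationic residues
    return sum(seq.count(a) for a in "KR") / len(seq)

def hemolysis_risk(seq):  # hypothetical proxy: fraction of hydrophobic residues
    return sum(seq.count(a) for a in "FILVW") / len(seq)

def stability(seq):       # hypothetical proxy: fraction of helix-favoring residues
    return sum(seq.count(a) for a in "AEL") / len(seq)

def fitness(seq):
    # Weighted-sum scalarization of the three objectives
    return potency(seq) - hemolysis_risk(seq) + stability(seq)

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children = {}          # residue -> child Node
        self.visits, self.value = 0, 0.0

    def terminal(self):
        return len(self.seq) >= MAX_LEN

    def expanded(self):
        return len(self.children) == len(AMINO_ACIDS)

def uct_select(node, c=1.4):
    # Every child has visits >= 1 (it is backed up right after expansion)
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(seq):
    # Random completion of the partial sequence, then evaluate
    while len(seq) < MAX_LEN:
        seq += random.choice(AMINO_ACIDS)
    return fitness(seq)

def mcts(iterations=2000, seed=0):
    random.seed(seed)
    root = Node("")
    for _ in range(iterations):
        # Selection: descend through fully expanded nodes via UCT
        node = root
        while not node.terminal() and node.expanded():
            node = uct_select(node)
        # Expansion: add one untried residue
        if not node.terminal():
            a = random.choice([a for a in AMINO_ACIDS
                               if a not in node.children])
            node.children[a] = Node(node.seq + a, node)
            node = node.children[a]
        # Simulation + backpropagation
        reward = rollout(node.seq)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract a candidate by following the most-visited child at each level
    node, seq = root, ""
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        seq += a
    while len(seq) < MAX_LEN:
        seq += "A"  # pad if the tree is shallower than MAX_LEN
    return seq

best = mcts()
print(best, round(fitness(best), 3))
```

The UCT term trades off exploitation (average reward of a child) against exploration (visit-count bonus), which is the exploration/exploitation balance the abstract refers to; swapping the weighted sum in `fitness` for another scalarization, or the heuristics for real surrogate models, leaves the search loop unchanged.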