Antimicrobial peptides (AMPs) have emerged as promising next-generation antibacterial compounds owing to the declining effectiveness of antibiotics. Assessing hemolytic activity is a significant challenge in AMP development because it directly affects therapeutic safety and efficacy and requires balancing antibacterial potency against minimal side effects. Here, we developed a comprehensive approach that integrates molecular dynamics (MD) simulations with machine learning (ML) techniques to predict AMP-induced hemolytic activity in human red blood cells (HRBC). Using coarse-grained MD (CG-MD) and umbrella sampling (US) to simulate the membrane insertion of AMPs, we constructed free energy profiles and dynamical properties for integration into ML models. To enhance extrapolative capacity and discover exceptional AMPs, we applied the following strategies: (i) constructing a training database based on the extrema of hemolysis and non-hemolysis; (ii) using CG-MD simulations for ML training to provide new physics-based information for improved extrapolative capability; and (iii) using representative AMPs with sequence identity lower than 70% at the extrema to avoid overly concentrated data. Our predictive ML models, which use the LightGBM (LGBM) learner, exhibit high accuracy (> 90%) on both the training (interpolative) and testing (extrapolative) datasets.
The extrapolative capability of the ML models was first assessed on a test set with sequence identity below 40% to the training set. Our predictive ML models demonstrate applicability in developing exceptional AMPs and exhibit significant generalizability when evaluated on new AMPs outside the training set. Although performance decreases slightly when the size of the training set is significantly reduced, our approach remains robust. Our work shows that the partitioning of AMPs between the aqueous and membrane phases plays a crucial role in hemolytic activity. Additionally, our methodology is applicable to developing ML models that predict antimicrobial activity, discovering exceptional AMPs, and addressing biological systems with limited available data.
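
As an illustration of strategies (i) and (iii), the following minimal sketch shows one way a training set could be restricted to hemolytic extrema and then thinned to low-redundancy representatives. It is not the procedure used in this work: the function names, the hemolysis cutoffs `low` and `high`, and the record layout are hypothetical, and `difflib.SequenceMatcher` is used only as a rough stand-in for an alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def extrema_subset(records, low, high):
    """Keep only entries at the extrema of a hemolysis measure.

    records: iterable of (sequence, hemolysis_value) pairs (hypothetical layout).
    Values <= low are labeled 0 (non-hemolytic), values >= high are labeled 1
    (hemolytic); everything in between is discarded.
    """
    labeled = []
    for seq, value in records:
        if value <= low:
            labeled.append((seq, 0))
        elif value >= high:
            labeled.append((seq, 1))
    return labeled


def select_representatives(sequences, threshold=0.70):
    """Greedy redundancy filter.

    A sequence is kept only if its similarity to every sequence already kept
    is below `threshold` (0.70 mirrors the 70% cutoff mentioned in the text).
    """
    kept = []
    for seq in sequences:
        if all(approx_identity(seq, k) < threshold for k in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Toy data: sequences and hemolysis values are invented for illustration.
    toy = [("AAAA", 1.0), ("AAAA", 2.0), ("GGGG", 50.0), ("KKKK", 90.0)]
    extremes = extrema_subset(toy, low=5.0, high=80.0)
    reps = select_representatives([seq for seq, _ in extremes])
    print(extremes)
    print(reps)
```

The greedy filter depends on input order, so sorting the candidates beforehand (for example by data quality) determines which member of a similar group is retained. In practice a dedicated clustering tool with a true alignment-based identity would replace `approx_identity`.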