Volatility is a key measure of financial asset risk, but it is a latent variable that cannot be observed directly, so estimating it has long been a central problem in finance. Traditional approaches infer volatility under specific model assumptions, and this model dependence limits their flexibility and generalizability. This study proposes a non-parametric volatility estimation method based on reinforcement learning that requires no predefined distributional or structural assumptions about asset returns and can track changes in volatility dynamically. We apply the method to major stock indices (the U.S. S&P 500, the U.K. FTSE 100, Japan's NIKKEI 225, Germany's DAXI, China's SHCOMP, and Taiwan's TAIEX) for volatility forecasting and empirical analysis. To study how risk is transmitted across international financial markets, we also use Granger causality tests to examine whether lead-lag relationships exist among the volatilities of these markets. The results indicate that the method captures cross-market risk dynamics effectively, giving market participants a new and practical tool for monitoring risk.
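To illustrate the kind of lead-lag test referred to here, the following is a minimal sketch of a bivariate Granger causality F-statistic computed by ordinary least squares. It is not the implementation used in this study: the two volatility series are simulated, and the function names, the lag order of 2, and the coefficients of the simulated process are all assumptions made for the example.

```python
import numpy as np


def lagged_matrix(series, lags):
    """Return columns series[t-1], ..., series[t-lags] for t = lags..n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def granger_f_stat(target, source, lags=2):
    """F-statistic for H0: lags of `source` do not help predict `target`.

    Compares a restricted autoregression of `target` on its own lags with an
    unrestricted one that also includes lags of `source`.
    """
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    y = target[lags:]
    const = np.ones((len(y), 1))
    restricted = np.hstack([const, lagged_matrix(target, lags)])
    unrestricted = np.hstack([restricted, lagged_matrix(source, lags)])

    def rss(design):
        # Residual sum of squares of the least-squares fit of y on design.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(y) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Simulated (hypothetical) volatility proxies: market A leads market B by one step.
rng = np.random.default_rng(0)
n = 500
vol_a = np.zeros(n)
vol_b = np.zeros(n)
for t in range(1, n):
    vol_a[t] = 0.5 * vol_a[t - 1] + rng.normal()
    vol_b[t] = 0.3 * vol_b[t - 1] + 0.8 * vol_a[t - 1] + rng.normal()

f_a_to_b = granger_f_stat(vol_b, vol_a)  # does A help predict B?
f_b_to_a = granger_f_stat(vol_a, vol_b)  # does B help predict A?
print(f"A -> B: F = {f_a_to_b:.2f}")
print(f"B -> A: F = {f_b_to_a:.2f}")
```

Under the null hypothesis, the statistic follows an F distribution with `lags` and `dof` degrees of freedom, so a p-value can be obtained from any F-distribution routine. A large statistic in one direction only would suggest that one market's volatility leads the other's.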