Name Hsiao-Yu Chang (張孝宇)   Department Department of Mathematics
Thesis title MLB Win-Loss Prediction Based on Time Series Models
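Abstract (Chinese) The outcome of a baseball game is a complex and variable problem that is affected by many factors, for example ...
The data used in this study were taken from the Baseball Reference website, from which pitcher and batter statistics for each team from 2011 to 2022 were obtained. After data preprocessing, the study used the data from 2013 to 2022, excluding the data from 2020 ...
This study uses three time series models, the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), for training and observes their results.
... the data format that predicts the next game from the previous 6 games, with an accuracy of about 57% and an area under the ROC curve of about 52%.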
Abstract (English) Predicting whether a baseball team wins or loses is a complex and dynamic problem affected by many factors, such as player performance, team strength, and the playing field. Earlier analyses of this problem did not use time series models, so this study applies this type of model to the data.
The data used in this study were obtained from the Baseball Reference website and comprise pitching and batting statistics for each team from 2011 to 2022. After preprocessing, the study used the data from 2013 to 2022, excluding 2020. The data were then segmented by individual game, with the main objective of using historical game data to predict future games. The study then presents the test results and discusses the factors that influence the predictions.
Three time series models were trained and evaluated in this study: the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU).
The final results were compared along three dimensions: whether feature selection was applied, the model architecture, and the data format. The best-performing configuration was the LSTM architecture without feature selection, with the model predicting the outcome of one game from the previous six games. In this setting, the accuracy was around 57% and the area under the ROC curve was around 52%.
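The "previous six games predict the next game" data format can be illustrated with a small windowing sketch. This is only an illustration under stated assumptions, not the thesis's actual preprocessing: the function name `make_windows`, the toy feature values, and the win/loss labels are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    outcomes: Sequence[int],
    window: int = 6,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pair each run of `window` consecutive games with the next game's outcome."""
    if len(features) != len(outcomes):
        raise ValueError("features and outcomes must have the same length")
    inputs, targets = [], []
    # Each window covers games [start, start + window); the target is the game after it.
    for start in range(len(features) - window):
        inputs.append([list(row) for row in features[start:start + window]])
        targets.append(outcomes[start + window])
    return inputs, targets


# Toy schedule for one hypothetical team: 8 games, 2 made-up features per game.
games = [[float(i), float(i) * 0.5] for i in range(8)]
results = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = win, 0 = loss (invented labels)

X, y = make_windows(games, results, window=6)
print(len(X), len(X[0]), len(X[0][0]))  # samples, games per window, features per game
print(y)
```

Each sample has the shape (games per window, features per game), which is the layout a recurrent model such as an RNN, LSTM, or GRU typically consumes one time step at a time.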
Keywords (Chinese) ★ Major League Baseball (美國職棒大聯盟)
★ time series forecasting (時間序列預測)
★ feature selection (特徵選取)
Keywords (English) ★ Major League Baseball
★ time series forecasting
★ feature selection
Table of contents Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
1. Introduction
1.1 Literature Review
1.2 Motivation and Objectives
1.3 Research Methods
2. Background
2.1 Window Method
2.2 Recurrent Neural Network
2.3 Long Short-Term Memory
2.3.1 Forget gate
2.3.2 Input gate
2.3.3 Output gate
2.4 Gated Recurrent Unit
2.4.1 Reset gate
2.4.2 Update gate
2.5 Feature Selection
2.5.1 Univariate feature selection
2.5.2 Mutual Information (MI)
2.5.3 Feature importance
3. Experimental Procedure
3.1 Experimental Workflow
3.2 Data Collection
3.3 Data Preprocessing
3.4 Feature Selection
3.5 Model Architecture
4. Experimental Results
4.1 Individual Results
4.2 Combined Results
5. Conclusion
References
Advisor 洪盟凱   Approval date 2023-7-24
