Abstract (English)
Winning or losing in baseball is a complex and dynamic problem influenced by many factors, such as player performance, team strength, and the playing field. Past analyses of this problem did not use time series models, so this study attempts to apply this type of model to the data.
The data used in this study were obtained from the Baseball Reference website and comprise pitcher and batter statistics for each team from 2011 to 2022. After preprocessing, the study focused on the 2013 to 2022 seasons, excluding 2020, and the data were segmented by individual game. The main objective was to use historical game data to predict the outcomes of future games. The study then presents the test results and analyzes and discusses the factors that influence the prediction outcomes.
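As a rough illustration of the game-by-game segmentation described above, the sketch below builds sliding windows of six past games per team and pairs each window with the outcome of the following game. The file name, column names, and window construction are assumptions made for this example, not the study's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical per-game table: one row per team-game with aggregated pitcher
# and batter statistics plus a binary win/loss label. The file name and column
# names are illustrative only, not the study's actual schema.
games = pd.read_csv("team_games_2013_2022.csv")
games = games[games["season"] != 2020]            # 2020 season excluded
games = games.sort_values(["team", "date"])       # chronological order per team

WINDOW = 6  # predict one game from the previous six games

def make_windows(team_df, feature_cols, window=WINDOW):
    """Turn one team's chronologically ordered games into (X, y) samples."""
    feats = team_df[feature_cols].to_numpy(dtype="float32")
    labels = team_df["win"].to_numpy(dtype="float32")
    X, y = [], []
    for i in range(window, len(team_df)):
        X.append(feats[i - window:i])   # statistics of the six preceding games
        y.append(labels[i])             # outcome of the current game
    return np.array(X), np.array(y)

feature_cols = [c for c in games.columns if c not in ("team", "date", "season", "win")]
X_parts, y_parts = zip(*(make_windows(g, feature_cols) for _, g in games.groupby("team")))
X, y = np.concatenate(X_parts), np.concatenate(y_parts)
print(X.shape, y.shape)  # (num_samples, 6, num_features), (num_samples,)
```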
In this study, three time series models, namely the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), were trained and their results evaluated.
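The following is a minimal sketch of how the three architectures could be instantiated with TensorFlow/Keras for binary win/loss prediction; the layer sizes, optimizer, and metrics shown here are illustrative assumptions rather than the study's actual configuration.

```python
import tensorflow as tf

def build_model(cell="lstm", window=6, num_features=32, units=64):
    """Binary win/loss classifier over a window of six past games.
    Unit count, optimizer, and metrics are illustrative assumptions."""
    layer = {"rnn": tf.keras.layers.SimpleRNN,
             "lstm": tf.keras.layers.LSTM,
             "gru": tf.keras.layers.GRU}[cell]
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, num_features)),
        layer(units),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
    return model

models = {name: build_model(name) for name in ("rnn", "lstm", "gru")}
```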
The final results were compared with respect to the presence or absence of feature selection, different model architectures, and data formats. The best-performing configuration used the LSTM architecture without feature selection, with the model predicting the outcome of one game from the previous six games. In this setting, the accuracy was around 57% and the area under the ROC curve was around 52%.
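A hedged sketch of how such a comparison could be evaluated on held-out games, reusing the windowed data (X, y) and the build_model helper from the sketches above; the split ratio and training settings are assumptions for illustration only.

```python
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# X, y come from the windowing sketch; build_model from the architecture sketch.
# shuffle=False keeps the time order, so later games form the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

for name in ("rnn", "lstm", "gru"):
    model = build_model(name, num_features=X_train.shape[-1])
    model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
    probs = model.predict(X_test, verbose=0).ravel()
    preds = (probs >= 0.5).astype(int)
    print(f"{name}: accuracy={accuracy_score(y_test, preds):.3f}, "
          f"ROC AUC={roc_auc_score(y_test, probs):.3f}")
```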