Using Machine Learning to Predict Salaries of Major League Baseball Players


Name

Cheng-Yu Lee (李承祐)

Department

Department of Business Administration

Thesis title

Using Machine Learning to Predict Salaries of Major League Baseball Players
(透過機器學習預測美國職棒大聯盟球員薪資)


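Access rights

This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.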

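Abstract (Chinese)

Major League Baseball (MLB) is one of the most closely followed sports in the world. In recent years, beyond the performance of players and teams, player salaries have become a focus of fan discussion, prompting fans to ask whether a player's performance really matches his market value.
How to assess player salaries has therefore long been a popular topic. The most direct basis is a player's performance in games. Beyond on-field statistics, many scholars have proposed other variables that may affect salary. Many studies of MLB salaries already exist, the factors that affect salary are numerous, and some scholars analyze pitchers and hitters separately.
This study therefore divides the growth from a player's current-year salary to his next-year salary into intervals and uses machine learning methods such as extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN) to build classification prediction models. Besides building models that predict salary growth, we also use XGBoost to validate the variables newly added in this study. The results show that these new variables can serve as a basis for predicting salary.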

Abstract (English)

Major League Baseball (MLB) is one of the most watched sports in the world. In recent years, in
addition to the performance of players and teams, player salaries have become a focus of fan
discussion, and fans often ask whether a player's performance really matches his worth.
How to evaluate player salaries has therefore long been a popular topic. The most direct basis
is a player's performance in games. In addition to on-field statistics, many scholars have
proposed other variables that may affect player salaries. Many studies of MLB salaries already
exist, and they identify many factors that influence salary. Some scholars even analyze pitchers
and hitters separately.
This study therefore divides the growth from a player's current-year salary to his next-year
salary into intervals and uses machine learning methods, namely extreme gradient boosting
(XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN), to build classification
prediction models. In addition to building models that predict a player's salary growth, we use
XGBoost to validate the variables newly added in this study. The results show that these new
variables can serve as a basis for predicting salary.
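
The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the interval edges (0% and 50% growth), the two-dimensional feature vectors, and all salary figures below are hypothetical, since this record does not state the actual intervals or variables. KNN is used because it is the simplest of the three methods to show with the standard library only; XGBoost or SVM would consume the same interval labels.

```python
import math
from collections import Counter


def growth_class(current_salary, next_salary, edges=(0.0, 0.5)):
    """Map year-over-year salary growth to an interval index (class label).

    The edges are hypothetical: class 0 = pay cut, class 1 = growth below
    50%, class 2 = growth of 50% or more.
    """
    growth = (next_salary - current_salary) / current_salary
    # The label is the number of edges the growth rate meets or exceeds.
    return sum(growth >= edge for edge in edges)


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical players: (scaled features, current salary, next-year salary).
players = [
    ((0.1, 0.2), 1_000_000, 900_000),
    ((0.2, 0.1), 2_000_000, 1_900_000),
    ((0.5, 0.5), 1_000_000, 1_200_000),
    ((0.6, 0.4), 3_000_000, 3_900_000),
    ((0.9, 0.8), 1_000_000, 2_000_000),
    ((0.8, 0.9), 500_000, 1_500_000),
]

X = [features for features, _, _ in players]
y = [growth_class(cur, nxt) for _, cur, nxt in players]

# Predict the salary-growth class for a new, equally hypothetical player.
predicted = knn_predict(X, y, (0.85, 0.85), k=3)
print("labels:", y)
print("predicted class:", predicted)
```

Turning the salary change into a small number of ordered intervals is what makes this a classification task rather than a regression on the dollar amount.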

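Keywords (Chinese)

★ Major League Baseball
★ Extreme gradient boosting
★ Support vector machine
★ K-nearest neighbors
★ Salary prediction
★ Classification

Keywords (English)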

★ MLB
★ XGBoost
★ SVM
★ KNN
★ Predicting Salaries
★ Classification

Table of Contents

Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Structure
Chapter 2 Literature Review
2-1 Literature on Variables Affecting MLB Salaries
Chapter 3 Research Method
3-1 Research Design
3-2 Classification Models
3-2-1 Extreme Gradient Boosting (XGBoost)
3-2-2 Support Vector Machine (SVM)
3-2-3 K-Nearest Neighbors (KNN)
Chapter 4 Analysis
4-1 Overview of Major League Baseball
4-2 Data Sources and Dataset
4-3 Data Preprocessing
4-4 Validation of Results
4-4-1 XGBoost Model Prediction Results
4-4-2 SVM Model Prediction Results
4-4-3 KNN Model Prediction Results
4-5 Comparison of Accuracy
Chapter 5 Conclusions and Suggestions
5-1 Conclusions
5-2 Limitations and Suggestions
References


Advisor

Ping-Yu Hsu (許秉瑜)

Approval date

2022-6-29
