Master's/Doctoral Thesis 107221014 Detailed Record




Author 萬柏良 (Bo-Liang Wan)   Department Mathematics
Thesis title Predicting Major League Baseball Batter Salaries with Machine Learning Methods
Files Full text viewable in the system (open after 2025-6-30)
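Abstract (translated from Chinese) This study aims to predict the salaries of Major League Baseball batters. From each batter's
yearly batting statistics (hits, runs, home runs, ...), fielding statistics (putouts, assists, errors, ...), and other
records (season, years of service, age, games played, games started), we select suitable independent variables,
take the following year's salary as the dependent variable, and train several regression models. The models are
trained on records from the 2003-2014 seasons and predict the salaries that Major League Baseball batters will receive after the 2015 season.
Data preprocessing consisted of three steps:
1. Excluding data for imported batters (from the Cuban league, the Venezuelan Professional Baseball League, the
Dominican Winter Baseball League, ...).
2. Taking the natural logarithm of salary.
3. The original data recorded only each season's performance statistics (batting and fielding). These were replaced
with the sums of the performance statistics (batting and fielding) over the most recent five years.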
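The three preprocessing steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thesis's actual code: the field names (`player`, `year`, `H`, `HR`, `salary`, `foreign_league`), the sample rows, and the choice to count the current season as part of the five-year window are all hypothetical.

```python
import math


def preprocess(rows, window=5, stats=("H", "HR")):
    """Sketch of the three steps: drop imported players, log salary, rolling sums."""
    # Step 1: exclude players flagged as coming from a foreign league
    # (the flag itself is a hypothetical field).
    kept = [r for r in rows if not r["foreign_league"]]

    processed = []
    for r in kept:
        # Step 3: collect the same player's seasons in the last `window` years,
        # assuming the current season counts as one of them.
        recent = [
            p for p in kept
            if p["player"] == r["player"]
            and r["year"] - window < p["year"] <= r["year"]
        ]
        row = {"player": r["player"], "year": r["year"]}
        for s in stats:
            row[s + "_sum"] = sum(p[s] for p in recent)
        # Step 2: natural logarithm of salary.
        row["log_salary"] = math.log(r["salary"])
        processed.append(row)
    return processed


# Made-up rows for illustration only.
rows = [
    {"player": "a", "year": 2009, "H": 100, "HR": 10, "salary": 500_000, "foreign_league": False},
    {"player": "a", "year": 2010, "H": 120, "HR": 15, "salary": 1_000_000, "foreign_league": False},
    {"player": "a", "year": 2015, "H": 90, "HR": 5, "salary": 2_000_000, "foreign_league": False},
    {"player": "b", "year": 2010, "H": 80, "HR": 8, "salary": 700_000, "foreign_league": True},
]
processed = preprocess(rows)
for row in processed:
    print(row)
```

Summing over a window smooths out single-season fluctuations, and the log transform compresses the long right tail that salary distributions typically have; whether the thesis's window includes the current season is not stated in the abstract.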
Abstract (English) This research aims to predict the salaries of Major League Baseball batters. The batters' batting
records (H, R, HR, ...), fielding records (PO, E, A, ...), and other records (year, seniority, age, G, GS)
serve as the independent variables. Through feature engineering, we identify suitable
feature variables, which are then used to train a prediction model. This research uses the records
from 2003-2014 as the dataset for a regression model that predicts batters' salaries after 2015.
In data preprocessing we did three things:
1. Dropped the data for international players (from Serie Nacional de Béisbol, the Venezuelan Professional
Baseball League, the Dominican Professional Baseball League, ...).
2. Took the natural logarithm of salary.
3. The original data table records performance in each year (batting record, fielding record). We changed this to the sum of the last five years' performance records (batting
record, fielding record).
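The training setup described in the abstract (fit on 2003-2014 records, predict log salary for later seasons) might look like the following minimal sketch. It substitutes a one-feature ordinary least squares fit for the thesis's regression models, and everything concrete in it is an assumption: the sample tuples, the single feature (a five-year hit total), and splitting on the year of the statistics rather than the year of the salary.

```python
import math


def fit_simple_ols(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical samples: (stats year, five-year hit total, next year's salary in USD).
samples = [
    (2011, 100, 540_000),
    (2012, 200, 660_000),
    (2013, 300, 800_000),
    (2014, 400, 980_000),
    (2015, 250, 750_000),
    (2016, 500, 1_150_000),
]

# Temporal split: no shuffling, so the model never sees later seasons.
train = [s for s in samples if s[0] <= 2014]
test = [s for s in samples if s[0] > 2014]

# The target is the natural log of next year's salary.
slope, intercept = fit_simple_ols(
    [x for _, x, _ in train],
    [math.log(sal) for _, _, sal in train],
)

test_targets = [math.log(sal) for _, _, sal in test]
test_preds = [intercept + slope * x for _, x, _ in test]
r2 = r_squared(test_targets, test_preds)
print(f"slope={slope:.5f} intercept={intercept:.3f} test R^2={r2:.3f}")
```

Splitting by season rather than at random keeps the evaluation honest for a forecasting task, since predictions are always made for years the model was not trained on.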
Keywords ★ Major League Baseball
★ Machine Learning
★ Salary prediction
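Table of contents Abstract (Chinese)
Abstract (English)
Contents
List of Figures
1 Introduction
1.1 Research Motivation
1.2 Research Purpose
1.3 Research Questions
1.4 Research Subjects
2 Background
2.1 Baseball Statistics
2.2 Salary Arbitration System [12]
2.3 Statistical Methods
2.3.1 Pearson Correlation Coefficient r
2.3.2 Collinearity [17]
2.4 Linear Regression Model [2]
2.5 Decision Tree Model [14]
2.6 Ensemble Learning [14]
2.6.1 Bagging
2.6.2 Boosting [3]
2.6.3 Feature Importance [4]
2.7 Coefficient of Determination [5]
3 Experiments & Results
3.1 Experimental Procedure
3.2 Prediction Performance (Machine Learning Training and Testing)
4 Conclusion
References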
References [1] Charu C. Aggarwal. Outlier Analysis. Springer Cham. ISBN:978-3-319-47577-6, (2017).
[2] Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine
Learning. Cambridge University Press. ISBN:9781108679930, (2020).
[3] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. The
Annals of Statistics, Oct., 2001, Vol. 29, No. 5, pp. 1189-1232, (2001).
[4] Joseph Gatto, Ravi Lanka, Yumi Iwashita, and Adrian Stoica. Single sample feature importance: An interpretable algorithm for low-level feature analysis. arXiv:1911.11901,
(2019).
[5] Stanton A. Glantz and Bryan K. Slinker. Primer of applied regression and analysis of
variance. McGraw-Hill. ISBN:978-0070234079, (1990).
[6] James Richard Hill and William Spellman. Pay discrimination in baseball: Data from the
seventies. Industrial Relations, Vol. 23, pp. 103-112, (1984).
[7] Martin J. Hirzel, Scott Schneider, and Kanat Tangwongsan. Sliding-window aggregation
algorithms: Tutorial. DEBS ’17: Proceedings of the 11th ACM International Conference
on Distributed and Event-based Systems. ISBN:9781450350655, (2017).
[8] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to
Statistical Learning with Applications in R. Springer Texts in Statistics. ISBN:978-1-4614-7138-7,
(2013).
[9] James R. Lackritz. Salary evaluation for professional baseball players. The American
Statistician Vol. 44, No. 1, (1990).
[10] Sean Lahman. Lahman’s baseball database. https://www.seanlahman.com/, (2020).
[11] Don N. MacDonald and Morgan O. Reynolds. Are baseball players paid their marginal
products? Managerial and Decision Economics Vol. 15, No. 5, Special Issue: The Economics of Sports Enterprises, pp. 443-457, (1994).
[12] Major League Baseball. Salary Arbitration, (2022).
https://www.mlb.com/glossary/transactions/salary-arbitration.
[13] Gerald W. Scully. Pay and performance in major league baseball. American Economic
Review, Vol. 64, No. 6, pp. 915-930, (1974).
[14] C. Sheppard. Tree-based Machine Learning Algorithms: Decision Trees, Random Forests,
and Boosting. CreateSpace Independent Publishing Platform ISBN:9781975860974,
(2017).
[15] John W. Tukey. Exploratory Data Analysis. Addison-Wesley. ISBN:978-0-201-07616-5,
(1977).
[16] Mehmet Barlas Uzun, Gülbin Özçelikay, and Gizem Aykaç Gülpınar. The situation
of curriculums of faculty of pharmacies in Turkey. Marmara Pharmaceutical Journal.
21(24530):183-189, (2016).
[17] 蕭文龍. 多變量分析最佳入門實用書 [The Best Practical Introduction to Multivariate Analysis] (2nd ed.). 碁峰. ISBN:9789861817347, (2009).
Advisors 洪盟凱 (John M. Hong), 胡中興 (Chung-Hsing Alex Hu)   Approval date 2022-7-7
