Abstract (English) |
This research aims to predict the salaries of Major League Baseball batters. The independent
variables are each batter's batting records (H, R, HR, ...), fielding records (PO, E, A, ...),
and other records (year, seniority, age, G, GS). Feature engineering is used to identify
suitable feature variables, which are then used to train a prediction model. The research uses
records from 2003 to 2014 as the dataset for a regression model that predicts batters' salaries
after 2015.
In data preprocessing, we did three things:
1. Dropped the data of international players (from Serie Nacional de Béisbol, Venezuelan
Professional Baseball League, Dominicana Professional Baseball League, ...).
2. Took the natural logarithm of salary.
3. Changed how performance is recorded. The original data table records each player's
performance (batting and fielding records) for each year; we instead use the sum of the
player's performance records over the last five years. |