Abstract: | This study predicts the salaries of Major League Baseball (MLB) batters. Candidate independent variables are drawn from each batter's yearly batting records (hits, runs, home runs, ...), fielding records (putouts, assists, errors, ...), and other records (year, years of service, age, games played, games started); the following year's salary is the dependent variable. Suitable features are selected through feature engineering and used to train several regression models. The models are trained on records from the 2003-2014 seasons and predict the salaries batters will receive after the 2015 season. Data preprocessing consisted of three steps: 1. Data for international players (from the Serie Nacional de Béisbol in Cuba, the Venezuelan Professional Baseball League, the Dominican Professional Baseball League, ...) were excluded. 2. Salaries were transformed with the natural logarithm. 3. The original data recorded only each season's performance (batting and fielding records); these were replaced with the sum of the performance records over the most recent five years. |
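The three preprocessing steps can be illustrated with a minimal sketch in plain Python. Everything named here is an assumption made for illustration and not the study's actual schema or code: the field names (`player`, `year`, `foreign`, `H`, `HR`, `salary`), the toy records, and the helper `build_rows`. The sketch also assumes that the five-year window includes the current season and that a season with no following-year salary is skipped.

```python
import math
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"player": "a", "year": 2010, "foreign": False, "H": 100, "HR": 10, "salary": 1_000_000},
    {"player": "a", "year": 2011, "foreign": False, "H": 120, "HR": 15, "salary": 2_000_000},
    {"player": "a", "year": 2012, "foreign": False, "H": 90, "HR": 8, "salary": 3_000_000},
    {"player": "b", "year": 2011, "foreign": True, "H": 80, "HR": 5, "salary": 500_000},
    {"player": "b", "year": 2012, "foreign": True, "H": 85, "HR": 6, "salary": 600_000},
]

STATS = ("H", "HR")  # performance columns to accumulate
WINDOW = 5           # number of seasons summed (assumed to include the current one)


def build_rows(records, stats=STATS, window=WINDOW):
    """Turn per-season records into (features, log next-year salary) rows."""
    # Step 1: exclude players flagged as coming from a foreign league.
    kept = [r for r in records if not r["foreign"]]

    # Index each player's seasons by year.
    by_player = defaultdict(dict)
    for r in kept:
        by_player[r["player"]][r["year"]] = r

    rows = []
    for player, seasons in by_player.items():
        for year in sorted(seasons):
            nxt = seasons.get(year + 1)
            if nxt is None:
                continue  # no next-year salary available as a target

            # Step 3: sum each stat over the most recent `window` seasons.
            feats = {
                s: sum(
                    seasons[y][s]
                    for y in range(year - window + 1, year + 1)
                    if y in seasons
                )
                for s in stats
            }

            # Step 2: natural log of the dependent variable (next year's salary).
            rows.append(
                {"player": player, "year": year, **feats,
                 "log_salary_next": math.log(nxt["salary"])}
            )
    return rows


rows = build_rows(records)
for row in rows:
    print(row)
```

Rows built this way could then be passed to any regression model; taking the logarithm of salary is a common way to reduce the skew of a heavy-tailed target.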