Master's/Doctoral Thesis 108221601: Detailed Record




Name 李思娜 (Intan Lisnawati)   Graduating department Mathematics
Thesis title Tree-Based Ensemble Methods with an Application in House Sale Price Prediction
(Chinese title: 基於樹的集成方法在房屋銷售價格預測中的應用)
Related theses
★ Probability on Trees and Networks
★ Parameter Calibration under the Heston Model: The Cases of FX and TAIEX Options
★ Phase Transition for a Class of Stochastic Differential Equations with Non-Lipschitz Drift
  1. The author has authorized this electronic thesis for immediate open access.
  2. The open-access full text is authorized only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract Given some input, we want to predict the corresponding output. To improve on a single estimator, ensemble methods combine the predictions of several base estimators built with a given learning algorithm. Each method's parameters can also be tuned to narrow the gap between the true and predicted values. Using a house sale price training data set, we apply several ensemble methods to predict an unseen house sale price data set and compare their accuracy by Root Mean Squared Error (RMSE). Gradient Boosting gives the smallest RMSE, US$22,766, followed by Random Forest at US$23,269, XGBoost at US$24,069, and Decision Tree at US$35,637.
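As a rough, self-contained illustration of the idea in the abstract (not code from the thesis), gradient boosting for squared loss starts from the mean prediction and repeatedly fits a small tree to the current residuals, which are the negative gradient of the loss. A pure-Python sketch with depth-1 trees (stumps) on hypothetical toy data:

```python
import math

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a stump): try every split point and
    keep the one minimising squared error, predicting the mean on each side."""
    best = None
    for s in sorted(set(xs))[1:]:          # split so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x < s else mr

def gradient_boost(xs, ys, n_rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals (the negative gradient) and adds it, scaled by lr."""
    base = sum(ys) / len(ys)               # initial prediction: the mean
    pred = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        h = fit_stump(xs, resid)
        stumps.append(h)
        pred = [p + lr * h(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(h(x) for h in stumps)

def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

# Toy "sale price" data: the price jumps between two size regimes.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 11, 12, 13, 30, 31, 32, 33]
model = gradient_boost(xs, ys)
print(rmse(ys, [model(x) for x in xs]))    # training RMSE shrinks toward 0
```

The toy data, `fit_stump`, and the parameter values here are illustrative choices only; the thesis's reported RMSEs come from full implementations run on the Ames house sale price data.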
Keywords ★ prediction
★ ensemble method
★ base learner
★ loss function
Table of contents Abstract vi
Acknowledgement vii
Table of Contents ix
List of Tables x
List of Figures xii
1 Introduction 1
1.1 Background 1
1.2 The Founders 2
1.3 Goals 5
2 Decision Trees 5
2.1 Decision Trees Terminology 5
2.2 Rough Idea 6
2.3 Mathematical Formulation 10
2.4 Simple Simulation 12
3 Ensemble Methods 15
3.1 Random Forests 15
3.1.1 Rough Idea 15
3.1.2 Bootstrapping 15
3.1.3 Random Forests Algorithm 16
3.1.4 Simple Simulation 17
3.2 Gradient Boosting 18
3.2.1 Rough Idea 18
3.2.2 Gradient Points in the Direction of Maximum Increase 19
3.2.3 Simple Simulation 21
3.2.4 Plugging Base Learner in Gradient Boosting 26
3.2.5 Gradient Boosting Algorithm 28
3.3 XGBoost 29
3.3.1 Rough Idea 29
3.3.2 XGBoost Algorithm 35
4 Numerical Simulations 37
4.1 Data Preprocessing 37
4.1.1 About Ames 37
4.1.2 Exploratory Data Analysis 37
4.1.3 Feature Selection 41
4.2 The Benchmark 45
4.3 Numerical Simulation by using Decision Tree 47
4.3.1 The Tree's Appearance in a Decision Tree 49
4.4 Numerical Simulation by using Random Forests 50
4.4.1 The Tree's Appearance in Random Forests 52
4.5 Numerical Simulation by using Gradient Boosting 54
4.5.1 The Tree's Appearance in Gradient Boosting 58
4.6 Numerical Simulation by using XGBoost 60
4.6.1 The Tree's Appearance in XGBoost 63
5 Conclusion 65
References 66
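Chapter 3 introduces bootstrapping (Section 3.1.2) as the basis of random forests: each tree is trained on a sample drawn with replacement from the training set, and the forest averages the trees' predictions. A minimal pure-Python sketch of that bagging step, on hypothetical toy data (not the thesis's code; full random forests also subsample features at each split, which is vacuous with a single feature):

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree: best single split by squared error."""
    vals = sorted(set(xs))
    if len(vals) < 2:                      # degenerate bootstrap sample
        m = sum(ys) / len(ys)
        return lambda x: m
    best = None
    for s in vals[1:]:
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x < s else mr

def random_forest_predict(xs, ys, x_new, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap sample (drawn with
    replacement) and average the individual predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(stump(x_new))
    return sum(preds) / len(preds)

# Toy "sale price" data, two size regimes.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 11, 12, 13, 30, 31, 32, 33]
print(random_forest_predict(xs, ys, 2), random_forest_predict(xs, ys, 7))
```

Averaging over bootstrap samples reduces the variance of a single deep tree, which is why Chapter 4's Random Forests RMSE improves so sharply on the single Decision Tree.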
References [1] About Ames. https://www.cityofames.org/about-ames/about-ames. (Accessed on 08/08/2022).
[2] Distributed (Deep) Machine Learning Common. http://dmlc.io. (Accessed on 08/08/2022).
[3] Jerome H. Friedman: Applying statistics to data and machine learning. https://www.historyofdatascience.com/jerome-friedman-applying-statistics-to-data-and-machine-learning/. (Accessed on 08/08/2022).
[4] Jeremy Adler and Ingela Parmryd. Quantifying colocalization by correlation: The Pearson correlation coefficient is superior to the Manders' overlap coefficient. Cytometry Part A, 77(8):733–742, 2010.
[5] Abbas Alharan, Radhwan Alsagheer, and Ali Al-Haboobi. Popular decision tree algorithms of data mining techniques: A review. International Journal of Computer Science and Mobile Computing, 6:133–142, June 2017.
[6] E. Chandra Blessie and E. Karthikeyan. Sigmis: A feature selection algorithm using correlation based method. Journal of Algorithms & Computational Technology, 6(3):385–394, 2012.
[7] Louis-Ashley Camus. The explanation of the color circle around your profil!! https://www.kaggle.com/general/193193. (Accessed on 09/09/2022).
[8] Tianqi Chen. https://www.linkedin.com/in/tianqi-chen-679a9856/. (Accessed on 08/08/2022).
[9] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[10] Dean De Cock. House Prices: Advanced Regression Techniques. https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques. (Accessed on 08/08/2022).
[11] Adele Cutler. Remembering Leo Breiman. The Annals of Applied Statistics, 4(4):1621–1633, 2010.
[12] Nicholas I. Fisher. A conversation with Jerry Friedman. Statistical Science, 30(2):268–295, 2015.
[13] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
[14] Jerome Harold Friedman. Jerome H. Friedman. https://jerryfriedman.su.domains. (Accessed on 08/08/2022).
[15] Jerome Harold Friedman. Vita. https://jerryfriedman.su.domains/ftp/vita.pdf, December 2012. (Accessed on 08/08/2022).
[16] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2017.
[17] Map of Ames. https://www.istockphoto.com/vector/iowa-outline-vector-map-usa-printable-gm1176116889-327779552. (Accessed on 08/08/2022).
[18] Peter Bickel, Michael Jordan, and John Rice. In memoriam: Leo Breiman. https://senate.universityofcalifornia.edu/_files/inmemoriam/html/leobreiman.htm. (Accessed on 08/08/2022).
[19] Patrick Schober, Christa Boer, and Lothar A. Schwarte. Correlation coefficients: Appropriate use and interpretation. Anesthesia & Analgesia, 126(5):1763–1768, 2018.
[20] Scikit-Learn. Decision trees, scikit-learn documentation. https://scikit-learn.org/stable/modules/tree.html. (Accessed on 08/08/2022).
[21] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[22] Berkeley Statistics. In memory of Leo Breiman. https://statistics.berkeley.edu/about/memoriam/memory-leo-breiman. (Accessed on 08/08/2022).
[23] Richard E. Williamson, Richard H. Crowell, and Hale F. Trotter. Calculus of Vector Functions. Prentice Hall, 1972.
[24] Hulin Wu, Jose Miguel Yamal, Ashraf Yaseen, and Vahed Maroufy. Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press, 2021.
[25] Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. CRC Press, 2012.
Advisor 須上苑 (Shang-Yuan Shiu)   Date of approval 2022-09-30
