Flight delays can have many causes. Some factors are controllable, such as airline operations, airport ground handling, aircraft maintenance, and improper flight scheduling. Others are uncontrollable, such as weather, air traffic control, and mechanical failure. Very few studies of flight delays have explored the use of data mining methods. This research focuses on one airline corporation; the dataset consists of the main causes of delay for its Taipei flights, collected from 2004 to 2014. Data mining techniques are used to discover useful information about flight delays, which can give the company and academia some guidance on delay factors.
The experiments were conducted with WEKA 3.6.10. The data cover the airline's annual departures from 2004 to 2013, and the class label is based on flight delay. Two feature selection methods, information gain and the genetic algorithm, are used to select representative features from the dataset. Decision trees (C4.5 and CART), a support vector machine (SVM), and multiple classifiers built by bagging and boosting are developed as prediction models for comparison. The 2014 data are then used to validate the better-performing models.
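As an illustration of the information gain criterion mentioned above, the following is a minimal sketch in plain Python rather than WEKA. The records, the feature names (`weather`, `weekday`), and the `delayed` label are hypothetical and are not taken from the thesis dataset; the sketch only shows how features can be ranked by how much they reduce the entropy of the class label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, feature, label="delayed"):
    """Entropy reduction in `label` obtained by splitting `rows` on `feature`."""
    base = entropy([r[label] for r in rows])
    # Group the class labels by the value of the chosen feature.
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[label])
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records (assumption: not the airline's data).
flights = [
    {"weather": "storm", "weekday": "Mon", "delayed": "yes"},
    {"weather": "storm", "weekday": "Tue", "delayed": "yes"},
    {"weather": "clear", "weekday": "Mon", "delayed": "no"},
    {"weather": "clear", "weekday": "Tue", "delayed": "no"},
]

# Rank candidate features from most to least informative.
ranking = sorted(["weather", "weekday"],
                 key=lambda f: information_gain(flights, f),
                 reverse=True)
print(ranking)
```

In this toy case `weather` separates delayed from on-time flights perfectly, while `weekday` carries no information, so a filter based on information gain would keep the former and discard the latter.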
Our results show that the models trained on the 2004 flight data achieved the highest prediction accuracy. Data quantity and prediction performance moved in opposite directions: as more data were added, the predictive performance of the models declined. Regarding incorrect predictions, we conclude that misclassifying flight delays leads to substantial cost losses. This research identifies the better prediction models of flight delays for airline companies. According to these models, the greatest cause of delays is the lack of regular maintenance of machinery. We therefore recommend regular machinery check-ups and the reorganization of flight schedules in order to prevent future accidents and effectively reduce operation time and flight delay time.