References
[1]. Mayer-Schönberger, V. and Cukier, K. (2013). Big data: a revolution that will transform how we live, work and think. London: John Murray.
[2]. Hilbert, M. and López, P. (2011). The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025), 60-65.
[3]. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3), 37-54.
[4]. Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley and Sons.
[5]. Kleissner, C. (1998). Data Mining for the Enterprise. Proceedings of the 31st Annual Hawaii International Conference on System Sciences, 7, 295-304.
[6]. Pyle, D. (1999). Data Preparation for Data Mining. Morgan Kaufmann, San Francisco.
[7]. Chawla, N. V. (2005). Data Mining for Imbalanced Datasets: An Overview. Data Mining and Knowledge Discovery Handbook, 853-867.
[8]. Kotsiantis, S., Kanellopoulos, D. and Pintelas, P. (2006). Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering, 30(1), 25-36.
[9]. Galar, M., Fernández, A., Barrenechea, E., Bustince, H. and Herrera, F. (2012). A review on ensembles for the class imbalance problem: bagging, boosting and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4), 463-484.
[10]. Mazurowski, M. A., Habas, P. A., Zurada, J. M., Lo, J. Y., Baker, J. A. and Tourassi, G. D. (2008). Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Netw., 21(2-3), 427-436.
[11]. Zhu, Z.-B. and Song, Z.-H. (2010). Fault diagnosis based on imbalance modified kernel Fisher discriminant analysis. Chem. Eng. Res. Des., 88(8), 936-951.
[12]. Liu, Y.-H., and Chen, Y.-T. (2005). Total margin-based adaptive fuzzy support vector machines for multiview face recognition. Proc. IEEE Int. Conf. Syst., Man Cybern., 2, 1704-1711.
[13]. Yin, L., Ge, Y., Xiao, K., Wang, X. and Quan, X. (2013). Feature selection for high-dimensional imbalanced data. Neurocomputing, 105, 3-11.
[14]. Liu, X.-Y., and Zhou, Z.-H. (2013). Ensemble Methods for Class Imbalance Learning. Imbalanced Learning: Foundations, Algorithms, and Applications, First Edition, 61-82.
[15]. Chawla, N.V. (2003). C4.5 and Imbalanced Data Sets: Investigating the Effect of Sampling Method, Probabilistic Estimate, and Decision Tree Structure. In Workshop on Learning from Imbalanced Data Sets II.
[16]. Kubat, M. and Matwin, S. (1997). Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. Proceedings of the Fourteenth International Conference on Machine Learning, 179-186.
[17]. Drummond, C. and Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Data Sets II, International Conference on Machine Learning.
[18]. Zadrozny, B. and Elkan, C. (2001). Learning and Making Decisions When Costs and Probabilities are Both Unknown. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 204-213.
[19]. Japkowicz, N. (2004). Concept-Learning in the Presence of Between-Class and Within-Class Imbalances. In Proceedings of the Fourteenth Conference of the Canadian Society for Computational Studies of Intelligence, 67-77.
[20]. Domingos, P. (1999). MetaCost: a general method for making classifiers cost-sensitive. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 155-164.
[21]. Chawla, N. V., Lazarevic, A., Hall, L. O. and Bowyer, K. W. (2003). SMOTEBoost: Improving Prediction of the Minority Class in Boosting. Proc. Seventh European Conf. Principles and Practice of Knowledge Discovery in Databases, 107-119.
[22]. Wang, S. and Yao, X. (2009). Diversity analysis on imbalanced data sets by using ensemble models. IEEE Symp. Comput. Intell. Data Mining, 324-331.
[23]. Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100-108.
[24]. Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press.
[25]. Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.
[26]. Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z.-H., Steinbach, M., Hand, D. J. and Steinberg, D. (2007). Top 10 algorithms in data mining. Knowl. Inf. Syst., 14, 1-37.
[27]. Batista, G. E., Prati, R. C. and Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20-29.
[28]. Japkowicz, N. and Stephen, S. (2002). The Class Imbalance Problem: A Systematic Study. Intelligent Data Analysis, 6(5), 429-449.
[29]. Drummond, C. and Holte, R. C. (2003, August). C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II (Vol. 11).
[30]. Tomek, I. (1976). Two modifications of CNN. IEEE Trans. Syst. Man Cybern., 6, 769-772.
[31]. Hart, P. E. (1968). The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory IT-14, 515-516.
[32]. Chawla, N. V., Bowyer, K. W., Hall, L. O. and Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR 16, 321–357.
[33]. Weiss, G. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19.
[34]. Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, 115-123.
[35]. Raskutti, B. and Kowalczyk, A. (2004). Extreme rebalancing for SVMs: a case study. SIGKDD Explorations, 6(1), 60-69.
[36]. Longadge, R., Dongre, S. S., and Malik, L. (2013). Class Imbalance Problem in Data Mining: Review. International Journal of Computer Science and Network, 2(1).
[37]. Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
[38]. Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. Elsevier.
[39]. Fix, E. and Hodges Jr., J. L. (1951). Discriminatory analysis: Nonparametric discrimination: Consistency properties. California Univ Berkeley.
[40]. Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
[41]. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
[42]. Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.
[43]. Freund, Y. and Schapire, R. E. (1996, July). Experiments with a new boosting algorithm. In ICML (Vol. 96, pp. 148-156).
[44]. López, V., Fernández, A., García, S., Palade, V. and Herrera, F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences, 250, 113-141.
[45]. Provost, F. and Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning, 52(3), 199-215.
[46]. MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297).
[47]. Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics, 21, 768.
[48]. 陳景祥 (2010). R軟體:應用統計方法 [R software: Applied statistical methods]. Taipei: 東華.