Thesis 105522106 — Detailed Record




Name: Yang, Zhi-Kai (Zhi-kai Yang)    Graduate Department: Computer Science and Information Engineering
Thesis Title: Applying Automatic Text Classification and E-book Recommendation to Improve Click Rate
Related Theses
★ Applying intelligent classification to improve article-publishing efficiency on an enterprise knowledge-sharing platform
★ Research and implementation of smart home management and control
★ Design and verification of a search mechanism for an open video-surveillance management system
★ Data mining applied to building an early-warning mechanism for obsolete inventory
★ An analysis of learning behavior under a problem-solving model
★ An integrated management information system for information systems and electronic approval workflows
★ A manufacturing execution system applied to the analysis and handling of semiconductor equipment shutdown notifications
★ Research and implementation of Apple Pay payments on the iOS platform
★ Applying cluster analysis to explore the effect of learning patterns on learning outcomes
★ Applying sequence mining to analyze the effect of video-viewing patterns on learning outcomes
★ A QoS-based optimization method for web-service selection
★ A satisfaction survey of a Wikipedia knowledge recommendation system for learners using e-Portfolio
★ Students' learning motivation, Internet self-efficacy, and system satisfaction: the case of e-Portfolio
★ Improving English learning outcomes with automated conversational agents in Second Life
★ The effect of collaborative information seeking on students' individual web-search abilities and strategies
★ The effect of digital annotation on learners' reflection levels in online learning environments
Files: [EndNote RIS format]    [BibTeX format]    [Related articles]    [Citing articles]    [Full record]    [Library catalog]    Full-text access: never open to the public
Abstract (Chinese): In recent years, large numbers of open educational resources (OER) have gradually been integrated into every stage of students' learning. OER not only help students learn on their own, but also reduce teachers' lesson-preparation time, letting them focus on resolving the difficulties students encounter while learning. However, as the volume of OER keeps growing, raising the usage of each type of teaching material so that students can precisely obtain the materials they need has become a problem OER platforms must solve. Accordingly, this study takes the Ministry of Education's Education Big Market (教育大市集) as its platform and applies machine learning and text-classification techniques to increase the usage of each type of teaching material, thereby helping students precisely obtain the materials they need. The study compares several classification models, selects the one best suited to this data set, uses LDA feature extraction to find the best feature set, and finally applies SGD and a voting mechanism to refine and decide the final classification model. For the experimental environment, the study uses Spark for distributed processing and the Cassandra database system to store preprocessed data. The teaching materials classified by random forests, support vector machines, logistic regression, neural networks, and other classifiers, together with the recommendation lists, are stored in a MySQL relational database. Finally, recommendations are presented in the user interface via PHP and JavaScript web technologies.
Abstract (English): In recent years, a large number of open educational resources (OER) have gradually been blended into students' learning at all stages. OER not only help students learn on their own but also reduce teachers' lesson-preparation time, letting them focus on solving students' learning problems. However, as the number of OER keeps increasing, helping students precisely obtain the learning materials they need has become a problem that OER platforms must solve. In view of this, the study takes the Education Market as its platform and applies machine learning methods and text-classification techniques to improve the usage of various types of teaching materials, thereby helping students precisely obtain the materials they need. The study compares several classification models to select the one most appropriate for our data set, then uses LDA feature extraction to find the best feature set. In addition to LDA, the study uses SGD and a voting mechanism to optimize and combine the classification models. The study uses Spark for distributed processing and the Cassandra database system to store preprocessed data. We apply several classifiers, such as random forests, support vector machines, logistic regression, and neural networks, to classify teaching materials and build recommendation lists; the results are stored in a MySQL relational database. Finally, recommendations are delivered to each user's interface via PHP and JavaScript web technologies.
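The pipeline the abstracts describe (text features, LDA feature extraction, several base classifiers, then a voting mechanism) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on toy data, not the thesis's actual Spark/Cassandra implementation; the documents, labels, and parameters are all hypothetical, and "LDA" is taken here to mean Latent Dirichlet Allocation topic features (the thesis may use a different variant).

```python
# Hypothetical sketch of the described pipeline: bag-of-words counts ->
# LDA feature extraction -> a voting ensemble of the classifiers named
# in the abstract (logistic regression, random forest, SGD). Toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import Pipeline

# Illustrative teaching-material snippets and subject labels.
docs = [
    "fractions addition subtraction practice",
    "multiplication tables drill worksheet",
    "reading comprehension short story",
    "grammar exercises sentence structure",
]
labels = ["math", "math", "language", "language"]

# Hard (majority) voting over the base classifiers, echoing the
# abstract's voting mechanism for combining models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("sgd", SGDClassifier(random_state=0)),
    ],
    voting="hard",
)

pipeline = Pipeline([
    ("bow", CountVectorizer()),                        # term counts
    ("lda", LatentDirichletAllocation(n_components=2,  # topic features
                                      random_state=0)),
    ("clf", ensemble),
])

pipeline.fit(docs, labels)
print(pipeline.predict(["division word problems"]))  # one of the two labels
```

In the thesis's setting, the same three stages would run on Spark with the classified materials written to MySQL; this sketch only shows the classification logic itself.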
Keywords (Chinese): ★ Text classification
★ Automatic text classification
★ Open educational resources
★ E-book
★ Machine learning
Keywords (English): ★ Text classification
★ Automatic text classification
★ Open Educational Resources
★ E-book
★ Machine learning
Table of Contents
Abstract (Chinese) VI
Abstract (English) VII
List of Figures X
List of Tables XI
1. Introduction 1
2. Literature Review 2
2.1 Multiclass Classification 2
2.1.1 Neural Network 2
2.1.2 Logistic Regression 3
2.1.3 Support Vector Machine (SVM) 4
2.1.4 Random Forests 5
2.2 Feature Selection 6
2.3 Feature Extraction 7
2.4 Gradient Descent 8
3. System Design 8
3.1 System Environment 9
3.2 System Architecture 11
3.3 Data Collection 13
3.3.1 Knowledge Structure 13
3.3.2 Mathematics Lexicon 14
3.3.3 Text Content of E-learning Materials 14
3.3.4 Data Preprocessing 15
3.3.4.1 Data Cleaning 15
3.3.4.2 Feature Extraction 16
3.3.4.3 Data Transformation 16
3.4 Data Storage 17
3.5 Information Extraction and Analysis 17
3.5.1 Classifier Selection 18
3.5.2 Dimensionality Reduction of the Data Set 20
3.5.3 Classifier Optimization 20
3.5.4 Classifier Combination 21
3.6 Information Application 21
3.6.1 Building Paths for Recommendation Lists 22
3.6.2 Selecting Recommended Materials 22
4. Experimental Design 23
4.1 Training Set and Metric Design 24
4.2 Model Parameter Design 24
4.3 Recommendation-List Click-Rate Design 25
5. Results and Discussion 25
6. Conclusions and Future Work 31
7. References 32
Advisor: Stephen J.H. Yang (楊鎮華)    Review Date: 2018-07-16

For questions about this thesis, please contact the Outreach Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail. - Privacy Policy Statement