Automated Learning Content Classification for Open Education Repositories

Name

W K Tharanga Mahesh Gunarathne (王馬赫)

Department

Department of Computer Science and Information Engineering

Thesis title

Automated Learning Content Classification for Open Education Repositories
(自動學習開源教育知識庫內容的分類方法)

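Related theses

★ A Grouping Mechanism Based on Social Relationships in edX Online Discussion Boards
★ A 3D-Visualized Facebook Interaction System Built with Kinect
★ An Assessment System for Smart Classrooms Built with Kinect
★ An Intelligent Metropolitan Route Planning Mechanism for Mobile Device Applications
★ Dynamic Texture Transfer Based on Analysis of Key Momentum Correlation
★ A Seam Carving System That Preserves Straight-Line Structures in Images
★ A Community Recommendation Mechanism for Open Online Social Learning Environments
★ System Design of an Interactive Situated Learning Environment for English as a Foreign Language
★ An Emotional Color Transfer Mechanism with Skin Color Preservation
★ A Gesture Recognition Framework for Virtual Keyboards
★ Error Analysis of Fractional-Power Grey Generating Prediction Models and Development of a Computer Toolbox
★ Real-Time Human Skeleton Motion Construction Using Inertial Sensors
★ Real-Time 3D Modeling with Multiple Cameras
★ A Genetic-Algorithm Grouping Mechanism Based on Complementarity and Social Network Analysis
★ A Virtual Musical Instrument Performance System with Real-Time Hand Tracking
★ A Real-Time Virtual Musical Instrument Performance System Based on Neural Networks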

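This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.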

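Abstract (translated from the Chinese)

We recognize that Open Educational Resources (OER) offer a very good strategy and opportunity for improving the quality of education. At present, students, teachers, and researchers can look for resources in general-purpose search engines by logically combining keywords from the material's content. However, most search engines cannot accurately locate suitable learning content. The main purpose of this study is to propose an automated learning content classification mechanism for open education repositories. MERLOT II (www.merlot.org) is a learning platform with a large number of users who obtain or upload resources, so we chose MERLOT II as the experimental domain.
In the first phase, we propose a hierarchical knowledge graph based on a single search keyword entered by a learning object repository (LOR) user. It is delivered through an enhanced learning object (LO) search engine that uses data visualization to guide users to suitable learning content. Users can run a single-keyword search on the system's web page and obtain a visualized hierarchical knowledge graph. The back end of the system provides data extraction, data transformation, data clustering, and data visualization. The visualized search results indicate that the system helps users obtain a clear overview of the learning object repository from a single keyword.
In the next phase, we returned to our original plan and propose an automated method for classifying the content of open education repositories. The value of an open education repository depends mainly on whether its resources can be searched or located through a web search engine. Currently, the MERLOT II repository requires resource providers to manually select the relevant discipline category when uploading, which is time-consuming and prone to human error. If an incorrect category is selected, the resource is not stored under the correct category in the repository, and it may therefore not be listed by MERLOT's Smart Search or Advanced Search. These findings showed us the importance of developing an automated content classification solution for open repositories. The dataset was built from data collected from MERLOT, and initial experiments used well-known classifiers, namely Logistic Regression, (Multinomial) Naive Bayes, Linear Support Vector Machine, and Random Forest, to test accuracy. We propose an automated learning content classification module (LCCM) that classifies learning resources into their relevant disciplines as they are added to the MERLOT repository. The goals of this phase include dataset preparation, data preprocessing, feature extraction using the LDA topic model, and computing semantic similarity with a pre-trained word embedding matrix. These methods form a basis for classifying learning resources more efficiently in a short time.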
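
The baseline step named in the abstract compares several standard classifiers on MERLOT records. As a rough illustration of what one of them does, here is a minimal multinomial Naive Bayes sketch in plain Python. The toy titles, the two discipline labels, and the helper names (`tokenize`, `train_nb`, `predict_nb`) are assumptions made for illustration; the abstract does not describe the thesis implementation, features, or data at this level.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_nb(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: resource titles paired with a discipline category.
DOCS = [
    "introduction to linear algebra and matrices",
    "calculus limits and derivatives",
    "cell biology and genetics basics",
    "human anatomy and cell structure",
]
LABELS = ["Mathematics", "Mathematics", "Biology", "Biology"]

model = train_nb(DOCS, LABELS)
prediction = predict_nb(model, "derivatives and matrices")
```

On this toy data, `prediction` is "Mathematics", because "derivatives" and "matrices" occur only in the Mathematics titles. A real comparison would use the full MERLOT records and proper train/test splits.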

Abstract (English)

Open Educational Resources (OERs) offer a strategic opportunity to improve the quality of education. At present, OER users (students, instructors, and scholars) can find OERs through general search engines by means of metadata enrichment and logic extrapolation. Yet most users of Web search engines today face difficulties when searching for suitable learning materials. The main goal of this study is to propose an automated learning content classification for Open Education Repositories. Since MERLOT II (www.merlot.org) is used by a large number of users to obtain and submit learning resources, the MERLOT II repository was chosen as the experimental domain.
In the initial phase, we set out to propose an enhanced learning object (LO) search engine solution with a data visualization feature that lets learning object repository (LOR) users navigate LOs through a hierarchical knowledge graph based on a single search keyword. A Web-based solution was implemented in which users can execute a single keyword search and then visualize the results on a hierarchical knowledge graph. The back end of the system was designed with functions for data extraction, data transformation, data clustering, and data visualization. The search and data visualization results indicate that the proposed approach can help users get a clear overview of the LOs from a single keyword search.
In the next phase, we returned to our original plan of proposing an automated learning content classification for Open Education Repositories. The value of OERs depends mainly on how easily they can be searched or located through a web search engine. Currently, the MERLOT II metadata repository asks resource providers to choose the relevant discipline category manually when adding material to the repository. This practice is very time-consuming and prone to human error. If a member picks an incorrect discipline category, the learning resource may not be correctly categorized in the repository. As a result, the learning resource may not be shortlisted for a given keyword search in the "MERLOT Smart Search" or the "Advanced search." These findings led us to recognize the importance of developing an automated learning content classification solution for OER repositories. The dataset was assembled from the MERLOT data collection, and initial experiments were carried out with well-known classifiers (Logistic Regression, (Multinomial) Naive Bayes, Linear Support Vector Machine, and Random Forest) to test accuracy. An automated learning content classification model (LCCM) was proposed to classify learning resources into relevant discipline categories as they are added to the MERLOT repository. The research goals of this phase include dataset preparation, data preprocessing, feature extraction using the LDA topic model, and calculating semantic similarity using a pre-trained word embedding matrix. These methods serve as a basis for classifying learning resources more effectively within a short time.
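
The second-phase idea of scoring a resource against discipline categories by semantic similarity can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are made up rather than pre-trained, the topic words are assumed to have come from an LDA model, and `mean_vector`, `cosine`, and `rank_categories` are hypothetical helper names. How the thesis actually combines LDA topics with the embedding matrix is not specified in the abstract.

```python
import math

# Made-up 3-dimensional vectors standing in for a pre-trained embedding matrix.
EMBEDDINGS = {
    "algebra": [0.9, 0.1, 0.0],
    "equation": [0.8, 0.2, 0.1],
    "mathematics": [0.85, 0.15, 0.05],
    "cell": [0.1, 0.9, 0.1],
    "gene": [0.0, 0.8, 0.2],
    "biology": [0.05, 0.9, 0.1],
}


def mean_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_categories(topic_words, categories):
    """Rank category names by similarity to the averaged topic-word vector."""
    doc_vec = mean_vector(topic_words)
    if doc_vec is None:
        return []
    scores = []
    for category in categories:
        cat_vec = mean_vector([category.lower()])
        if cat_vec is not None:
            scores.append((category, cosine(doc_vec, cat_vec)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Assumed input: top words of the dominant LDA topic for one learning resource.
ranking = rank_categories(["algebra", "equation"], ["Mathematics", "Biology"])
```

Here `ranking` lists "Mathematics" first. In a multi-label setting, one option would be to keep every category whose score exceeds a chosen threshold instead of only the top one.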

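Keywords (Chinese)

★ Open education repositories
★ Learning content
★ Search engine
★ Data visualization
★ Data extraction
★ Data transformation
★ Automatic learning content classification
★ Topic models
★ Multi-label classification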

Keywords (English)

★ Open Education Resources (OER)
★ Learning objects
★ search engine
★ data visualization
★ data extraction
★ data transformation
★ clustering
★ probabilistic topic models
★ multi-label classification
★ Automatic learning object classification

Table of Contents

Contents

Chinese Abstract
Abstract
Acknowledgement
Contents
List of Figures
List of Tables
Explanation of Symbols
Chapter 1 Introduction
1.1 First research issue with MERLOT II – Searching a Material
1.1.1 Research Contribution - 1
1.2 Second research issue with MERLOT II – Adding Materials
1.2.1 Research Contribution - 2
Chapter 2 Literature Review
2.1 Learning Objects and Learning Object Repositories
2.1.1 OERCommons
2.1.2 MERLOT II
2.1.3 Open Stax CNX
2.2 Data Pre-Processing
2.3 Probabilistic Topic Models
2.4 Word2vec
2.5 lda2vec
2.6 Text Classification
2.5.1 k-Nearest Neighbour Classifiers
2.5.2 Decision Tree Classifier
2.5.3 Naive Bayes Classifiers
2.7 Multi-label Document Classification
2.8 Performance Measures
2.9 Performance Measures - Multi-label classification
Chapter 3 Learning Object Search Engine Solution MERLOT II
3.1 Introduction
3.1.1 Exploring the research issue with MERLOT Smart Search
3.1.2 Research Contribution
3.2 Proposed Methodology
3.2.1 Data Extraction and Preparation
3.2.2 Data Transformation
3.2.3 Clustering Algorithm Implementation
3.2.4 Information Visualization Implementation of Data Visualization
3.3 Experimental Setup
3.4 Evaluation of Clustering Results
3.5 Discussion
3.5.1 Methodology Strength
3.5.2 Methodological Limitations
3.6 Conclusion and Future works
Chapter 4 Automated Learning Content Classification Solution – MERLOT II
4.1 Introduction
4.1.1 Research issue with MERLOT II – Adding Material
4.1.2 Research Contribution - 2
4.2 Phase -1 Proposed Methodology
4.2.1 Approaching the (Multi-Class) Classification Techniques
4.2.2 Data Collection
4.2.3 Data Preprocessing
4.2.4 Experimental Setup
4.2.5 Results and Discussions
4.2.6 Performance Evaluation with sample Results
4.3 Phase 2 - Proposed Methodology
4.4 Experiment Design and Evaluation
4.4.1 Data Collection
4.4.2 Data Preprocessing
4.4.3 Implementation of Probabilistic Topic Models
4.4.4 Implementation of LDA
4.4.5 Model classification based on similarity scores
4.3.3 Results and Discussion
Chapter 5 Conclusion and Future works
REFERENCE

Advisor

Timothy K. Shih (施國琛)

Approval date

2020-7-24
