基於隱含狄利克雷分布進行開放式問卷之主題導向文字探勘;Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81205

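Title:	基於隱含狄利克雷分布進行開放式問卷之主題導向文字探勘;Topic-oriented Text Mining on Open-ended Questionnaires using Latent Dirichlet Allocation
Author:	陳昱儒;Chen, Yu-Ju
Contributor:	Department of Computer Science and Information Engineering
Keywords:	learning outcomes;text mining;topic model;latent Dirichlet allocation
Date:	2019-07-26
Upload time:	2019-09-03 15:39:19 (UTC+8)
Publisher:	National Central University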
Abstract:	As education policy has changed in recent years, domestic universities have committed to improving students' learning outcomes. The most common way of evaluating learning outcomes is through course questionnaires, which students fill in in the middle and at the end of each semester. To give students a direct channel for more detailed feedback, these questionnaires usually include, besides items on specific topics, a free-text comment section.
Because the comment section lets students write any thoughts and opinions, there are no restrictions or rules on how it should be written. This human-generated text is unstructured and often contains writing mistakes and misused words. Lacking structure, it cannot be analyzed like ordinary numerical data after simple preprocessing with standard data mining techniques. We therefore aim to analyze the text data from course evaluation questionnaires through text mining.
Because the content is miscellaneous and there is not enough human-labeled data, it is hard to apply supervised classification methods to this text. We therefore use an unsupervised topic analysis technique to find the latent topic distribution of the data. Topic modeling can infer latent topic distributions from the way words are distributed across documents and can cluster similar documents, without topic labels or training data defined beforehand. We perform topic modeling by implementing latent Dirichlet allocation (LDA) using Gibbs sampling, and further use the LDA model to estimate the topic distribution of unseen data.
In this thesis, we apply topic analysis to the comment section of the course evaluation questionnaire, achieving a preliminary automated clustering of the text data. We believe that this automatic topic modeling method makes it more efficient for analysts to analyze text data in questionnaires. Moreover, future work on automatic questionnaire analysis can build on this approach.
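The abstract names the technique (LDA fitted by Gibbs sampling, then applied to unseen data) without giving details. As a rough illustration only, the following is a minimal sketch of a collapsed Gibbs sampler for LDA plus a simple fold-in step for a new document. It is not the thesis implementation: the function names `lda_gibbs` and `infer_theta`, the hyperparameter values, the iteration counts, and the toy corpus of integer word ids are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed point estimates of document-topic (theta) and topic-word (phi).
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi


def infer_theta(doc, phi, alpha=0.1, iters=100, seed=0):
    """Estimate the topic mixture of an unseen document with phi held fixed."""
    rng = random.Random(seed)
    num_topics = len(phi)
    counts = [0] * num_topics
    assign = []
    for _ in doc:
        z = rng.randrange(num_topics)
        assign.append(z)
        counts[z] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            counts[assign[i]] -= 1
            weights = [(counts[k] + alpha) * phi[k][w] for k in range(num_topics)]
            z = rng.choices(range(num_topics), weights=weights)[0]
            assign[i] = z
            counts[z] += 1
    return [(counts[k] + alpha) / (len(doc) + num_topics * alpha)
            for k in range(num_topics)]


# Hypothetical toy corpus: word ids 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 2, 0, 1], [0, 2, 1, 1], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=6)
new_theta = infer_theta([3, 4, 4, 5], phi)
```

Each row of `theta` is one document's topic mixture and each row of `phi` is one topic's word distribution. A real questionnaire pipeline would also need tokenization and a vocabulary mapping before this step, which the sketch leaves out.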
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	184

All items in NCUIR are protected by original copyright.
