Master's/Doctoral Thesis 110552027: Detailed Record




Name: Wan-Chien Ling (凌婉倩)    Department: In-Service Master's Program, Department of Computer Science and Information Engineering
Thesis Title: Design and Implementation of a Voice-Based Human-Machine Dialogue System for Early Childhood Learning (幼兒學習語音人機對話系統設計與實作)
Related Theses
★ An Intelligent Controller Development Platform Integrating a GRAFCET Virtual Machine
★ Design and Implementation of a Distributed Industrial Electronic Kanban Network System
★ Design and Implementation of a Dual-Touch Screen Based on a Dual-Camera Vision System
★ An Embedded Computing Platform for Intelligent Robots
★ An Embedded System for Real-Time Moving Object Detection and Tracking
★ A Multiprocessor Architecture and Distributed Control Algorithms for Solid-State Drives
★ A Human-Machine Interaction System Based on Stereo-Vision Gesture Recognition
★ Robot System-on-Chip Design Integrating Bio-Inspired Intelligent Behavior Control
★ Design and Implementation of an Embedded Wireless Image Sensor Network
★ A License Plate Recognition System Based on a Dual-Core Processor
★ Continuous 3D Gesture Recognition Based on Stereo Vision
★ Design and Hardware Implementation of a Miniature, Ultra-Low-Power Wireless Sensor Network Controller
★ Real-Time Face Detection, Tracking, and Recognition from Streaming Video: An Embedded System Design
★ Embedded Hardware Design of a Fast Stereo Vision System
★ Design and Implementation of a Real-Time Continuous Image Stitching System
★ An Embedded Gait Recognition System Based on a Dual-Core Platform
Files: Full text available for browsing in the system after 2028-06-15.
Abstract (Chinese): Diverse reading activities can raise young children's willingness to learn, and conversational guidance from a chatbot can attract their attention and add to the fun of reading. Companion robot products for young children on the market lack a mode of direct natural-language interaction, and when faced with children's varying emotional reactions they cannot give timely feedback. This study proposes a human-machine dialogue system for early childhood learning that interacts in natural language: a voice-dialogue storytelling robot. Through back-and-forth question-and-answer chat, it guides young children in reading and understanding fairy tales. The system architecture comprises speech recognition, semantic understanding, and text-to-speech: the weakly supervised Whisper model first recognizes the spoken dialogue, the large pre-trained language model GPT-3.5 then responds to the dialogue content in text, and finally Amazon Polly converts the text response into natural human speech output. This study validates an interactive learning dialogue framework for young children based on multiple AI engines, providing a development foundation for subsequent early childhood learning applications.
Abstract (English): Diverse reading activities can enhance young children's willingness to learn, and engaging in conversations with chatbots can attract their attention and increase the enjoyment of reading. Robot products on the market for young children lack the ability to interact directly in natural language and are unable to provide appropriate feedback to the varied reactions of young children. In this study, we propose a natural-language interactive human-machine dialogue system for early childhood learning, specifically a voice-based storytelling chatbot. The system guides young children in reading through back-and-forth conversations, performing speech recognition, semantic understanding, and text-to-speech conversion: we use the Whisper model for speech recognition, generate text responses with GPT-3.5, and convert the text responses into speech with Amazon Polly. We demonstrate the effectiveness of a multi-AI-engine framework for interactive dialogue systems in early childhood learning, which can serve as a foundation for future applications.
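To make the three-stage pipeline described in the abstract concrete, the following is a minimal Python sketch of one dialogue turn. It assumes the open-source whisper package, the 2023-era openai.ChatCompletion interface, and boto3 with AWS credentials already configured; the file names, model size, API key placeholder, and system prompt are illustrative assumptions, not code taken from the thesis.

import whisper
import openai
import boto3

# Stage 1: speech-to-text with Whisper (the "base" model size is an assumption).
asr_model = whisper.load_model("base")
child_utterance = asr_model.transcribe("child_question.wav")["text"]

# Stage 2: generate a text reply with GPT-3.5 via the Chat Completions API.
openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You are a friendly storyteller guiding a young child through a fairy tale."},
        {"role": "user", "content": child_utterance},
    ],
)
reply = response["choices"][0]["message"]["content"]

# Stage 3: text-to-speech with Amazon Polly (Zhiyu is Polly's Mandarin Chinese voice).
polly = boto3.client("polly")
speech = polly.synthesize_speech(Text=reply, OutputFormat="mp3", VoiceId="Zhiyu")
with open("reply.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())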
Keywords (Chinese): ★ Early childhood learning
★ Voice robot
★ Whisper
★ GPT-3.5
Keywords (English)
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
List of Code Listings
Chapter 1: Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2: Literature Review
2.1 Automatic Speech Recognition (ASR)
2.1.1 Hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM)
2.1.2 Attention-Based End-to-End Speech Recognition
2.2 Text-to-Speech (TTS)
2.2.1 WaveNet
2.2.2 Attention-Based End-to-End Speech Synthesis: Tacotron
2.3 Natural Language Understanding (NLU)
2.4 Pre-trained Language Models
2.4.1 Bidirectional Encoder Representations from Transformers (BERT)
2.4.2 Large Language Models Based on Generative Pre-Training (GPT)
2.5 Whisper
Chapter 3: Design of the Early Childhood Human-Machine Dialogue System
3.1 MIAT Methodology
3.1.1 IDEF0 Hierarchical Modular Design
3.1.2 Grafcet Discrete-Event Modeling
3.2 Architecture Design of the Early Childhood Human-Machine Dialogue System
3.3 Discrete-Event Modeling
3.3.1 Speech-to-Text
3.3.2 Text Understanding and Text Generation
3.4 High-Level Software Synthesis
Chapter 4: Experimental Results
4.1 Experimental Environment and Methods
4.1.1 Environment Setup
4.1.2 Experimental Methods for Speech-to-Text, Semantic Understanding, and Text Generation
4.1.3 Experimental Method for Text-to-Speech
4.2 Functional Verification of Automatic Speech Recognition
4.3 Functional Verification of Text Understanding and Text Generation
4.4 Functional Verification of the Voice Dialogue Robot
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

List of Figures
Figure 2.1 DNN-HMM architecture
Figure 2.2 LAS architecture
Figure 2.3 Dilated causal convolution
Figure 2.4 CBHG module
Figure 2.5 BERT bidirectional architecture
Figure 3.1 MIAT methodology framework
Figure 3.2 IDEF0 function block
Figure 3.3 IDEF0 hierarchical modular design
Figure 3.4 Example of Grafcet discrete-event modeling
Figure 3.5 IDEF0 system architecture of the voice dialogue robot
Figure 3.6 Automatic speech recognition module
Figure 3.7 Text understanding and text generation module
Figure 3.8 Discrete-event model of the voice dialogue robot
Figure 3.9 Discrete-event model of automatic speech recognition
Figure 3.10 Discrete-event model of text understanding and text generation
Figure 4.1 Voice dialogue between the storytelling robot 阿麗 (Ali) and a simulated child
Figure 4.2 Ali's responses to the child's dialogue
Figure 4.3 Ali correctly understanding the content
Figure 4.4 Ali guiding the child to learn new knowledge

List of Tables
Table 2.1 Comparison of wav2vec and Whisper performance across datasets
Table 3.1 Basic Grafcet elements
Table 3.2 Function descriptions and corresponding function names for each state of the discrete-event model
Table 4.1 Experimental software platform
Table 4.2 Pricing and token limits of OpenAI GPT-4, GPT-3.5, GPT-3, and DALL·E
Table 4.3 Amazon Polly voice options for English and Chinese
Table 4.4 Verification results of automatic speech recognition
Table 4.5 Verification results of text understanding and text generation
Table 4.6 Functional verification of the voice dialogue robot

List of Code Listings
Listing 3.1 Python code for automatic speech recognition
Listing 3.2 Python code for automatic speech recognition
Listing 3.3 Python code for text understanding and text generation
References
[1] J. Van Ravens, L. Crouch, K. Merseth King, E. A. Hartwig, and C. Aggio, “The Preschool Entitlement: A Locally Adaptable Policy Instrument to Expand and Improve Preschool Education,” RTI Press, RTI Press Occasional Paper No. OP-0082-2301, 2023, doi: 10.3768/rtipress.2023.op.0082.2301.
[2] I. Palaiologou, “The early years foundation stage: Theory and practice,” 3rd ed., London: Sage, 2016.
[3] NAEYC (National Association for the Education of Young Children), Strategic Direction: Guiding Our Work Through 2026, 2022. [Online]. Available: https://www.naeyc.org/sites/default/files/globally-shared/downloads/PDFs/our-work/strategicdirection_2022_cht.pdf (Feb. 4, 2023)
[4] 楊秋華, "A Study of the Relationship Between Preschool Children's Language Ability and Parent-Child Interaction," Master's thesis, Department of Early Childhood Education, National Taichung University of Education, Taichung, 2014.
[5] Department of Household Registration, Ministry of the Interior, Population Statistics, 2022. [Online]. Available: https://www.ris.gov.tw/app/portal/346 (Feb. 4, 2023)
[6] 張佳琪, "An Action Research on the Effects of Diverse Reading Activities on Young Children's Reading Motivation and Reading Behavior," Master's thesis, National Pingtung University of Education, Pingtung, 2016.
[7] Y. Xu, D. Wang, P. Collins, H. Lee, and M. Warschauer, "Same benefits, different communication patterns: Comparing children's reading with a conversational agent vs. a human partner," Computers & Education, vol. 161, Article 104059, 2021, doi: 10.1016/j.compedu.2020.104059.
[8] C. C. Liu, M. G. Liao, C. H. Chang, and H. M. Lin, "An analysis of children's interaction with an AI chatbot and its impact on their interest in reading," Computers & Education, vol. 189, Article 104576, 2022, doi: 10.1016/j.compedu.2022.104576.
[9] J. H. Han, M. H. Jo, V. Jones, and J. H. Jo, "Comparative study on the educational use of home robots for children," Journal of Information Processing Systems, vol. 4, no. 4, pp. 159-168, 2008, doi: 10.3745/JIPS.2008.4.4.159.
[10] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in International Conference on Learning Representations (ICLR), 2015, pp. 1-15.
[11] T. Luong, H. Pham, and C. D. Manning, "Effective Approaches to Attention-based Neural Machine Translation," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015, pp. 1412-1421.
[12] W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, “Listen, attend and spell,” CoRR, vol. abs/1508.01211, 2015.
[13] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.
[14] Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous, "Tacotron: Towards end-to-end speech synthesis," in Proceedings of Interspeech 2017, 2017, pp. 4006-4010.
[15] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” 2018.
[16] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large language models are zero-shot reasoners," arXiv preprint arXiv:2205.11916, 2022.
[17] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro, “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism,” arXiv preprint arXiv:1909.08053, 2019.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp. 4171-4186.
[19] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” in Neural Information Processing Systems Online Conference (NeurIPS), 2020.
[20] OpenAI, ChatGPT: Optimizing Language Models for Dialogue, 2023. [Online]. Available: https://openai.com/blog/chatgpt/ (Feb. 5, 2023)
[21] M. U. Haque, I. Dharmadasa, Z. T. Sworna, R. N. Rajapakse, and H. Ahmad, “"I think this is the most disruptive technology" Exploring Sentiments of ChatGPT Early Adopters using Twitter Data,” arXiv preprint arXiv:2212.05856, 2022.
[22] M. Tuo and B. Long, "Construction and Application of a Human-Computer Collaborative Multimodal Practice Teaching Model for Preschool Education," Computational Intelligence and Neuroscience, vol. 2022, Article ID 2973954, 2022, doi: 10.1155/2022/2973954.
[23] S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep Voice: Real-time neural text-to-speech," arXiv preprint arXiv:1702.07825, 2017.
[24] Chat completions - OpenAI API, 2023. [Online]. Available: https://platform.openai.com/docs/guides/chat/ (Feb. 6, 2023)
[25] 陳詩涵, "The Effects of AI Robot-Assisted Emotional Picture Book Teaching on Young Children's Emotional Competence," Master's thesis, National Pingtung University of Education, Pingtung, 2019.
[26] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[27] G. E. Dahl, D. Yu, L. Deng and A. Acero, “Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition,” in IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012, doi: 10.1109/TASL.2011.2134090.
[28] D. Wang, X. Wang, and S. Lv, "An Overview of End-to-End Automatic Speech Recognition," Symmetry, vol. 11, no. 8, Article 1018, 2019, doi: 10.3390/sym11081018.
[29] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 5998-6008, arXiv preprint arXiv:1706.03762.
[30] D. Khurana, A. Koli, K. Khatter, and S. Singh, "Natural language processing: state of the art, current trends and challenges," Multimedia Tools and Applications, vol. 82, pp. 3713-3744, 2023, doi: 10.1007/s11042-022-13428-4.
[31] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the 23rd international conference on Machine learning (ICML), pp. 369-376, 2006, doi: 10.1145/1143844.1143891.
[32] A. Graves, “Sequence Transduction with Recurrent Neural Networks,” CoRR, vol. abs/1211.3711, 2012.
[33] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, "Deep reinforcement learning from human preferences," in Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017.
[34] OpenAI, "GPT-4 Technical Report," arXiv preprint arXiv:2303.08774, 2023.
[35] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” arXiv preprint arXiv:2212.04356, 2022.
[36] C.-H. Chen, M.-Y. Lin, and X.-C. Guo, “High-level modeling and synthesis of smart sensor networks for Industrial Internet of Things,” Computers & Electrical Engineering, vol. 61, pp. 48-66, 2017.
[37] R. J. Mayer, "IDEF0 function modeling," A Reconstruction of the Original Air Force Wright Aeronautical Laboratory Technical Report, AFWAL-TR-81-4023 (The IDEF0 Yellow Book), Knowledge-Based System Inc, College Station, TX, 1992.
[38] R. David, "Grafcet: A powerful tool for specification of logic controllers," IEEE Transactions on Control Systems Technology, vol. 3, no. 3, pp. 253-268, 1995.
[39] Create a ChatGPT Voice Assistant in 8 Minutes (Python Tutorial), 2023. [Online]. Available: https://www.youtube.com/watch?v=8z8Cobsvc9k (Apr. 15, 2023)
[40] Create a ChatGPT & Bing Powered Voice Assistant with Python, 2023. [Online]. Available: https://www.youtube.com/watch?v=aokn48vB0kc (Apr. 15, 2023)
[41] Introduction - OpenAI API, 2023. [Online]. Available: https://platform.openai.com/docs/introduction/key-concepts (Apr. 22, 2023)
[42] Models - OpenAI API, 2023. [Online]. Available: https://platform.openai.com/docs/models/gpt-3-5 (Jun. 18, 2023)
[43] Text completion - OpenAI API, 2023. [Online]. Available: https://platform.openai.com/docs/guides/completion/prompt-design (Apr. 22, 2023)
[44] GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision, 2023. [Online]. Available: https://github.com/openai/whisper (May 5, 2023)
[45] Text-to-Speech Software – Amazon Polly – Amazon Web Services, 2023. [Online]. Available: https://aws.amazon.com/tw/polly/ (May 5, 2023)
[46] Chocolatey Software | Installing Chocolatey, 2023. [Online]. Available: https://chocolatey.org/install (May 5, 2023)
[47] Pricing - OpenAI API, 2023. [Online]. Available: https://openai.com/pricing (Jun. 18, 2023)
[48] OpenAI API, 2023. [Online]. Available: https://platform.openai.com/tokenizer (May 6, 2023)
Advisor: Pierre Chen (陳慶瀚)    Review Date: 2023-06-27
