References
[1] 2000 HUB5 English Evaluation Transcripts - Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2002T43.
[2] The AMI Corpus. http://www.openslr.org/16.
[3] The Association for Computational Linguistics and Chinese Language Processing. http://www.aclclp.org.tw/use_mat_c.php#mat160.
[4] Automatic speech recognition data collection with YouTube v3 API, Mask-RCNN and Google Vision API. https://towardsdatascience.com/automatic-speech-recognition-data-collection-with-youtube-v3-api-mask-rcnn-and-google-vision-api-2370d6776109.
[5] Avidemux - Main Page. http://avidemux.sourceforge.net/.
[6] CSR-I (WSJ0) Complete. https://catalog.ldc.upenn.edu/LDC93S6A.
[7] ffdshow tryouts | Official Website. http://ffdshow-tryout.sourceforge.net/.
[8] Free Speech... Recognition (Linux, Windows and Mac) - VoxForge.org. http://www.voxforge.org/.
[9] Free ST American English Corpus. http://www.openslr.org/45.
[10] Kdenlive | Libre Video Editor. https://kdenlive.org/.
[11] LibriSpeech ASR Corpus. http://www.openslr.org/12.
[12] MPlayer - The Movie Player. http://www.mplayerhq.hu.
[13] Tatoeba: Collection of sentences and translations. https://tatoeba.org/.
[14] VLC: Official site - Free multimedia solutions for all OS! - VideoLAN. https://www.videolan.org/.
[15] xine - A Free Video Player - Home.
[16] YouTube. https://www.youtube.com.
[17] youtube-dl. https://youtube-dl.org/.
[18] youtube_dl 2021.4.26 on PyPI - Libraries.io. https://libraries.io/pypi/youtube_dl.
[19] A. Rao and R. Lanphier. RFC 2326: Real Time Streaming Protocol (RTSP), 1998.
[20] Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, M. Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common Voice: A massively-multilingual speech corpus. In LREC, 2020.
[21] Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common Voice: A massively-multilingual speech corpus, 2020.
[22] T. Berners-Lee, R. Fielding, and H. Frystyk. RFC 1945: Hypertext Transfer Protocol – HTTP/1.0, 1996.
[23] H. Bu, J. Du, X. Na, B. Wu, and H. Zheng. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline. In 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), pages 1–5, 2017.
[24] Chia-Chen Chen, Tien-Chi Huang, James J. Park, Huang-Hua Tseng, and Neil Y. Yen. A smart assistant toward product-awareness shopping. Personal and Ubiquitous Computing, 18(2):339–349, Feb 2014.
[25] Robert L. Cheng. A comparison of Taiwanese, Taiwan Mandarin, and Peking Mandarin. Language, 61(2):352–377, 1985.
[26] P. R. Cohen and S. L. Oviatt. The role of voice input for human-machine communication. Proceedings of the National Academy of Sciences, 92(22):9921–9927, 1995.
[27] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
[28] Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition, 2014.
[29] Kenneth Heafield. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197, Edinburgh, Scotland, July 2011. Association for Computational Linguistics.
[30] Lucas Jo and Wonkyum Lee. goodatlas/zeroth. https://github.com/goodatlas/zeroth.
[31] Michael I. Jordan. Chapter 25 - Serial order: A parallel distributed processing approach. In John W. Donahoe and Vivian Packard Dorsel, editors, Neural-Network Models of Cognition, volume 121 of Advances in Psychology, pages 471–495. North-Holland, 1997.
[32] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, Dec 2014.
[33] Yun-Hsuan Kuo. New dialect formation: The case of Taiwanese Mandarin. Jan 2005.
[34] Egor Lakomkin, Sven Magg, Cornelius Weber, and Stefan Wermter. KT-Speech-Crawler: Automatic dataset construction for speech recognition from YouTube videos. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 90–95, Brussels, Belgium, November 2018. Association for Computational Linguistics.
[35] Lantian Li, Ruiqi Liu, Jiawen Kang, Yue Fan, Hao Cui, Yunqi Cai, Ravichander Vipperla, Thomas Fang Zheng, and Dong Wang. CN-Celeb: Multi-genre speaker recognition, 2020.
[36] Zhang De Liang. Deep neural network for Chinese speech recognition. Master's thesis, 2015.
[37] Josh Meyer. Multi-task and transfer learning in low-resource speech recognition, 2019.
[38] Clément Le Moine and Nicolas Obin. Att-HACK: An expressive speech database with social attitudes, 2020.
[39] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur. Librispeech: An ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5206–5210, 2015.
[40] Md. Wahidur Rahman, Rahabul Islam, Md. Mahmodul Hasan, Shisir Mia, and Mohammad Motiur Rahman. IoT based smart assistant for blind person and smart home using the Bengali language. SN Computer Science, 1(5):300, Sep 2020.
[41] Anthony Rousseau, Paul Deléglise, and Yannick Estève. TED-LIUM: An automatic speech recognition dedicated corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 125–129, Istanbul, Turkey, May 2012. European Language Resources Association (ELRA).
[42] D. E. Rumelhart and J. L. McClelland. Learning Internal Representations by Error Propagation, pages 318–362. 1987.
[43] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
[44] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017.
[45] Dong Wang and Xuewei Zhang. THCHS-30: A free Chinese speech corpus, 2015.
[46] Wu Yang and Yu Zong. An extended hybrid end-to-end Chinese speech recognition model based on CNN. Journal of Qingdao University of Science and Technology (Natural Science Edition), 041(001):104–109, 118, 2020.