References
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788, 2016.
[2] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7263-7271, 2017.
[3] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[4] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” In Proceedings of the British Machine Vision Conference (BMVC), vol. 1, no. 3, p. 6, Sep. 2015.
[5] Y. C. Wu, P. C. Chang, C. Y. Wang, and J. C. Wang, “Asymmetric Kernel Convolutional Neural Network for acoustic scenes classification,” In 2017 IEEE International Symposium on Consumer Electronics (ISCE), pp. 11-12, Nov. 2017.
[6] R. Plutchik, “Emotions and life: Perspectives from psychology, biology, and evolution,” American Psychological Association, 2003.
[7] A. Mehrabian, “Framework for a comprehensive description and measurement of emotional states,” Genetic, social, and general psychology monographs, 1995.
[8] Plutchik's wheel of emotions, https://zh.wikipedia.org/wiki/File:Plutchik-wheel.svg
[9] M. M. Bradley and P. J. Lang, “Measuring emotion: the self-assessment manikin and the semantic differential,” Journal of behavior therapy and experimental psychiatry, vol. 25, no. 1, pp. 49-59, 1994.
[10] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms,” IEEE Transactions on audio and electroacoustics, vol. 15, no. 2, pp. 70-73, 1967.
[11] S. S. Stevens and J. Volkmann, “The relation of pitch to frequency: A revised scale,” The American Journal of Psychology, vol. 53, no. 3, pp. 329-353, 1940.
[12] B. Logan, “Mel Frequency Cepstral Coefficients for Music Modeling,” In ISMIR, vol. 270, pp. 1-11, Oct. 2000.
[13] ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 201 108, v1.1.3, Sep. 2003.
[14] ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms,” ES 202 050, v1.1.5, Jan. 2007.
[15] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, 1974.
[16] W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, Dec. 1943.
[17] F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” Psychological Review, vol. 65, no. 6, pp. 386-408, 1958.
[18] N. Rochester, J. Holland, L. Haibt, and W. Duda, “Tests on A Cell Assembly Theory of the Action of the Brain, Using A Large Digital Computer,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 80-93, 1956.
[19] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological cybernetics, vol. 36, no. 4, pp. 193-202, 1980.
[20] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.
[21] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[22] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[23] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
[24] M. Chen, X. He, J. Yang, and H. Zhang, “3-D convolutional recurrent neural networks with attention model for speech emotion recognition,” IEEE Signal Processing Letters, vol. 25, no. 10, pp. 1440-1444, 2018.
[25] C. Busso, M. Bulut, C. C. Lee, A. Kazemzadeh, E. Mower, S. Kim, and S. S. Narayanan, “IEMOCAP: Interactive emotional dyadic motion capture database,” Language resources and evaluation, vol. 42, no. 4, 2008.
[26] S. Tripathi and H. Beigi, “Multi-Modal Emotion recognition on IEMOCAP Dataset using Deep Learning,” arXiv preprint arXiv:1804.05788, 2018.
[27] L. I. Lin, “A concordance correlation coefficient to evaluate reproducibility,” Biometrics, vol. 45, no. 1, pp. 255-268, 1989.