Abstract (English) |
According to 2021 statistics from Taiwan's Ministry of Health and Welfare, about 1,198,000 people in Taiwan hold disability certificates, roughly 5% of the total population, and 125,764 of them have a hearing impairment. Because hearing loss from childhood makes spoken pronunciation and language learning difficult, hearing-impaired people often rely on sign language as their main means of communication. Yet when sign language users watch TV news, election debates, live press conferences, and other media that depend heavily on hearing, they can usually only follow the subtitles. Only a few government-organized public events, such as election debates and regular epidemic-prevention press conferences, provide a sign language interpreter, who renders the speaker's spoken content into sign language so that sign language users can understand it more easily. Because sign language interpreters remain scarce, however, they can be deployed on only a few occasions. How to give the hearing-impaired the same experience as hearing audiences has therefore become a major issue for modern media.
This research combines deep-learning technologies from two major fields, natural language processing and gesture recognition, to develop a system that performs sign language interpretation in real time and renders the signs through a virtual character. First, a 3D gesture recognition model converts videos of individual sign language words into a gesture data set, as sketched below.
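The following is a minimal sketch of how such a gesture data set might be built with MediaPipe Hands [6]. The single-hand setting, the file name, and the dictionary storage are illustrative assumptions, not the thesis's actual pipeline.

# Sketch: extract per-frame 3D hand landmarks from a sign-word video.
import cv2
import mediapipe as mp
import numpy as np

def video_to_landmarks(video_path):
    """Return an array of shape (frames, 21, 3) of MediaPipe hand landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                hand = result.multi_hand_landmarks[0]
                frames.append([(p.x, p.y, p.z) for p in hand.landmark])
    cap.release()
    return np.array(frames)

# One entry of the gesture data set: sign word -> landmark sequence, e.g.
# gesture_dataset["謝謝"] = video_to_landmarks("thank_you.mp4")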
At run time, a third-party speech recognition service transcribes the user's speech into Chinese sentences; a minimal sketch of this step follows.
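The sketch below assumes the Google Cloud Speech-to-Text service cited in [2] is the third-party service in question; the credential setup, the 16 kHz mono PCM audio format, and the function name are illustrative assumptions.

# Sketch: transcribe raw 16 kHz 16-bit mono PCM audio into Chinese text.
from google.cloud import speech

def transcribe(pcm_bytes):
    client = speech.SpeechClient()  # requires Google Cloud credentials
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="cmn-Hant-TW",  # Mandarin (Traditional, Taiwan)
    )
    audio = speech.RecognitionAudio(content=pcm_bytes)
    response = client.recognize(config=config, audio=audio)
    return "".join(r.alternatives[0].transcript for r in response.results)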
A natural language processing model then converts each Chinese sentence into a sequence of sign language words, and every word in the sequence is matched against the gesture data set. The matched gestures are passed to the avatar, which performs them, and all stages are connected into a complete user-facing system for real-time sign language interpretation. The word-matching step is sketched below.
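The matching step might look like the following sketch, which assumes a Sentence-BERT-style encoder [4], [27] scores each predicted sign word against the words that have recorded gestures; the model name and the three-word vocabulary are placeholders, not the thesis's actual choices.

# Sketch: map a predicted sign word to its closest gesture-data-set entry.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

vocab = ["謝謝", "你好", "再見"]  # placeholder: sign words with recorded gestures
vocab_emb = model.encode(vocab, convert_to_tensor=True)

def match_sign_word(word):
    emb = model.encode(word, convert_to_tensor=True)
    scores = util.cos_sim(emb, vocab_emb)[0]  # cosine similarity to each entry
    return vocab[int(scores.argmax())]

# e.g. match_sign_word("感謝") returns "謝謝" if those two words embed most similarly.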
In addition, this study experimented with and applied a variety of signal smoothing techniques to mitigate the temporal jitter that is common in gesture recognition, so that the virtual character's signing more closely resembles that of a real person; one such filter is sketched below.
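One smoothing technique cited in this work is the 1€ filter [8], which adapts its cutoff frequency to the signal's speed: it smooths heavily when a landmark is nearly still and introduces little lag when the landmark moves quickly. A minimal per-coordinate sketch, with illustrative parameter values:

# Sketch: 1€ filter for one landmark coordinate sampled at a fixed rate.
import math

class OneEuroFilter:
    def __init__(self, freq=30.0, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.freq = freq              # sampling rate in Hz
        self.min_cutoff = min_cutoff  # baseline cutoff; lower = smoother at rest
        self.beta = beta              # speed coefficient; higher = less lag in motion
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of an exponential filter with the given cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.freq)

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Estimate and smooth the signal's speed.
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Fast motion raises the cutoff (less lag); slow motion lowers it (less jitter).
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat

# e.g. f = OneEuroFilter(freq=30.0); smoothed = [f(x) for x in raw_x_series]
|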
References |
[1] 統計處. “身心障礙統計專區.” (Jul. 2021), [Online]. Available: https://dep.mohw.gov.tw/dos/cp-5224-62359-113.html (visited on 06/09/2022).
[2] “Speech-to-Text: 自動語音辨識 | Cloud 語音轉文字,” [Online]. Available: https://cloud.google.com/speech-to-text?hl=zh-tw (visited on 05/19/2022).
[3] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” arXiv, Tech. Rep. arXiv:1706.03762, 2017.
[4] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese
BERT-Networks,” arXiv, Tech. Rep. arXiv:1908.10084, Aug. 2019.
[5] A. Juliani, V. Berges, E. Vckay, et al., “Unity: A general platform for intelligent agents,”
CoRR, vol. abs/1809.02627, 2018.
[6] F. Zhang, V. Bazarevsky, A. Vakunov, et al., “MediaPipe Hands: On-device Real-time
Hand Tracking,” arXiv, Tech. Rep. arXiv:2006.10214, Jun. 2020.
[7] “卡爾曼濾波.” (Aug. 2021), [Online]. Available: https://zh.wikipedia.org/w/index.php?title=%E5%8D%A1%E5%B0%94%E6%9B%BC%E6%BB%A4%E6%B3%A2&oldid=67182863 (visited on 06/09/2022).
[8] G. Casiez, N. Roussel, and D. Vogel, “1€ Filter: A Simple Speed-based Low-pass Filter for Noisy Input in Interactive Systems,” in Proceedings of the Conference on Human Factors in Computing Systems (CHI ’12), May 2012, pp. 2527–2530.
[9] “台灣手語.” (Nov. 2021).
[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding,” arXiv, Tech. Rep. arXiv:1810.04805,
May 2019.
[11] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, “Signature verification using a ‘Siamese’ time delay neural network,” in Advances in Neural Information Processing Systems, vol. 6, Morgan-Kaufmann, 1993.
[12] N. Kasukurthi, B. Rokad, S. Bidani, and D. A. Dennisan, “American Sign Language
Alphabet Recognition using Deep Learning,” arXiv, Tech. Rep. arXiv:1905.05487, May
2019.
[13] “美國手語.” (Dec. 2020), [Online]. Available: https://zh.wikipedia.org/w/index.php?title=美國手語&oldid=63043570 (visited on 06/29/2022).
[14] S. He, “Research of a Sign Language Translation System Based on Deep Learning,” in
2019 International Conference on Artificial Intelligence and Advanced Manufacturing
(AIAM), Oct. 2019, pp. 392–396.
[15] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: Realtime multi-person 2D pose estimation using part affinity fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 172–186, 2021.
[16] S. A. Ehssan Aly, A. Hassanin, and S. Bekhet, “ESLDL: An integrated deep learning model for Egyptian sign language recognition,” in 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES), 2021, pp. 331–335.
[17] W. Cheng, J. H. Park, and J. H. Ko, “HandFoldingNet: A 3D hand pose estimation network using multiscale-feature guided folding of a 2D hand skeleton,” CoRR, vol. abs/2108.05545, 2021.
[18] U. Iqbal, P. Molchanov, T. M. Breuel, J. Gall, and J. Kautz, “Hand pose estimation via
latent 2.5d heatmap regression,” CoRR, vol. abs/1804.09534, 2018.
[19] M. Boulares and M. Jemni, “Mobile sign language translation system for deaf community,”
in Proceedings of the International Cross-Disciplinary Conference on Web Accessibility,
ser. W4A ’12, Lyon, France: Association for Computing Machinery, 2012.
[20] S. Stoll, N. C. Camgoz, S. Hadfield, and R. Bowden, “Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks,” International Journal of Computer Vision, vol. 128, no. 4, pp. 891–908, Apr. 2020.
[21] T. Luz. “Using AI for Sign Language Translation.” (Mar. 2020), [Online]. Available: https://www.youtube.com/watch?v=N0Vm0LXmcU4 (visited on 06/29/2022).
[22] Hand Talk Translator, Apps on Google Play.
[23] 孫聖然. “北京冬奧|央視推AI手語主播助聽障人士觀賽 適應快語速識專有詞.” (Feb. 2022), [Online]. Available: https://www.hk01.com/即時中國/732025/北京冬奧-央視推ai手語主播助聽障人士觀賽-適應快語速識專有詞 (visited on 05/18/2022).
[24] “STSbenchmark,” stswiki.
[25] Huertas97, “Multilingual-STSB,” Mar. 2022.
[26] 張榮興. “實用臺灣手語教材,” [Online]. Available: https://www.books.com.tw/products/0010882503 (visited on 06/25/2022).
[27] “SentenceTransformers Documentation,” [Online]. Available: https://www.sbert.net/ (visited on 06/26/2022). |