Transformer-based Gloss-free Sign Language Translation

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98287

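Title:	Transformer-based Gloss-free Sign Language Translation
Author:	江子青;Chiang, Zi-Cing
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Sign language translation
Date:	2025-07-21
Upload time:	2025-10-17 12:35:07 (UTC+8)
Publisher:	National Central University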
Abstract:	More than 430 million people worldwide are affected by disabling hearing loss, and the resulting communication barriers often create significant challenges in education, employment, and social integration. This makes Sign Language Translation (SLT) a crucial field for fostering a more accessible and inclusive society. To address these barriers, this thesis introduces a Transformer-based framework that translates sign language videos directly into text, confronting key challenges such as spatio-temporal complexity and costly annotation. The architecture integrates an Adaptive Masking and Fusion module, Contextual Position Encoding, and a Token-Level Contrastive Loss to improve translation accuracy and multimodal alignment. The model outperforms existing approaches, reaching state-of-the-art precision and fluency on public benchmarks and generalizing strongly to more challenging conversational data. The integrated architecture proves effective, generating more accurate and fluent translations than existing gloss-free models and advancing the state of the art for gloss-free SLT.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	20

All items in NCUIR are protected by the original copyright.
