CA-Wav2Lip: Coordinate Attention-based Speech to Lip Synthesis in the Wild

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89789

Title:	CA-Wav2Lip: Coordinate Attention-based Speech to Lip Synthesis in the Wild
Author:	黃靖筌; Huang, Ching-Chuan
Contributor:	Department of Computer Science and Information Engineering
Keywords:	attention mechanism; lip synchronization; face synthesis
Date:	2022-07-25
Upload time:	2022-10-04 11:59:48 (UTC+8)
Publisher:	National Central University
Abstract:	With the growing consumption of online visual content, there is an urgent need for video translation to reach a wider audience around the world. However, directly translated and dubbed material fails to provide a natural audio-visual experience because the translated speech and the lip movements are often out of sync. To improve the viewing experience, an accurate system for automatically generating synchronized lip movements is necessary. To improve the accuracy and visual quality of speech-to-lip generation, this research proposes two techniques: embedding attention mechanisms in the convolution layers, and deploying SSIM as a loss function in the visual quality discriminator. The proposed system and several existing systems are evaluated on three audio-visual datasets. The results show that the proposed methods outperform the state-of-the-art speech-to-lip synthesis system in both the accuracy of audio-lip synchronization and the visual quality of the generated output.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	30

All items in NCUIR are protected by the original copyright.
