dc.description.abstract | As a novel technology, VR/AR has important applications in education, entertainment, and scenario simulation. VR can provide an experience comparable to a real spatial environment, making it an excellent tool for simulated medical surgery, military training, and even psychological counseling. Manufacturing, construction, and tourism can also be dramatically transformed with the help of VR and AR. For example, VR readily enables remote monitoring of factories and virtual tours of tourist attractions, and can be applied to building information models for design simulation, collaborative editing, and cost estimation in engineering construction projects. AR, in turn, superimposes virtual objects onto real scenes, overlaying equipment operation and maintenance SOPs, pipeline maps, orientation guidance, and item history information in space, bringing great convenience to production operations, equipment maintenance, fire rescue, sightseeing guidance, and more.
In the virtual world, human facial expressions are extremely important. Among all the external information about a person, the face occupies a considerable share of the brain's processing capacity; the human brain even has a dedicated region for processing facial expressions in visual signals. If the virtual rendering of a face is not realistic enough, the VR user's sense of immersion is easily reduced and the intended effect of VR/AR cannot be achieved. It is therefore well worth investing resources in simulating realistic facial models for virtual characters.
Existing face-capture technology can use image information and various sensors to reconstruct a person's original face in the virtual world. This technology has matured and is widely used in major animations, games, and films. However, the equipment required to capture a face with existing technology is very expensive. In many situations such resources are not available, and the data that can be transmitted is even scarcer. In these cases, techniques that use deep learning to analyze the text and corresponding emotions in audio, and then reconstruct and synthesize the facial features and motion meshes a virtual character should have, can come in handy.
Building on a real-time facial model synthesis system proposed in prior work, this thesis uses a lightweight Transformer model to analyze the speaker's mouth shapes in real time while consuming few resources, and to analyze the emotions implied by the tone of the voice in order to adjust other parts of the face model, such as the eyebrows, eyes, and cheeks. | en_US |