dc.description.abstract | As a novel technology, VR/AR has important applications in education, entertainment, and scenario simulation. VR can provide an experience comparable to a real spatial environment, making it an excellent tool for simulated medical surgery, military training, and even psychological counseling. Manufacturing, construction, and tourism can also be dramatically transformed with the help of VR and AR. For example, VR readily enables remote monitoring of factories and virtual tours of tourist attractions, and can be applied to building information models for design simulation, collaborative editing, and cost estimation in engineering construction projects. AR, in turn, superimposes virtual objects onto real scenes, overlaying equipment operation and maintenance SOPs, pipeline maps, orientation guidance, and item history information in space, bringing great convenience to production operations, equipment maintenance, fire rescue, sightseeing guidance, and more.
In the virtual world, human facial expressions are extremely important. Among all the external information about a person, the face occupies a considerable share of the brain's processing capacity; the human brain even has a dedicated region for processing facial expressions in visual signals. If the virtual rendering of a face is not realistic enough, the VR user's sense of immersion is easily reduced and the intended effect of VR/AR cannot be achieved. It is therefore well worth investing resources in simulating realistic facial models for virtual characters.
Existing face-capture technology can use image information and various sensors to reconstruct a person's original face in the virtual world. This technology has matured and is widely used in major animations, games, and films. However, the equipment required to capture a face with existing technology is very expensive. In many situations such resources are not available, and the data that can be transmitted is even scarcer. In these cases, techniques that use deep learning to analyze the text and corresponding emotions in audio, and then reconstruct and synthesize the facial features and motion meshes a virtual character should have, can come in handy.
Building on a real-time facial model synthesis system proposed in prior work, this thesis uses a lightweight Transformer model to analyze the speaker's mouth shapes in real time while consuming few resources, and to analyze the emotions implied by the tone of the voice in order to adjust other parts of the face model, such as the eyebrows, eyes, and cheeks. | en_US |