NCU Institutional Repository - Item 987654321/98643


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98643


    Title: Image-Based Construction of a Voice-Driven 3D Gaussian Splatting Upper-Body Model with Virtual Garment Transfer
    Authors: Chen, Ming-Wei
    Contributors: Department of Computer Science and Information Engineering
    Keywords: 3D Gaussian Splatting; multimodal generation; virtual avatar; voice-driven animation; virtual try-on
    Date: 2025-08-29
    Issue Date: 2025-10-17 13:02:29 (UTC+8)
    Publisher: National Central University
    Abstract: This thesis presents a framework for constructing a 3D Gaussian Splatting upper-body avatar that is voice-driven and supports dynamic clothing changes, using only a small number of input images. The system combines multimodal generation techniques, integrating image synthesis, speech processing, and real-time rendering modules into a high-fidelity, interactive virtual-human framework. Starting from a single frontal portrait, a head-synthesis model infers multi-view facial images, which are then reconstructed into a continuous-viewpoint 3D upper-body model using the GaussianAvatars approach to 3D Gaussian Splatting. Voice interaction is enabled through automatic speech recognition (ASR) and text-to-speech (TTS) modules, which drive synchronized lip movements and facial expressions. For clothing manipulation, a conditional image-to-image translation model performs visually consistent virtual try-on. The system offers low-data modeling, real-time responsiveness, and high visual fidelity, making it applicable to digital humans, remote interaction, digital twins, and immersive marketing. Experimental results show that with as few as 1 to 3 input images, the method still generates stable, temporally consistent, and well speech-synchronized 3D upper-body avatars, while supporting flexible visual transfer across multiple outfits. These findings confirm the feasibility and practicality of the proposed multimodal avatar system.
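    The abstract describes a four-stage pipeline: multi-view synthesis, Gaussian Splatting reconstruction, voice-driven animation, and garment transfer. The Python sketch below is purely illustrative scaffolding of that data flow under the stated assumptions; every name in it (SplatModel, synthesize_views, and so on) is a hypothetical placeholder, not an API from GaussianAvatars or any real ASR/TTS or try-on library.

        # Hypothetical sketch of the four-stage pipeline described in the
        # abstract. All classes and functions are illustrative stubs.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class SplatModel:
            """Stand-in for a 3D Gaussian Splatting upper-body model."""
            gaussians: List[dict]

        def synthesize_views(frontal_images: List[str], n_views: int = 16) -> List[str]:
            # Stage 1: a head-synthesis model expands 1-3 frontal photos
            # into multi-view images (stub: returns view labels only).
            return [f"view_{i:02d}.png" for i in range(n_views)]

        def reconstruct_splats(views: List[str]) -> SplatModel:
            # Stage 2: fit Gaussians over the continuous viewpoints
            # (GaussianAvatars-style reconstruction; stubbed here).
            return SplatModel(gaussians=[{"view_hint": v} for v in views])

        def drive_with_voice(model: SplatModel, audio_path: str) -> List[SplatModel]:
            # Stage 3: ASR/TTS-derived phoneme timing deforms the facial
            # Gaussians frame by frame for lip sync (stub: 3 static frames).
            return [model for _ in range(3)]

        def transfer_garment(model: SplatModel, garment_image: str) -> SplatModel:
            # Stage 4: a conditional image-to-image model repaints the
            # torso region with the target outfit (stubbed).
            model.gaussians.append({"garment": garment_image})
            return model

        if __name__ == "__main__":
            views = synthesize_views(["front.jpg"])        # one input image
            avatar = reconstruct_splats(views)
            avatar = transfer_garment(avatar, "outfit_a.png")
            frames = drive_with_voice(avatar, "hello.wav")
            print(f"{len(views)} views -> {len(frames)} animated frames")

    In the actual system each stub would be replaced by the corresponding learned model; the sketch only fixes the interfaces between stages, which is what lets the avatar be rebuilt from 1 to 3 images and re-dressed or re-voiced independently.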
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File          Size    Format
    index.html    0 KB    HTML


    All items in NCUIR are protected by copyright, with all rights reserved.
