NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, research projects, and more): Item 987654321/98643


    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98643


    Title: Image-Based Construction of a Voice-Driven 3D Gaussian Splatting Upper-Body Model with Virtual Garment Transfer
    Author: Chen, Ming-Wei (陳明威)
    Contributor: Department of Computer Science and Information Engineering (資訊工程學系)
    Keywords: 3D Gaussian Splatting; multimodal generation; virtual avatar; voice-driven animation; virtual try-on
    Date: 2025-08-29
    Upload time: 2025-10-17 13:02:29 (UTC+8)
    Publisher: National Central University
    Abstract: This study presents a framework for constructing a voice-driven, 3D Gaussian Splatting-based upper-body avatar, built on GaussianAvatars and capable of dynamic clothing changes, using only a small number of input images. The system integrates multimodal generation techniques, combining image synthesis, speech processing, and real-time rendering modules to achieve high-fidelity, interactive virtual humans. Starting from a single frontal portrait, a head-synthesis model produces multi-view facial images, which are then reconstructed into a continuous-viewpoint 3D representation using Gaussian Splatting. Voice interaction is enabled through automatic speech recognition (ASR) and text-to-speech (TTS) modules, which drive realistic lip-sync and expression dynamics. For clothing manipulation, a conditional image-to-image translation model performs seamless virtual outfit try-on. The system features low data requirements, fast rendering, and visually coherent results, making it suitable for applications such as digital humans, remote interaction, virtual fitting, and immersive marketing. Experimental results demonstrate that the system can generate temporally consistent, speech-synchronized 3D avatars from as few as 1 to 3 input images, while supporting diverse outfit changes with high visual realism. These findings confirm the feasibility and practicality of the proposed multimodal avatar system.
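    The abstract describes a four-stage pipeline: multi-view synthesis from a frontal portrait, 3D Gaussian Splatting reconstruction, speech-driven animation via ASR/TTS, and conditional image-to-image garment transfer. The Python sketch below is a minimal, hypothetical orchestration of those stages; every name in it (synthesize_multiview, reconstruct_gaussians, drive_with_speech, change_outfit, Avatar) is an illustrative placeholder, not the authors' code or any real library API.

    # Hypothetical orchestration sketch of the pipeline outlined in the abstract.
    # All names below are illustrative placeholders, not the authors' implementation.
    from dataclasses import dataclass, field

    @dataclass
    class Avatar:
        """Stand-in for a 3D Gaussian Splatting upper-body model."""
        gaussians: list = field(default_factory=list)
        outfit: str = "default"

    def synthesize_multiview(frontal_image: bytes, n_views: int = 8) -> list[bytes]:
        # Stage 1: derive multi-angle views from a single frontal portrait
        # (the paper uses a head-synthesis model for this step).
        return [frontal_image] * n_views  # placeholder: a real model generates novel views

    def reconstruct_gaussians(views: list[bytes]) -> Avatar:
        # Stage 2: fit a continuous-viewpoint 3D Gaussian Splatting model
        # (the paper builds on GaussianAvatars).
        return Avatar(gaussians=[{"view": i} for i, _ in enumerate(views)])

    def drive_with_speech(avatar: Avatar, audio: bytes) -> list[Avatar]:
        # Stage 3: ASR/TTS output drives synchronized lip-sync and expression frames.
        return [avatar for _ in range(3)]  # placeholder frame sequence

    def change_outfit(avatar: Avatar, garment_image: bytes) -> Avatar:
        # Stage 4: a conditional image-to-image translation model swaps the garment
        # while keeping identity and geometry consistent.
        avatar.outfit = "transferred"
        return avatar

    if __name__ == "__main__":
        views = synthesize_multiview(b"frontal.jpg")       # 1-3 input images suffice per the paper
        avatar = reconstruct_gaussians(views)
        frames = drive_with_speech(avatar, b"speech.wav")  # speech-synchronized animation
        avatar = change_outfit(avatar, b"garment.jpg")     # virtual try-on
        print(len(frames), "frames; outfit:", avatar.outfit)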
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

    Files in this item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     9        View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

