Sign Language Display System Based on 3D Body Tracking and Virtual Try-on

Name

李元熙 (LI-YUAN-SI)

Department

Department of Computer Science and Information Engineering

Thesis Title

Sign Language Display System Based on 3D Body Tracking and Virtual Try-on

Files

Full text available for browsing in the system after 2025-7-14

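Abstract (Chinese)

Sign language is a form of visual communication that relies on a combination of hand gestures, facial expressions, and body language to convey meaning. Millions of deaf or hard-of-hearing people around the world, and those who communicate with them, use it every day. Despite its importance, sign language recognition and translation remain challenging tasks because sign language is complex and variable.

In recent years, computer vision techniques have been increasingly applied to sign language recognition and translation, with good results. In this work, we present a sign language display system based on 3D body modeling and virtual try-on. Our method uses body mesh estimation to generate a 3D human model of the signer, which then serves as input to a multi-garment network that simulates the appearance of the signer's clothing.

We collected a dataset of 100 sign language videos, each showing a different signer performing a series of signs. To process these videos, we first use YOLOv5 to crop out the signer, giving human mesh estimation a cleaner input. We then use a body mesh estimation algorithm designed to improve wrist rotation accuracy to extract the signer's body model from each video, and apply a virtual try-on method to simulate different types of clothing on the signer. This yields a virtual human model with the same pose and shape as the original signer, wearing clothes selected from a garment dataset. We combine these models frame by frame to generate a video in which a virtual human model wearing virtual clothing performs sign language.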
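The cropping step can be illustrated with a small sketch. This is not the thesis code: it assumes the detector returns a pixel box `(x1, y1, x2, y2)`, and the helper names `pad_and_clamp` and `crop`, the `margin` parameter, and the toy frame are all hypothetical. The idea is to grow the detected box slightly so that outstretched hands are not cut off, then clamp it to the frame.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive. Assumed box layout.
Box = Tuple[int, int, int, int]


def pad_and_clamp(box: Box, width: int, height: int, margin: float = 0.1) -> Box:
    """Grow a detector box by `margin` of its size on each side, then clamp to the frame."""
    x1, y1, x2, y2 = box
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    return (
        max(0, x1 - pad_x),
        max(0, y1 - pad_y),
        min(width, x2 + pad_x),
        min(height, y2 + pad_y),
    )


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a frame stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


# Toy 6x8 "frame" whose pixel value encodes its (row, column) position.
frame = [[10 * r + c for c in range(8)] for r in range(6)]

# Hypothetical detection of the signer, padded by 25% on each side.
signer_box = pad_and_clamp((2, 1, 6, 5), width=8, height=6, margin=0.25)
signer = crop(frame, signer_box)
```

In a real system the frame would be an image array and the box would come from the person detector; the padding logic is the same.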

Abstract (English)

Sign language is a form of visual communication that relies on a combination of hand gestures, facial expressions, and body language to convey meaning. Millions of people worldwide who are deaf or hard of hearing, as well as those who communicate with them, use it every day. However, despite its importance, sign language recognition and translation remain challenging because of the complexity and variability of sign language.

In recent years, computer vision techniques have been increasingly applied to sign language recognition and translation, with promising results. In this work, we introduce a sign language display system based on three-dimensional body modeling[1] and virtual try-on[2]. Our approach uses body mesh estimation to generate a 3D human model of the signer, which is then used as input to a multi-garment network[2] to simulate the appearance of clothing on the signer.

We collected a dataset of 100 sign language videos, each featuring a different signer performing a range of signs. To use these videos, we first apply YOLOv5[17] to crop out the signer, which gives human mesh estimation a cleaner input. We then use a body mesh estimation algorithm that aims to improve the accuracy of wrist rotation to extract the signer's body model from each video, and apply a virtual try-on method to simulate different types of clothing on the signer. The result is a virtual human model whose pose and shape match those of the original signer and whose clothes are selected from a clothing dataset. We combine these models frame by frame to generate a video that shows a virtual human model in virtual clothes performing sign language.
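The per-frame flow described above (detect and crop, estimate the body model, dress it, collect the results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: `BodyModel`, `DressedModel`, `run_pipeline`, and the `fake_*` stages are hypothetical stand-ins for the real detector, mesh estimator, and try-on network, and skipping frames with no detected signer is an assumed policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class BodyModel:
    """Placeholder for per-frame body parameters (hypothetical fields)."""
    pose: List[float]
    shape: List[float]


@dataclass
class DressedModel:
    """A body model paired with the garment chosen for it."""
    body: BodyModel
    garment: str


def run_pipeline(
    frames: Iterable[str],
    detect: Callable[[str], Optional[str]],
    estimate: Callable[[str], BodyModel],
    dress: Callable[[BodyModel, str], DressedModel],
    garment: str,
) -> List[DressedModel]:
    """Run detect -> estimate -> dress on each frame, preserving frame order."""
    dressed_frames = []
    for frame in frames:
        cropped = detect(frame)
        if cropped is None:
            continue  # assumed policy: skip frames where no signer is found
        body = estimate(cropped)
        dressed_frames.append(dress(body, garment))
    return dressed_frames


# Stub stages so the sketch runs without any model weights.
def fake_detect(frame: str) -> Optional[str]:
    return frame or None  # an empty frame stands for "no signer detected"


def fake_estimate(cropped: str) -> BodyModel:
    return BodyModel(pose=[float(len(cropped))], shape=[0.0])


def fake_dress(body: BodyModel, garment: str) -> DressedModel:
    return DressedModel(body=body, garment=garment)


video = ["frame-a", "", "frame-ccc"]
dressed = run_pipeline(video, fake_detect, fake_estimate, fake_dress, "shirt")
```

Passing the stages in as callables keeps the loop independent of any particular detector or try-on model; rendering the returned list in order would produce the output video.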

Keywords

★ Virtual try-on
★ Human body modeling
★ Sign language

Table of Contents

1. Introduction
2. Related Work
2.1 YOLO
2.2 Deep Learning
2.3 Convolutional Neural Network
2.4 Human body estimation
2.4.1 2D pose estimation
2.4.2 3D pose estimation
2.4.3 Mesh estimation
2.5 3D human model
2.6 Virtual try-on
2.7 Internet Information Services
3. Methodology
3.1 Introduction
3.2 YOLO detection
3.3 3D Whole-Body Estimation
3.4 Virtual Try-on
3.5 Deploy Website to IIS
3.6 Conclusion
4. Experiments
4.1 Data Collection
4.2 Experimental Setup
4.3 Experimental Step
5. Conclusion
6. Reference

References

[1] Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022.
[2] Bharat Lal Bhatnagar, Garvita Tiwari, Christian Theobalt, and Gerard Pons-Moll. Multi-Garment Net: Learning to Dress 3D People from Images. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[3] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_convolutional_neural_networks_work.html
[5] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[6] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C. Chang, M. G. Yong, J. Lee, W. Chang, W. Hua, M. Georg, and M. Grundmann. Mediapipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019.
[7] Arindam Sengupta, Feng Jin, Renyuan Zhang, and Siyang Cao. mm-Pose: Real-Time Human Skeletal Posture Estimation using mmWave Radars and CNNs. IEEE Sensors Journal, 2020.
[8] https://mmpose.readthedocs.io/zh_CN/latest/demos.html
[9] Yu Rong, Takaaki Shiratori, and Hanbyul Joo. FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021.
[10] Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik. End-to-end Recovery of Human Shape and Pose. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[11] Muhammed Kocabas, Nikos Athanasiou, and Michael J. Black. VIBE: Video Inference for Human Body Pose and Shape Estimation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[12] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A Skinned Multi-Person Linear Model. ACM Transactions on Graphics, 2015.
[13] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[14] Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. VITON: An Image-based Virtual Try-on Network. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[15] Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang. Toward Characteristic-Preserving Image-based Virtual Try-On Network. Computer Vision – ECCV 2018, 2018.
[16] http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html
[17] https://github.com/ultralytics/yolov5
[18] https://medium.com/@_Xing_Chen_/yolov5-%E8%A9%B3%E7%B4%B0%E8%A7%A3%E8%AE%80-724d55ec774
[19] Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Collaborative regression of expressive bodies using moderation. 2021 International Conference on 3D Vision (3DV), 2021.
[20] Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular expressive body regression through body-driven attention. Computer Vision – ECCV 2020, 2020.
[21] Yuxiao Zhou, Marc Habermann, Ikhsanul Habibie, Ayush Tewari, Christian Theobalt, and Feng Xu. Monocular Real-time Full Body Capture with Inter-part Correlations. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[22] https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw
[23] https://github.com/vchoutas/smplx/tree/main/transfer_model
[24] https://github.com/bharat-b7/MultiGarmentNetwork
[25] https://zhuanlan.zhihu.com/p/256358005
[26] https://github.com/facebookresearch/frankmocap/issues/91
[27] https://github.com/mks0601/Hand4Whole_RELEASE
[28] https://140.115.51.243/sign-language/list

Advisor

施國琛(SHIH-GUO-CHEN)

Approval Date

2023-7-12