基於多層次注意力機制之單目相機語意場景補全技術;Monoscene camera semantic scene completion technique based on multi-level attention mechanisms


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/92988

Title:	基於多層次注意力機制之單目相機語意場景補全技術;Monoscene camera semantic scene completion technique based on multi-level attention mechanisms
Author:	張鎮宇;Chang, Cheng-Yu
Contributor:	Department of Communication Engineering
Keywords:	semantic scene completion; attention mechanism; deep learning; semantic segmentation
Date:	2023-08-15
Upload time:	2024-09-19 16:36:47 (UTC+8)
Publisher:	National Central University
Abstract:	In today's era of rapid technological development, continuous breakthroughs in hardware have driven steady progress in artificial intelligence research. More and more studies are expanding from the two-dimensional plane into three-dimensional space, in fields such as autonomous driving, film and entertainment, 3D human modeling, and medical aesthetics. People can readily judge three-dimensional distances in a scene from a two-dimensional image, quickly identifying objects and estimating their positions, but in computer vision, inferring a 3D scene from a single 2D image remains an open problem. As a result, 3D spatial information is currently acquired almost exclusively with LiDAR or depth cameras. Although these devices work around the lack of 3D spatial information, they are typically expensive and require additional inputs. A purely vision-based approach that estimates the 3D scene while performing semantic segmentation and completion therefore aims to address scene understanding better and faster. At the same time, voxel-based 3D scenes consume large amounts of memory during both training and deployment, so improving performance and reconstructing 3D scenes under limited resources is also an important part of semantic scene completion.
This study reconstructs a 3D scene from a single RGB image and performs semantic scene completion. It adds attention mechanisms to the model and examines how features at different scales and levels affect semantic scene completion, with the goals of improving the quality of the model, reducing training time, and analyzing the trade-off between memory usage and model performance. The study reports strong results on the objective metrics IoU and mIoU.
Appears in Collections:	[Graduate Institute of Communication Engineering] Master's and Doctoral Theses

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	8

All items in NCUIR are protected by original copyright.
