dc.description.abstract | In today's era of rapid technological development, artificial intelligence research continues to achieve breakthroughs in hardware. More and more studies are expanding from two-dimensional planes into three-dimensional space, spanning fields such as autonomous driving, film and entertainment, three-dimensional human modeling, and medical aesthetics.
Humans can quickly identify objects in a two-dimensional image and accurately judge their positions and three-dimensional distances, yet estimating a three-dimensional scene from a single two-dimensional image has long been a challenging problem in computer vision. As a result, the acquisition of three-dimensional spatial information today relies heavily on LiDAR or depth cameras. Although these devices mitigate the lack of 3D spatial information, they are typically expensive and require additional inputs. A purely visual approach to 3D scene estimation, combined with semantic segmentation and semantic scene completion, can therefore address the challenges of scene understanding more effectively.
At the same time, training and applying models on three-dimensional scenes requires significant memory resources. Improving performance and reconstructing three-dimensional scenes under limited resources are therefore crucial aspects of semantic scene completion.
This study reconstructs 3D scenes from a single RGB image and performs semantic scene completion. By adding an attention mechanism to the model and exploiting the importance of features at different levels during training, it improves the quality of the semantic scene completion model, reduces training time, and investigates the trade-off between memory usage and model performance. The proposed method shows outstanding results on objective evaluation metrics (IoU, mIoU). | en_US