dc.description.abstract | In today's era of rapid technological development, artificial intelligence research continues to achieve breakthroughs in hardware. More and more studies are expanding from two-dimensional planes into three-dimensional space, spanning fields such as autonomous driving, film and entertainment, three-dimensional human modeling, and medical aesthetics.
Humans can quickly identify objects in a two-dimensional image and accurately judge their positions and three-dimensional distances, yet estimating a three-dimensional scene from a single two-dimensional image has long been a challenging problem in computer vision. As a result, the acquisition of three-dimensional spatial information today relies heavily on LiDAR or depth cameras. Although these devices mitigate the lack of 3D spatial information, they are typically expensive and require additional inputs. A purely visual approach to 3D scene estimation, combined with semantic segmentation and semantic scene completion, can therefore address the challenges of scene understanding more effectively.
At the same time, training and applying models on three-dimensional scenes requires significant memory resources. Improving performance and reconstructing three-dimensional scenes under limited resources are therefore crucial aspects of semantic scene completion.
This study reconstructs 3D scenes from a single RGB image and performs semantic scene completion. By adding an attention mechanism to the model and exploiting the importance of features at different levels during training, it improves the quality of the semantic scene completion model, reduces training time, and investigates the trade-off between memory usage and model performance. The proposed method shows outstanding results on objective evaluation metrics (IoU, mIoU). | en_US