dc.description.abstract | Three-dimensional point clouds differ significantly from regular two-dimensional images in data storage, data characteristics, and the architecture of the networks used to classify them. 3D scanning devices have recently become increasingly common, for example structured-light scanners in smartphones and radar/lidar systems in next-generation cars. As the volume of 3D point cloud data grows, so does the demand for more accurate classification networks designed specifically for such data. PointNet was the pioneering and effective model among 3D point cloud classification networks, but it considers only the global features of a point cloud. PointNet++ was introduced to overcome this limitation by incorporating simple local features. The aim is to help the 3D point cloud classification network learn more detailed local geometric structures. In this thesis, we introduce PointGPS, a novel 3D point cloud classification network architecture that leverages self-attention and plane fitting to perceive local geometric features.
Our proposed architecture is based on PointMLP, which itself builds on PointNet++. PointMLP improves accuracy by incorporating residual Multi-Layer Perceptron (MLP) modules. The architecture begins with an embedding module that lifts the point cloud into higher-dimensional features. The features then pass through four rounds of geometric feature mapping modules and feature extraction modules. Each geometric feature mapping module uses farthest point sampling to reduce the point cloud by half and then selects the neighbors of each sampled point. It uses the difference between the high-dimensional features of the neighboring points and those of the sampled point, together with the high-dimensional features of the sampled point itself. In this module, we use Singular Value Decomposition (SVD) to fit a plane to the neighbors and employ self-attention to compute more detailed local geometric structure features. The feature extraction modules, which consist of MLP modules with residual connections, are then applied. Finally, the features are reduced by a max pooling layer and passed to a classifier comprising fully connected layers, batch normalization layers, activation functions, and dropout, which improves the model's generalization to unseen data. With these design choices, our proposed model achieves significant improvements in accuracy. | en_US |
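The sampling, grouping, and plane-fitting steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the PointGPS implementation. The function names, the brute-force neighbor search, the cloud size of 1024 points, and the neighborhood size k = 24 are all assumptions made for illustration; the abstract specifies only that sampling halves the point cloud and that SVD is used to fit a plane to the neighbors. The self-attention and residual MLP stages are not covered here.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    chosen = np.empty(n_samples, dtype=np.int64)
    chosen[0] = start
    # Distance from every point to its nearest chosen point so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, n_samples):
        chosen[i] = int(np.argmax(min_dist))
        d = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, d)
    return chosen

def knn_indices(points, centers, k):
    """Indices of the k nearest points to each center (brute force)."""
    d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

def fit_plane_svd(neighbors):
    """Least-squares plane through a neighborhood: returns (centroid, unit normal)."""
    centroid = neighbors.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(neighbors - centroid, full_matrices=False)
    return centroid, vt[-1]

rng = np.random.default_rng(0)
cloud = rng.random((1024, 3))                         # hypothetical input cloud
centers_idx = farthest_point_sampling(cloud, 512)     # halve the cloud
groups = knn_indices(cloud, cloud[centers_idx], k=24) # k is an assumed value
centroid, normal = fit_plane_svd(cloud[groups[0]])    # plane of one neighborhood
```

The sign of the fitted normal is arbitrary, so any feature derived from it would need to be sign-invariant or use a consistent orientation rule.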