Abstract (English)
Three-dimensional point cloud data differs significantly from regular two-dimensional images in terms of storage, data characteristics, and classification network architecture. 3D scanning devices have become increasingly common, from structured-light scanners in smartphones to radar/lidar systems in next-generation cars. As the volume of 3D point cloud data grows, so does the demand for more accurate classification networks designed specifically for such data. PointNet is the pioneering and effective model among 3D point cloud classification networks, but it considers only the global features of a point cloud; PointNet++ was introduced to incorporate simple local features and overcome this limitation, helping the network better learn detailed local geometric structures. In this thesis, we introduce PointGPS, a novel 3D point cloud classification network architecture that leverages self-attention and plane fitting to perceive local geometric features.
Our proposed network architecture is built on PointMLP, which in turn builds upon PointNet++. PointMLP improves accuracy by incorporating residual multi-layer perceptron (MLP) modules. The architecture begins with an embedding module that lifts the point cloud into higher-dimensional features, followed by four rounds of a geometric feature mapping module and feature extraction modules. Each geometric feature mapping module reduces the point cloud by half with farthest point sampling and then selects the neighboring points of each sampled point. The high-dimensional features of the sampled point are subtracted from those of its neighbors to obtain relative features, which are retained alongside the sampled point's own features. In this module, we use Singular Value Decomposition (SVD) to fit a plane to the neighbors and employ self-attention to compute more detailed local geometric structure features. The feature extraction modules, consisting of MLP modules with residual connections, are then applied. Finally, the features are aggregated by a max pooling layer and passed to a classifier comprising fully connected layers, batch normalization layers, activation functions, and dropout to improve the model's generalization to unseen data. With these design choices, our proposed model achieves significant improvements in accuracy.
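The sampling and grouping step described above can be sketched as follows. This is a minimal NumPy illustration of farthest point sampling (halving the cloud) and kNN grouping with relative features, not the thesis implementation; function and parameter names are illustrative.

```python
import numpy as np

def farthest_point_sample(points, n_samples):
    """Greedy farthest point sampling: repeatedly pick the point
    farthest from the already-selected set. points: (N, 3)."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    selected[0] = 0  # start from an arbitrary point
    for i in range(1, n_samples):
        # update each point's distance to the selected set
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

def group_relative_features(points, feats, centers_idx, k):
    """For each sampled center, gather k nearest neighbors and build
    relative features (neighbor minus center), concatenated with the
    center's own features."""
    centers = points[centers_idx]                                   # (M, 3)
    d2 = ((points[None, :, :] - centers[:, None, :]) ** 2).sum(-1)  # (M, N)
    knn = np.argsort(d2, axis=1)[:, :k]                             # (M, k)
    neighbor_feats = feats[knn]                                     # (M, k, C)
    center_feats = feats[centers_idx][:, None, :]                   # (M, 1, C)
    relative = neighbor_feats - center_feats                        # (M, k, C)
    return np.concatenate(
        [relative, np.broadcast_to(center_feats, relative.shape)],
        axis=-1)                                                    # (M, k, 2C)
```

In a full pipeline this grouping would be repeated at each of the four stages, with the sampled set from one stage serving as the input cloud of the next.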
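Fitting a plane to a local neighborhood with SVD, as the geometric feature mapping module does, amounts to taking the direction of least variance of the centered neighbor coordinates. A small sketch under that standard formulation (names are illustrative, not the thesis code):

```python
import numpy as np

def fit_plane_svd(neighbors):
    """Fit a plane to a local neighborhood (k, 3) via SVD of the
    centered coordinates; the right-singular vector with the smallest
    singular value is the plane normal."""
    centroid = neighbors.mean(axis=0)
    centered = neighbors - centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]            # direction of least variance
    d = -normal @ centroid     # plane equation: normal . x + d = 0
    return normal, d
```

The normal and the residual distances of neighbors to this plane give a compact description of the local surface orientation and flatness.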
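The self-attention used to refine local geometric features follows the scaled dot-product form of Vaswani et al. [32]. Below is a minimal single-head NumPy sketch applied to the k neighbor features of one local group; the projection matrices are assumed inputs, and head splitting, normalization, and residual connections are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over one local group.
    x: (k, C) neighbor features; wq/wk/wv: (C, C) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (k, k) pairwise affinities
    return softmax(scores, axis=-1) @ v      # attention-weighted features
```

Each neighbor feature is thus re-expressed as a weighted combination of all features in the group, letting the module capture relations within the neighborhood before the residual MLP stages.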
References
[1] QI, Charles R., et al. Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. p. 652-660.
[2] QI, Charles Ruizhongtai, et al. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 2017, 30.
[3] MA, Xu, et al. Rethinking network design and local geometry in point cloud: A simple residual MLP framework. arXiv preprint arXiv:2202.07123, 2022.
[4] CAMUFFO, Elena; MARI, Daniele; MILANI, Simone. Recent advancements in learning algorithms for point clouds: An updated overview. Sensors, 2022, 22.4: 1357.
[5] GUO, Yulan, et al. Deep learning for 3d point clouds: A survey. IEEE transactions on pattern analysis and machine intelligence, 2020, 43.12: 4338-4364.
[6] MIRBAUER, Martin, et al. Survey and evaluation of neural 3d shape classification approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44.11: 8635-8656.
[7] JOSEPH-RIVLIN, Mor; ZVIRIN, Alon; KIMMEL, Ron. Momen(e)t: Flavor the moments in learning to classify shapes. In: Proceedings of the IEEE/CVF international conference on computer vision workshops. 2019.
[8] ZHAO, Hengshuang, et al. Pointweb: Enhancing local neighborhood features for point cloud processing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019. p. 5565-5573.
[9] DUAN, Yueqi, et al. Structural relational reasoning of point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 949-958.
[10] YAN, Xu, et al. Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. p. 5589-5598.
[11] CHENG, Silin, et al. Pra-net: Point relation-aware network for 3d point cloud analysis. IEEE Transactions on Image Processing, 2021, 30: 4436-4448.
[12] WU, Wenxuan; QI, Zhongang; FUXIN, Li. Pointconv: Deep convolutional networks on 3d point clouds. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition. 2019. p. 9621-9630.
[13] THOMAS, Hugues, et al. Kpconv: Flexible and deformable convolution for point clouds. In: Proceedings of the IEEE/CVF international conference on computer vision. 2019. p. 6411-6420.
[14] LIU, Yongcheng, et al. Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019. p. 8895-8904.
[15] LI, Yangyan, et al. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems, 2018, 31.
[16] XU, Yifan, et al. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In: Proceedings of the European conference on computer vision (ECCV). 2018. p. 87-102.
[17] KOMARICHEV, Artem; ZHONG, Zichun; HUA, Jing. A-cnn: Annularly convolutional neural networks on point clouds. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019. p. 7421-7430.
[18] KUMAWAT, Sudhakar; RAMAN, Shanmuganathan. Lp-3dcnn: Unveiling local phase in 3d convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 4903-4912.
[19] ZHOU, Hui, et al. Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation. arXiv preprint arXiv:2008.01550, 2020.
[20] FAN, Hehe, et al. Pstnet: Point spatio-temporal convolution on point cloud sequences. arXiv preprint arXiv:2205.13713, 2022.
[21] FAN, Hehe; YANG, Yi. PointRNN: Point recurrent neural network for moving point cloud processing. arXiv preprint arXiv:1910.08287, 2019.
[22] SIMONOVSKY, Martin; KOMODAKIS, Nikos. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. p. 3693-3702.
[23] WANG, Yue, et al. Dynamic graph cnn for learning on point clouds. Acm Transactions On Graphics (tog), 2019, 38.5: 1-12.
[24] ZHANG, Kuangen, et al. Linked dynamic graph cnn: Learning on point cloud via linking hierarchical features. arXiv preprint arXiv:1904.10014, 2019.
[25] SHEN, Yiru, et al. Mining point cloud local structures by kernel correlation and graph pooling. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. p. 4548-4557.
[26] CHEN, Chao, et al. Clusternet: Deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019. p. 4994-5002.
[27] XU, Qiangeng, et al. Grid-gcn for fast and scalable point cloud learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. p. 5661-5670.
[28] ZHANG, Yingxue; RABBAT, Michael. A graph-cnn for 3d point cloud classification. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. p. 6279-6283.
[29] LANDRIEU, Loic; SIMONOVSKY, Martin. Large-scale point cloud semantic segmentation with superpoint graphs. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. p. 4558-4567.
[30] DEMANTKÉ, Jérôme, et al. Dimensionality based scale selection in 3D lidar point clouds. The international archives of the photogrammetry, remote sensing and spatial information sciences, 2012, 38: 97-102.
[31] GUINARD, Stéphane; LANDRIEU, Loic; VALLET, Bruno. Weakly supervised segmentation-aided classification of urban scenes from 3D LiDAR point clouds. 2017.
[32] VASWANI, Ashish, et al. Attention is all you need. Advances in neural information processing systems, 2017, 30.
[33] WU, Zhirong, et al. 3d shapenets: A deep representation for volumetric shapes. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 1912-1920.
[34] UY, Mikaela Angelina, et al. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In: Proceedings of the IEEE/CVF international conference on computer vision. 2019. p. 1588-1597.