dc.description.abstract | Since Courbariaux pioneered the Binary Neural Network (BNN) in 2016 to dramatically reduce the storage and computation cost of CNNs for lightweight applications, researchers have continued working to drive the cost down further while minimizing the loss of representation capacity and the accuracy gap relative to real-valued counterparts. Among these efforts, ReActNet, which achieves 62.16% Top-1 accuracy on CIFAR100, set a new benchmark. In this thesis, we strive to further improve its performance at an even lower overall cost.
We redesign the General Building block of ReActNet (GBR) to raise accuracy on the CIFAR100 image classification dataset, the PASCAL VOC 07+12 object detection dataset, and the KITTI Vision Benchmark Suite, with a lower memory footprint and lower computation cost. The GBR comprises a single Down-sampling Block (DB) and a plurality of Common Blocks (CB). First, we eliminate all the 1x1 Binary Convolutional (BConv) layers of the CBs to reduce the number of weight parameters and the network size. Second, the 1x1 BConv duplicate in the DB is replaced by Efficient Channel Attention (ECA) to enrich the representation capacity. Third, a Batch Normalization (BN) unit is added right after the Concatenator of the DB to make the data distribution more suitable for optimization. Finally, the shortcut connection is placed after the RPReLU activation unit to balance information preservation from the shortcut path against information transformation from the residual path. Our experiments show that the enhanced network (ERCNet) delivers 2.39% higher Top-1 accuracy on CIFAR100 than the original ReActNet, with around 10% lower memory and 8% fewer computation FLOPs. Under the YOLOv8 framework it reaches 81.8% mAP50 on the PASCAL VOC 07+12 dataset, surpassing ReActNet by 0.8%. Furthermore, on the KITTI dataset, ERCNet outperforms all models of the official YOLOv8 backbone, reaching 94.8% mAP50, which exceeds YOLOv8-L and YOLOv8-N by 1.9% and 11.2%, respectively. On the other hand, we also find that ERCNet performs slightly worse than the default YOLOv8 backbone when both are trained on PASCAL VOC 07+12.
Our experiments indicate that ERCNet outperforms CNNs on certain datasets such as KITTI, at a lower memory and computation cost. As such, ERCNet makes BNNs more suitable for dataset-specific applications on lightweight devices. | en_US |