References
[1] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” Int. Journal of Computer Vision (IJCV), vol.88, is.2, pp.303-338, 2010.
[2] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” arXiv:1405.0312.
[3] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” arXiv:1409.0575.
[4] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of Neural Information Processing Systems (NIPS), Harrahs and Harveys, Lake Tahoe, NV, Dec.3-8, 2012, pp.1106-1114.
[5] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. of ECCV Conf., Zurich, Switzerland, Sep.6-12, 2014, pp.818-833.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, MA, Jun.7-12, 2015, pp.1-9.
[7] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. of ICLR Conf., San Diego, CA, USA, May.7-9, 2015, pp.1-14.
[8] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Jun.27-30, 2016, pp.770-778.
[9] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep networks,” in Proc. of Neural Information Processing Systems (NIPS), Montréal, Canada, Dec.7-12, 2015, pp.2377-2385.
[10] J. Redmon and A. Farhadi, “YOLOv3: an incremental improvement,” arXiv:1804.02767.
[11] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. of ICML Conf., Lille, France, Jul.7-9, 2015, vol.37, pp.448-456.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Jun.27-30, 2016, pp.770-778.
[13] J. Tremblay, T. To, and S. Birchfield, “Falling Things: A synthetic dataset for 3D object detection and pose estimation,” arXiv:1804.06534.
[14] A. Neubeck and L. Van Gool, “Efficient non-maximum suppression,” in Proc. of IEEE Int. Conf. on Pattern Recognition (ICPR), Hong Kong, Aug.20-24, 2006, pp.850-855.
[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, Jun.23-28, 2014, pp.580-587.
[16] J. Uijlings, K. Sande, T. Gevers, and A. Smeulders, “Selective search for object recognition,” Int. Journal of Computer Vision (IJCV), vol.104, is.2, pp.154-171, 2013.
[17] R. Girshick, “Fast R-CNN,” in Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Santiago, Chile, Dec.11-18, 2015, pp.1440-1448.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in Proc. of ECCV Conf., Zurich, Switzerland, Sep.6-12, 2014, pp.346-361.
[19] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.39, is.6, pp.1137-1149, 2016.
[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in Proc. of European Conf. on Computer Vision (ECCV), Amsterdam, The Netherlands, Oct.8-16, 2016, pp.21-37.
[21] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, “DSSD: Deconvolutional single shot detector,” arXiv:1701.06659.
[22] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Jun.27-30, 2016, pp.779-788.
[23] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul.21-26, 2017, pp.6517-6525.
[24] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, CA, Jun.21-Jul.18, 1967, vol.1, pp.281-297.
[25] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul.21-26, 2017, pp.936-944.
[26] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Venice, Italy, Oct.22-29, 2017, pp.2980-2988.
[27] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,” arXiv:1711.00199.
[28] B. Tekin, S. N. Sinha, and P. Fua, “Real-time seamless single shot 6D object pose prediction,” arXiv:1711.08848.
[29] M. Simon, S. Milz, K. Amende, and H.-M. Gross, “Complex-YOLO: Real-time 3D object detection on point clouds,” arXiv:1803.06199.
[30] M. Simon, K. Amende, A. Kraus, J. Honer, T. Sämann, H. Kaulbersch, S. Milz, and H.-M. Gross, “Complexer-YOLO: Real-time 3D object detection and tracking on semantic point clouds,” arXiv:1904.07537.
[31] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A deep neural network architecture for real-time semantic segmentation,” arXiv:1606.02147.
[32] C. E. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, “Activation functions: comparison of trends in practice and research for deep learning,” arXiv:1811.03378.
[33] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. of ICML Conf., Haifa, Israel, Jun.21-24, 2010, pp.807-814.
[34] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. of ICML Conf., Atlanta, GA, Jun.16-21, 2013.
[35] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, Jun.18-23, 2018, pp.7132-7141.
[36] D. Pavllo, D. Grangier, and M. Auli, “QuaterNet: A quaternion-based recurrent model for human motion,” arXiv:1805.06485.
[37] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” arXiv:1708.02002.