References
[1] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer," arXiv:1907.01341v3.
[2] Y. Zhang and T. Funkhouser, "Deep depth completion of a single RGB-D image," arXiv:1803.09326.
[3] Y. Huang, T. Wu, Y. Liu, and W. Hsu, "Indoor depth completion with boundary consistency and self-attention," arXiv:1908.08344.
[4] D. Senushkin, M. Romanov, I. Belikov, A. Konushin, and N. Patakin, "Decoder modulation for indoor depth completion," arXiv:2005.08607.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556v6.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," arXiv:1505.04597v1.
[7] V. Nekrasov, C. Shen, and I. Reid, "Light-Weight RefineNet for real-time semantic segmentation," arXiv:1810.03272v1.
[8] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. of ECCV Conf., Zurich, Switzerland, Sep. 6-12, 2014, pp. 818-833.
[9] M. Tan and Q. Le, "EfficientNet: rethinking model scaling for convolutional neural networks," arXiv:1905.11946.
[10] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," arXiv:1612.03144.
[11] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus, "Deconvolutional networks," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, Jun. 13-18, 2010, pp. 2528-2535.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," arXiv:1706.03762v5.
[13] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention generative adversarial networks," arXiv:1805.08318v2.
[14] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," arXiv:1709.01507v4.
[15] S. Woo, J. Park, J.-Y. Lee, and I. Kweon, "CBAM: convolutional block attention module," arXiv:1807.06521v2.
[16] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, "Dual attention network for scene segmentation," arXiv:1809.02983v4.
[17] Y. Chen, Y. Kalantidis, J. Li, S. Yan, and J. Feng, "A2-Nets: Double attention networks," arXiv:1810.11579v1.
[18] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: efficient channel attention for deep convolutional neural networks," arXiv:1910.03151v4.
[19] B. Niu, W. Wen, W. Ren, X. Zhang, L. Yang, S. Wang, K. Zhang, X. Cao, and H. Shen, "Single image super-resolution via a holistic attention network," arXiv:2008.08767v1.
[20] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in Proc. 27th Int. Conf. on Computer Graphics and Interactive Techniques, New Orleans, USA, 2000, pp.417-424.
[21] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels," IEEE Trans. on Image Processing, vol. 10, no. 8, pp. 1200-1211, 2001.
[22] A. Telea, "An image inpainting technique based on the fast marching method," Journal of Graphics Tools, vol. 9, no. 1, pp. 25-36, Jan. 2004.
[23] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, "PatchMatch: a randomized correspondence algorithm for structural image editing," ACM Trans. on Graphics (TOG), vol. 28, no. 3, 2009.
[24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," arXiv:1406.2661.
[25] Y. Li, S. Liu, J. Yang, and M.-H. Yang, "Generative face completion," arXiv:1704.05838.
[26] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Globally and locally consistent image completion," ACM Trans. on Graphics (TOG), vol. 36, no. 4, 2017.
[27] J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, "Sparsity invariant CNNs," arXiv:1708.06500v2.
[28] S. Shivakumar, T. Nguyen, I. Miller, S. Chen, V. Kumar, and C. Taylor, "DFuseNet: deep fusion of RGB and sparse depth information for image guided dense depth completion," arXiv:1902.00761v2.
[29] J. Park, K. Joo, Z. Hu, C. Liu, and I. Kweon, "Non-local spatial propagation network for depth completion," arXiv:2007.10042v1.
[30] Y. Zhang and T. Funkhouser, "Deep depth completion of a single RGB-D image," arXiv:1803.09326.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," arXiv:1406.4729v4.
[32] T. Park, M. Liu, T. Wang, and J. Zhu, "Semantic image synthesis with spatially-adaptive normalization," arXiv:1903.07291.
[33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Instance normalization: the missing ingredient for fast stylization," arXiv:1607.08022v3.
[34] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," arXiv:1502.03167v3.
[35] C. Zhao, Q. Sun, C. Zhang, Y. Tang, and F. Qian, "Monocular depth estimation based on deep learning: an overview," arXiv:2003.06620v2.
[36] K. Lore, K. Reddy, M. Giering, and E. Bernal, "Generative adversarial networks for depth map estimation from RGB video," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, Jun. 18-22, 2018, pp. 1177-1185.
[37] P. Chakravarty, P. Narayanan, and T. Roussel, "GEN-SLAM: generative modeling for monocular simultaneous localization and mapping," arXiv:1902.02086.
[38] D. Wofk, F. Ma, T. Yang, S. Karaman, and V. Sze, "FastDepth: fast monocular depth estimation on embedded systems," arXiv:1903.03273.
[39] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," arXiv:1612.01105v2.
[40] G. Lin, A. Milan, C. Shen, and I. Reid, "RefineNet: multi-path refinement networks for high-resolution semantic segmentation," arXiv:1611.06612.
[41] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385v1.
[42] J. Yu, Z. Lin, J. Yan, X. Shen, X. Lu, and T. S. Huang, "Generative image inpainting with contextual attention," arXiv:1801.07892.
[43] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. Le, "MnasNet: platform-aware neural architecture search for mobile," arXiv:1807.11626v3.
[44] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: inverted residuals and linear bottlenecks," arXiv:1801.04381.
[45] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," arXiv:1603.05027v3, also in Proc. of ECCV Conf., Amsterdam, The Netherlands, Oct. 11-14, 2016, pp. 630-645.
[46] A. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv:1803.08375v2.
[47] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv:1412.6980.
[48] Z. Zhang, T. He, H. Zhang, Z. Zhang, J. Xie, and M. Li, "Bag of freebies for training object detection neural networks," arXiv:1902.04103v3.
[49] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.