Graduate Thesis 106521052 Detailed Record




Name: Wei-Chung Wan (萬偉中)    Department: Electrical Engineering
Thesis title: Monocular Depth Estimation Based on Encoder-Decoder with Non-Local Decoder-Squeeze-Excitation Network and Adaptive Depth List
Related theses
★ Low-memory hardware design for real-time SIFT feature point extraction
★ Real-time face detection and face recognition for an access control system
★ An autonomous vehicle with real-time automatic following
★ Lossless compression algorithm and implementation for multi-lead ECG signals
★ Offline customizable speaker wake-word system with an embedded implementation
★ Wafer map defect classification and its embedded system implementation
★ A densely connected convolutional network for small-footprint keyword spotting
★ G2LGAN: data augmentation on imbalanced datasets for wafer map defect classification
★ Algorithm design techniques for compensating the finite precision of multiplierless digital filters
★ Design and implementation of a programmable Viterbi decoder
★ Low-cost vector rotator IP design based on extended elementary-angle CORDIC
★ Analysis and architecture design of a JPEG 2000 still-image coding system
★ Low-power turbo code decoder for communication systems
★ Platform-based design for multimedia communication
★ Design and implementation of a digital watermarking system for MPEG encoders
★ Algorithm development for video error concealment with data-reuse considerations
  1. The access permission for this electronic thesis is immediate open access.
  2. The open-access electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): Monocular depth estimation is an important topic in computer vision. In recent years, end-to-end encoder-decoder architectures in CNN (convolutional neural network) based approaches have shown reasonable results. For the encoder, most studies rely on a powerful feature extractor to obtain good features, which are then up-sampled to reconstruct the depth map. With a strong encoder, even a simple up-sampling process has been found to reach good accuracy. To achieve higher-quality depth estimation, however, improving the decoder is more critical, and even now few reasonable and genuinely useful methods contribute to the up-sampling process. In this thesis we propose a novel network architecture for monocular depth estimation. More precisely, we propose a CNN-based module that considers the whole up-sampling process from a global perspective. The module is designed on the concept of SE-Net and, through a global-view attention mechanism, appropriately recalibrates the feature maps of different resolutions throughout the decoder; we call it the Decoder-Squeeze-and-Excitation (DSE) module. We further combine it with the non-local attention mechanism to obtain the Non-Local Decoder-Squeeze-and-Excitation (NL-DSE) module for the whole up-sampling process. In addition, we propose an output-range-limiting method, the Adaptive Depth List (ADL), to improve the accuracy of near-distance estimation. Combining these proposed techniques, our results are evaluated on the NYU Depth V2 dataset and reach the accuracy of current state-of-the-art CNN-based approaches.
Abstract (English): Monocular depth estimation is an essential topic in computer vision. In recent years, CNN (Convolutional Neural Network) based models have shown reasonable results with end-to-end encoder-decoder architectures. For the encoder, most research relies on a robust feature extractor to obtain good features, and with a strong encoder even simple up-sampling processes can achieve good accuracy. The decoder, however, is more critical for a high-quality depth estimation task, and even now few reasonable methods contribute to the up-sampling process. In this thesis, we present a novel monocular depth estimation design. We propose a CNN-based network module that considers the whole up-sampling process globally. The design is based on the concept of SE-Net and properly recalibrates the feature maps with a global-perspective attention mechanism. We further combine it with the non-local attention mechanism to design the Non-Local Decoder-Squeeze-and-Excitation (NL-DSE) module for the whole up-sampling process. Furthermore, we propose an output-range-limiting method called the Adaptive Depth List (ADL) to enhance the precision of near-distance estimation. Combining these proposed techniques, our results, evaluated on the NYU Depth V2 dataset, outperform state-of-the-art CNN-based approaches in accuracy.
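To make the building blocks named in the abstract more concrete, the following is a minimal PyTorch sketch, written as an illustration rather than the thesis implementation: an SE-Net-style squeeze-and-excitation recalibration of a decoder feature map [16], a non-local self-attention block [17], and a depth head that clamps its output to a preset range, loosely in the spirit of limiting the output range as the proposed Adaptive Depth List does. All module names, channel sizes, and the 0.1 to 10 m depth range are illustrative assumptions, not details taken from the thesis.

# Minimal illustrative sketch (assumptions noted above), not the thesis implementation.
import torch
import torch.nn as nn


class SERecalibration(nn.Module):
    """Channel-wise recalibration in the style of SE-Net [16]."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pooling
        self.fc = nn.Sequential(                       # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # rescale each channel


class NonLocalBlock(nn.Module):
    """Self-attention over all spatial positions, in the style of non-local networks [17]."""

    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)     # query projection
        self.phi = nn.Conv2d(channels, inter, 1)       # key projection
        self.g = nn.Conv2d(channels, inter, 1)         # value projection
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw) pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection


class DepthHead(nn.Module):
    """1x1 convolution producing a depth map clamped to [min_depth, max_depth]."""

    def __init__(self, channels: int, min_depth: float = 0.1, max_depth: float = 10.0):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)
        self.min_depth, self.max_depth = min_depth, max_depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = torch.sigmoid(self.conv(x))                # squash to (0, 1)
        return self.min_depth + d * (self.max_depth - self.min_depth)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 30, 40)                  # a hypothetical decoder feature map
    feat = NonLocalBlock(64)(SERecalibration(64)(feat))
    depth = DepthHead(64)(feat)
    print(depth.shape, float(depth.min()), float(depth.max()))

The sketch only shows recalibration of a single feature map; the thesis applies the DSE/NL-DSE recalibration across the feature maps of different resolutions in the decoder, and the ADL is an adaptive list of depth bounds rather than the fixed clamp used here.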
Keywords (Chinese): ★ Monocular depth estimation task
★ Decoder-Squeeze-and-Excitation network
★ Adaptive Depth List
Keywords (English):
Table of contents:
Abstract (Chinese)
Abstract (English)
1. Introduction
1.1. Research background
1.2. Research directions in model architecture design
1.3. Thesis organization
2. Related work
2.1. Discretized-output approaches
2.2. Output shape correction
2.3. Attention mechanisms
3. Overview of the proposed method
3.1. Motivation
3.2. Evolution of the proposed method and the final design
4. Key modules proposed for the monocular depth estimation task
4.1. Decoder-Squeeze-and-Excitation recalibration module
4.2. Non-local block with attention map
4.3. Non-Local Decoder-Squeeze-and-Excitation (NL-DSE) recalibration module
4.4. Adaptive Depth List (ADL)
5. Experimental results
5.1. Dataset
5.2. Implementation details
5.3. Evaluation
5.4. Comparisons
5.5. Ablation study
5.6. Discussion
5.7. Computational cost discussion
6. Conclusion
References
References:
[1] Ruofei Du, Eric Lee Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte, Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Moura e Silva Cruces, Shahram Izadi, Adarsh Kowdle, Konstantine Nicholas John Tsotsos, and David Kim. Depthlab: Real-time 3d interaction with depth maps for mobile augmented reality. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, page 15, 2020.
[2] W. Lee, N. Park, and W. Woo. Depth-assisted real-time 3d object detection for augmented reality. ICAT11, 2:126–132, 2011.
[3] C. Hazirbas, L. Ma, C. Domokos, and D. Cremers. Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture. In ACCV, 2016.
[4] F. Moreno-Noguer, P. N. Belhumeur, and S. K. Nayar. Active refocusing of images and videos. ACM Trans. Graph., 26(3), July 2007.
[5] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision, 47(1-3):7–42, 2002.
[6] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The kitti dataset. I. J. Robotics Res., 32:1231–1237, 2013.
[7] R. Ranftl, V. Vineet, Q. Chen, and V. Koltun. Dense monocular depth estimation in complex dynamic scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4058–4066, 2016.
[8] R. Basri, D. Jacobs, and I. Kemelmacher. Photometric stereo with general, unknown lighting. International Journal of Computer Vision, 72(3):239–257, 2007.
[9] A. Saxena, S. H. Chung, and A. Y. Ng. Learning depth from single monocular images. In Advances in neural information processing systems, pages 1161–1168, 2006.
[10] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In Computer Vision – ECCV 2012, pages 746–760, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[11] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS, 2014.
[12] Ibraheem Alhashim and Peter Wonka. High quality monocular depth estimation via transfer learning. CoRR, abs/1812.11941, 2018.
[13] Chen, X., Chen, X., Zha, Z.J.: Structure-aware residual pyramid network for monocular depth estimation. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. pp. 694–700. AAAI Press (2019)
[14] Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, and Janne Heikkila. Guiding monocular depth estimation using depth-attention volume. arXiv preprint arXiv:2004.02760, 2020.
[15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, 2015.
[16] J. Hu, L. Shen, S. Albanie, G. Sun and E. Wu, "Squeeze-and-Excitation Networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, pp. 2011-2023, 1 Aug. 2020, doi: 10.1109/TPAMI.2019.2913372.
[17] Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7794–7803 (2018)
[18] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114. PMLR, 2019.
[19] Amlaan Bhoi. Monocular Depth Estimation: A Survey. CoRR, abs/1901.09402, 2019
[20] Chaoqiang Zhao, Qiyu Sun, Chongzhen Zhang, Yang Tang, Feng Qian. Monocular Depth Estimation Based On Deep Learning: An Overview. CoRR, abs/2003.06620, 2020
[21] Huan Fu, Mingming Gong, Chaohui Wang, Nematollah Batmanghelich, and Dacheng Tao. Deep ordinal regression network for monocular depth estimation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2002–2011, 2018.
[22] Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, and Il Hong Suh. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326, 2019.
[23] Lukas Liebel and Marco Körner. MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification. arXiv preprint arXiv:1907.11111, 2019.
[24] W. Yin, Y. Liu and C. Shen, “Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3097396.
[25] Michael Ramamonjisoa and Vincent Lepetit. Sharpnet: Fast and accurate recovery of occluding contours in monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2019.
[26] Zhixiang Hao, Yu Li, Shaodi You, and Feng Lu. Detail preserving depth estimation from a single image using attention guided networks. 2018 International Conference on 3D Vision (3DV), pages 304–313, 2018.
[27] Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. Deeper depth prediction with fully convolutional residual networks. 2016 Fourth International Conference on 3D Vision (3DV), pages 239–248, 2016.
[28] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015
[29] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
[30] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[31] A. Levin, D. Lischinski, and Y. Weiss. Colorization using optimization. ACM Trans. Graph., 23:689–694, 2004.
[32] Ravi Garg, Vijay Kumar B.G., Gustavo Carneiro, and Ian Reid. Unsupervised cnn for single view depth estimation: Geometry to the rescue. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, pages 740–756, Cham, 2016. Springer International Publishing.
[33] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32, pages 8026–8037. Curran Associates, Inc., 2019.
[34] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13:600–612, 2004.
[36] Junjie Hu, Mete Ozay, Yan Zhang, and Takayuki Okatani. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1043–1051, 2019.
[37] Xiaotian Chen, Xuejin Chen, and Zheng-Jun Zha. Structure-aware residual pyramid network for monocular depth estimation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 694–700. International Joint Conferences on Artificial Intelligence Organization, July 2019.
[38] M. Song, S. Lim and W. Kim, “Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 11, pp. 4381-4393, Nov. 2021, doi: 10.1109/TCSVT.2021.3049869.
[39] Fayao Liu, Chunhua Shen, Guosheng Lin, and I. Reid. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:2024–2039, 2016.
[40] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6602–6611, 2017.
[41] Yevhen Kuznietsov, Jörg Stückler, and Bastian Leibe. Semisupervised deep learning for monocular depth map prediction. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2215–2223, 2017.
[42] Yukang Gan, Xiangyu Xu, Wenxiu Sun, and Liang Lin. Monocular depth estimation with affinity, vertical pooling, and label enhancement. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, editors, Computer Vision – ECCV 2018, pages 232–247, Cham, 2018. Springer International Publishing.
Advisor: Tsung-Han Tsai (蔡宗漢)    Review date: 2022-4-15
