Thesis 110522020: Detailed Record




Name: Hao-Chung Cheng (鄭皓中)   Department: Computer Science and Information Engineering
Thesis Title: Dynamic Graph Attention Blocks on Object Detection
(Chinese title: 應用於物件偵測之動態注意力圖神經網路區塊)
Related Theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Front-End Processing
★ Application and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Financial Product Trend Prediction
★ A Study on Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep-Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Stroke Surgery Survival
Files: Full text is not available in the system (access permanently restricted).
Abstract (Chinese): In the field of deep learning for computer vision, object detection has long been a widely discussed and highly valued research area. Its vast range of real-world application scenarios makes it an enduringly important research topic. Related models continue to evolve: both convolutional-neural-network-based and Transformer-based architectures are under active development, yet graph neural networks have seen few applications here, especially on 2D images, which led us to investigate their potential for 2D image object detection.
Graph neural networks have recently gained attention thanks to their strong ability to represent graph-structured data, which allows them to explore relationships among irregular neighboring nodes. Some prior work has mounted graph neural networks on top of convolutional neural networks and explored the resulting performance gains, but that work suffers from inadequate experimental baselines, and its graph neighbors and edges are built over a fixed spatial range, which may constrain the receptive field and exploration capability, and may even reduce the exploratory power of the graph neural network.
To address these problems, we propose modular Dynamic Graph Attention Blocks, which introduce deformable convolution to increase the exploration capability of graph neural networks: edge construction changes from fixed to dynamic, letting the model learn to find better features for convolution. We mount the module on state-of-the-art object detectors for our experiments, which show that our method achieves comparable or slightly better performance.
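For context, the two standard components the abstract combines are deformable convolution (Dai et al.) and graph attention (Veličković et al.). The thesis's exact formulation is not publicly available, so only the published reference equations are shown here. Deformable convolution samples at learned offsets $\Delta p_k$ rather than only at the fixed kernel locations $p_k$:

$$y(p_0) = \sum_{k=1}^{K} w_k \cdot x(p_0 + p_k + \Delta p_k),$$

and a graph attention layer weights each neighbor $j$ of node $i$ before aggregation:

$$\alpha_{ij} = \operatorname{softmax}_{j \in \mathcal{N}(i)}\!\Big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j]\big)\Big), \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}h_j\Big).$$

Replacing the fixed neighborhood $\mathcal{N}(i)$ with one selected by learned offsets is, per the abstract, the core of the proposed block.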
Abstract (English): In the realm of deep learning for computer vision, object detection has always been a widely discussed and emphasized research area. Its many real-world application scenarios make it an enduringly important topic. Models continue to evolve, whether based on convolutional neural networks or on Transformer architectures, yet graph neural networks remain uncommon in this area, especially for 2D images, which prompts us to explore their potential for 2D image object detection.
Graph neural networks have recently gained attention due to their strong representation ability for graph-structured data, which enables them to explore relationships among irregular neighboring nodes. Some previous work has combined graph neural networks with convolutional neural networks and explored the resulting performance improvements, but the experimental comparisons were not suitable, and the neighbors and edges of the graph were established within a fixed spatial range, which may limit the receptive field and exploration capability, and may even reduce the exploratory power of the graph neural network.
To address these issues, we propose modular Dynamic Graph Attention Blocks, which introduce deformable convolution to enhance the exploration capability of graph neural networks. Edges are established dynamically rather than within a fixed range, enabling the model to learn to find better features for convolution. We integrate the module into state-of-the-art object detectors for our experiments, which show that our method achieves comparable or slightly improved performance.
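Because the full text is permanently restricted, the actual implementation is not available; what follows is only a minimal PyTorch sketch of the mechanism the abstract describes: learned, deformable-style offsets pick each pixel-node's neighbors dynamically, and GAT-style attention aggregates them. Every name and design choice below (DynamicGraphAttentionBlock, num_neighbors, the residual connection) is an illustrative assumption, not the author's code.

```python
# Illustrative sketch only -- NOT the thesis implementation (full text unavailable).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphAttentionBlock(nn.Module):
    """Treats each pixel of a feature map as a graph node, builds its K
    neighbors from learned 2D offsets (deformable-convolution style), and
    aggregates them with GAT-style attention."""

    def __init__(self, channels: int, num_neighbors: int = 9):
        super().__init__()
        self.k = num_neighbors
        # One (dx, dy) offset per neighbor per pixel, as in deformable conv.
        self.offset = nn.Conv2d(channels, 2 * num_neighbors, 3, padding=1)
        self.proj = nn.Conv2d(channels, channels, 1)      # W in GAT
        self.attn = nn.Linear(2 * channels, 1)            # a in GAT

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        offsets = self.offset(x)                          # (B, 2K, H, W)
        feats = self.proj(x)                              # (B, C, H, W)

        # Base sampling grid in grid_sample's normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij")
        base = torch.stack((xs, ys), dim=-1)              # (H, W, 2), (x, y) order
        norm = torch.tensor([w / 2.0, h / 2.0], device=x.device)

        # Dynamic edges: sample each neighbor's features at its learned location.
        neighbors = []
        for i in range(self.k):
            off = offsets[:, 2 * i:2 * i + 2].permute(0, 2, 3, 1)  # (B, H, W, 2)
            grid = base.unsqueeze(0) + off / norm         # pixel offsets -> [-1, 1]
            neighbors.append(F.grid_sample(feats, grid, align_corners=True))
        nbr = torch.stack(neighbors, dim=1)               # (B, K, C, H, W)

        # GAT-style attention: score (center, neighbor) pairs, softmax over K.
        center = feats.unsqueeze(1).expand_as(nbr)
        pair = torch.cat((center, nbr), dim=2).permute(0, 1, 3, 4, 2)  # (B,K,H,W,2C)
        scores = F.leaky_relu(self.attn(pair), 0.2).squeeze(-1)        # (B, K, H, W)
        alpha = torch.softmax(scores, dim=1)
        out = (alpha.unsqueeze(2) * nbr).sum(dim=1)       # (B, C, H, W)
        return x + out  # residual keeps the block drop-in / modular

# Example: the block preserves shape, so it could sit between backbone stages.
# block = DynamicGraphAttentionBlock(channels=256)
# y = block(torch.randn(2, 256, 64, 64))                 # -> (2, 256, 64, 64)
```

Because the block is shape-preserving and residual, it could in principle be inserted into an existing detector's feature pipeline (e.g., between FPN levels of a YOLO-family model), which is consistent with the abstract's claim that the blocks are modular.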
Keywords (Chinese) ★ Object Detection (物件偵測)
★ Graph Neural Network (圖神經網路)
★ Deformable Convolution (可變卷積)
★ Graph Attention Network (圖注意力網路)
Keywords (English)
Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Methodology and Chapter Overview
Chapter 2: Literature Review
2.1 Object Detection
2.1.1 Two-Stage Detectors
2.1.2 One-Stage Detectors
2.1.3 Transformer-Based Object Detection
2.2 Deformable Convolutional Networks (DCN)
2.2.1 Deformable Convolution
2.2.2 Variant Models
2.3 Graph Neural Networks (GNN)
2.3.1 Graph Attention Networks
2.3.2 Graph Neural Networks for Object Detection
Chapter 3: Research Content and Methods
3.1 Definitions and Notation
3.2 Model Architecture
3.3 Dynamic Graph Attention Module
Chapter 4: Experimental Results and Discussion
4.1 Experimental Equipment
4.2 Datasets
4.2.1 COCO 2017 Object Detection
4.3 Experimental Parameters
4.4 Results and Discussion
4.4.1 Comparison with Real-Time SOTA Detectors
4.4.2 Ablation Studies
Chapter 5: Conclusion and Future Directions
References
Advisor: Jia-Ching Wang (王家慶)   Date of Approval: 2023-05-25