Master's/Doctoral Thesis 110526007: Detailed Record




Name: 翁崇恒 (Chong-Heng Weng)    Department: Computer Science and Information Engineering
Thesis Title: 利用ε-greedy強化基於Transformer的物件偵測演算法之效能
(Performance Enhancement for Transformer-based Object Detection by ε-Greedy)
Related Theses
★ A Novel Lightweight Object Detection System for Edge Computing ★ A Neural Architecture Search Method Based on a Targeted Training Strategy and a Strong Predictor
★ A 3D Point Cloud Classification Network Based on Self-Attention and Fitted-Plane-Aware Local Geometry
Files: full text viewable in the system after 2028-7-11
Abstract (Chinese, translated) Object detection is an important, fundamental research topic in computer vision. In recent years, Detection Transformer (DETR)-type models have stood out in this field, ultimately reaching state-of-the-art performance. Building on the original DETR, these studies have proposed many methods that improve its performance and training efficiency.

However, we find that DETR-type models can become stuck in a local minimum during top-K query selection, preventing their performance from being fully optimized. To address this problem, we inject noise into the top-K query selection step, encouraging the model to explore queries better suited to predicting objects. Our inspiration comes from reinforcement learning, where the ε-greedy method is used to add noise to action selection.

Combining this noise-injection method with previous work, we improve DINO by +0.3 AP on COCO val2017 with a ResNet-50 backbone. This improvement shows that ε-greedy effectively mitigates the negative effect of being stuck in a local minimum.
Abstract (English) Object detection is a fundamental task in computer vision. Among the approaches to this task, the Detection Transformer (DETR) has emerged as a promising model for achieving state-of-the-art performance, and since its introduction, several DETR variants have been proposed to improve its performance and training efficiency.

However, we find that DETR-like models can become stuck in a local minimum during top-K query selection, resulting in inferior performance. To resolve this problem, we add noise to the top-K query selection of DETR-like models, encouraging the model to find queries better suited to bounding-box prediction. This design is inspired by the ε-greedy strategy commonly adopted in reinforcement learning, which adds noise to action selection.

Combining this noise-adding scheme with those prior improvements, our method improves DINO by +0.3 AP under the 4-scale feature-map setting on COCO val2017 using a ResNet-50 backbone. This improvement validates that ε-greedy effectively reduces the negative effect of being stuck in a local minimum.
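The ε-greedy query selection described in the abstract can be illustrated with a minimal, framework-free sketch. This is only an illustration of the idea, not the thesis implementation (which operates on encoder output logits inside a DETR-style model); the function name `epsilon_greedy_topk` and the flat score list are hypothetical.

```python
import random

def epsilon_greedy_topk(scores, k, epsilon, rng=random):
    """Select k query indices from per-query scores.

    Exploitation: by default each slot takes the next-best query by score.
    Exploration: with probability epsilon, a slot is instead filled by a
    randomly chosen query from outside the greedy top-k, mimicking the
    noise that ε-greedy adds to action selection in reinforcement learning.
    """
    # Rank all query indices by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k, rest = order[:k], order[k:]

    selected = []
    for idx in top_k:
        if rest and rng.random() < epsilon:
            # Explore: swap in a random lower-scored query (without repeats).
            j = rng.randrange(len(rest))
            selected.append(rest.pop(j))
        else:
            # Exploit: keep the greedy top-k choice.
            selected.append(idx)
    return selected
```

With ε = 0 this reduces to plain greedy top-K selection; larger ε replaces more of the greedy choices with randomly explored queries, which is the mechanism the thesis uses to escape local minima in query selection.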
Keywords (Chinese, translated) ★ Deep Learning
★ Computer Vision
★ Object Detection
Keywords (English) ★ Deep Learning
★ Computer Vision
★ Object Detection
★ Transformer
Table of Contents
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Related Work 3
2.1 Transformer-Based End-to-End Object Detectors . . . . . . . . . . 3
2.2 Reinforcement Learning and Supervised Learning . . . . . . . . . 7
3 Method 9
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Object Detection Reformulation . . . . . . . . . . . . . . . . . . . . 12
3.3 ε-Greedy Query Selection . . . . . . . . . . . . . . . . . . . . . . . 15
4 Experimental Results 17
4.1 Setup and Environment . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Visualization of Query Selection . . . . . . . . . . . . . . . . . . . 20
4.4 Comparison of Different ε-Greedy Strategies . . . . . . . . . . . . . 24
4.5 Visualization of Query Selection for Different ε-Greedy Strategies . 24
4.6 Test on Other DETR-like Models . . . . . . . . . . . . . . . . . . . 26
5 Conclusion 28
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References (Bibliography)
[1] W. Abdulla, Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow, https://github.com/matterport/Mask_RCNN, 2017.
[2] K. Arulkumaran, A. Cully, and J. Togelius, “AlphaStar,” in Proceedings of
the Genetic and Evolutionary Computation Conference Companion, ACM, 2019.
DOI: 10.1145/3319619.3321894. [Online]. Available: https://doi.org/10.1145/3319619.3321894.
[3] M. Bellver, X. Giro-i Nieto, F. Marques, and J. Torres, Hierarchical object
detection with deep reinforcement learning, 2016. DOI: 10.48550/ARXIV.
1611.03718. [Online]. Available: https://arxiv.org/abs/1611.
03718.
[4] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, YOLOv4: Optimal speed and accuracy of object detection, 2020. arXiv: 2004.10934 [cs.CV].
[5] H. Dai, E. B. Khalil, Y. Zhang, B. Dilkina, and L. Song, Learning combinatorial
optimization algorithms over graphs, 2018. arXiv: 1704.01665 [cs.LG].
[6] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, YOLOX: Exceeding YOLO series in 2021, 2021. arXiv: 2107.08430 [cs.CV].
[7] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., Generative adversarial
networks, 2014. arXiv: 1406.2661 [stat.ML].
[8] D. Jia, Y. Yuan, H. He, et al., "DETRs with hybrid matching," arXiv preprint arXiv:2207.13080, 2022.
[9] F. Li, A. Zeng, S. Liu, et al., "Lite DETR: An interleaved multi-scale encoder for efficient DETR," arXiv preprint arXiv:2303.07335, 2023.
[10] F. Li, H. Zhang, S. Liu, J. Guo, L. M. Ni, and L. Zhang, "DN-DETR: Accelerate DETR training by introducing query denoising," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13619–13627.
[11] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, Focal loss for dense
object detection, 2018. arXiv: 1708.02002 [cs.CV].
[12] T.-Y. Lin, M. Maire, S. Belongie, et al., “Microsoft coco: Common objects in
context,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele,
and T. Tuytelaars, Eds., Cham: Springer International Publishing, 2014,
pp. 740–755, ISBN: 978-3-319-10602-1.
[13] S. Liu, F. Li, H. Zhang, et al., “DAB-DETR: Dynamic anchor boxes are better
queries for DETR,” in International Conference on Learning Representations,
2022. [Online]. Available: https://openreview.net/forum?
id=oMI9PjOb9Jl.
[14] D. Meng, X. Chen, Z. Fan, et al., Conditional DETR for fast training convergence, 2021. arXiv: 2108.06152 [cs.CV].
[15] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in European Conference on Computer Vision, 2020.
[16] D. Pfau and O. Vinyals, Connecting generative adversarial networks and actor-critic methods, 2017. arXiv: 1610.01945 [cs.LG].
[17] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," arXiv preprint arXiv:1612.08242, 2016.
[18] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time
object detection with region proposal networks,” in Advances in Neural Information
Processing Systems (NIPS), 2015.
[19] D. Silver, J. Schrittwieser, K. Simonyan, et al., “Mastering the game of go
without human knowledge,” Nature, vol. 550, pp. 354–359, Oct. 2017. DOI:
10.1038/nature24270.
[20] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT
press, 2018.
[21] Z. Tian, C. Shen, H. Chen, and T. He, FCOS: Fully convolutional one-stage object detection, 2019. arXiv: 1904.01355 [cs.CV].
[22] B. Uzkent, C. Yeh, and S. Ermon, “Efficient object detection in large images
using deep reinforcement learning,” in The IEEE Winter Conference on
Applications of Computer Vision, 2020, pp. 1824–1833.
[23] Z. Yao, J. Ai, B. Li, and C. Zhang, Efficient DETR: Improving end-to-end object detector with dense prior, 2021. arXiv: 2104.01318 [cs.CV].
[24] H. Zhang, F. Li, S. Liu, et al., DINO: DETR with improved denoising anchor boxes for end-to-end object detection, 2022. arXiv: 2203.03605 [cs.CV].
[25] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
Advisors: Kuo-Chin Fan, Jun-Wei Hsieh (范國清、謝君偉)    Date of Approval: 2023-7-19
