Object detection is a fundamental task in computer vision. The Detection Transformer (DETR) has emerged as a promising approach to object detection, achieving state-of-the-art performance. Since its introduction, several DETR variants have been proposed to improve its performance and training efficiency.
However, we find that DETR-like models may get stuck in a local minimum during top-K query selection, which results in inferior performance. To address this problem, we inject noise into the top-K query selection of DETR-like models to encourage the model to explore queries that are better suited for bounding-box prediction. This idea is inspired by the ε-greedy strategy commonly used in reinforcement learning, which adds noise to action selection.
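A minimal sketch of ε-greedy top-K selection, written in Python, may help make the idea concrete. It assumes each candidate query (for example, an encoder output token) already has a confidence score. It keeps the greedy top-K by score and, with probability ε, replaces each selected slot with a uniformly random unselected candidate. The function name `epsilon_greedy_topk`, the per-slot swapping rule, and the uniform sampling are illustrative assumptions, not the exact scheme used in this work.

```python
import random
from typing import List, Optional, Sequence


def epsilon_greedy_topk(
    scores: Sequence[float],
    k: int,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical sketch of epsilon-greedy top-k query selection.

    Start from the greedy top-k indices (highest score first); then, for each
    selected slot, with probability epsilon swap in a uniformly random
    candidate that is not currently selected. This is an illustrative
    assumption, not the thesis's actual implementation.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    rng = rng or random.Random()

    # Greedy step: candidate indices ranked by score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:k]
    # Candidates the greedy step did not pick; exploration draws from these.
    pool = ranked[k:]

    for slot in range(k):
        # Explore with probability epsilon, if any unselected candidate remains.
        if pool and rng.random() < epsilon:
            j = rng.randrange(len(pool))
            # Swap: the random candidate takes this slot and the displaced
            # index returns to the pool, so selected indices stay unique.
            selected[slot], pool[j] = pool[j], selected[slot]
    return selected


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.75, 0.3, 0.6]
    print(epsilon_greedy_topk(scores, k=3, epsilon=0.0))  # [0, 2, 4]
    print(epsilon_greedy_topk(scores, k=3, epsilon=0.2, rng=random.Random(0)))
```

With ε = 0 the function reduces to ordinary greedy top-K selection. A natural (but here assumed) usage is to pass a positive ε during training to encourage exploration and ε = 0 at inference.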
Combined with the improvements from prior work, this noise-injection scheme improves DINO by +0.3 AP on COCO val2017 with a ResNet-50 backbone and the 4-scale feature-map setting. This improvement indicates that ε-greedy selection effectively reduces the negative effect of getting stuck in a local minimum.