dc.description.abstract | Template matching is an important technique in traditional computer vision. However, it can be affected by various factors when searching for objects: for example, variations in size, shape, and color between the template and the search object can severely degrade search effectiveness. In recent years, deep learning has become increasingly prevalent in computer vision and has achieved significant results in fields such as recognition, detection, and segmentation. Therefore, this study aims to combine the ability of deep learning to extract high-level features with the template matching approach, developing a “template matching deep learning” technique that is robust against the aforementioned factors.
Template matching and object detection differ as follows. In object detection, the model searches an image for features similar to those it learned during training, so users cannot ask it to find arbitrary objects; to search for a new object, the model must be retrained. In contrast, template matching allows users to provide any template image, and the system determines whether objects in the search image are similar to the object in the template. This lets users search for the specific objects they are looking for.
The objective of this study is to utilize a deep learning network architecture to identify specific features within an image. We modified the single-object tracking network SiamCAR to perform multi-object template matching. Our modifications include: i. making the network adjust dynamically to the input image size, enhancing the flexibility of the network's input data format; ii. simplifying and optimizing the feature extraction subnetwork to improve both the performance and speed of the network; iii. predicting on smaller feature maps instead of upsampling to improve network speed; iv. incorporating data augmentation and deliberately introducing differences between template images and search images so that the network learns a greater variety of variations.
In the experiments, we trained and tested our model on objects on printed circuit boards (PCBs). The dataset consisted of 11,525 image pairs in total, with 9,267 pairs in the training set and 2,258 pairs in the testing set. The improved matching network achieved a recall of 96.74%, a precision of 94.06%, and an F-score of 95.38% on the testing set. Furthermore, inference speed improved to 53 ms/img, approximately 9 times faster than the original 512 ms/img. | en_US |