dc.description.abstract | I employed the YOLOv4 technique to achieve real-time detection and segmentation of dynamic images, and I present my results using tongue feature recognition as the application. Tracking dynamic tongue images is challenging because the pixel distributions of the lips and cheeks are similar to that of the tongue. My research therefore aims to develop a new detection and segmentation technique for dynamic tongue images. For biomedical applications, this technique can generate "de-identified" tongue images after segmentation, which Chinese medicine physicians can use in tongue diagnosis to identify symptoms or find features related to diseases. YOLOv4 and R-CNN-based methods are two mainstream techniques in computer vision. R-CNN-based methods first predict the possible locations of multiple objects and then determine the type of each target object once its position is known. They therefore achieve high recognition accuracy at the cost of high computational complexity and long processing time. The YOLOv4 technique, on the other hand, predicts the class and the location of objects in an input image simultaneously, which significantly reduces computational complexity. YOLOv4 is also significantly more accurate than YOLOv3. Relevant literature reports that, on Tesla V100 GPU hardware, YOLOv4 reaches 41.2% AP (average precision) at 54 FPS (frames per second). Under the same hardware conditions, the R-CNN technique reaches 42.8% AP, but at only 9 FPS. I therefore adopted the YOLOv4 architecture as the basic framework for real-time image detection and segmentation in my research. The YOLOv4 technique not only determines object locations but also provides the corresponding coordinates.
Thus, it can support edge detection and segmentation in practice. To detect the tongue only at the required precise angle, I also added negative sampling to the model. I then proposed a new framework that uses a double backbone structure. The preliminary results show that 7-10 FPS can be obtained on Windows, with the code compiled in Visual Studio, on a GTX 1050 Ti GPU with 16 GB of RAM. In terms of detection accuracy, the framework precisely generates images of the tongue at the required angle, without skewed angles. The confidence scores that YOLOv4 provides for the predicted results also exceed 90%. | en_US |