dc.description.abstract | Research on remote Robotic Ultrasound Systems (RUS) is advancing rapidly, particularly in control and visual processing. This study applies existing deep learning techniques to remove occlusions from the operating field of view. First, the real-time camera stream is processed by our trained YOLOv8 instance segmentation model to detect obstacles, and the detected regions are masked out. The missing regions left by obstacle removal are then restored with a state-of-the-art video inpainting model, the Decoupled Spatial-Temporal Transformer (DSTT), which computes attention over previous frames to fill in the missing regions of the current frame. At the start of the image sequence, we insert obstacle-free frames to provide reference content for subsequent inpainting, thereby achieving occlusion removal. To automate scanning and reduce manual operation, we use an Intel depth camera to build a 3D surface model of the object under inspection. The modeling results are transmitted to a reinforcement learning environment for simulation and also sent back to the operator's Graphical User Interface (GUI), through which the operator selects the desired scanning endpoint. The scanning process is first simulated in the reinforcement learning environment to validate the path and ensure safety; once validated, the joint angles from the simulation are sent to the robotic arm to execute the corresponding scan. Experimental results confirm that the 3D modeling approach applies to objects of different shapes (planar, inclined, and spherical). For planar and inclined surfaces, the depth error is less than 0.041 cm and the area error is under 3.3%; for spheres, at tangential angles below 25.6°, the depth error is less than 0.17 cm. In the real-time operational view, an occluding object is detected and removed from the operator's live feed once 11% of it appears in the frame. Further experiments show that inserting an obstacle-free initial image into the sequence improves both PSNR and SSIM. This study applies existing deep learning technologies to remote RUS, successfully addressing clinical challenges related to occlusion and automated scanning, and thereby improving both the efficiency and safety of the system. | en_US |
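
For illustration, the occlusion detection and masking step described in the abstract could be sketched in Python as follows. This is a minimal, hypothetical sketch, not the study's implementation: it assumes the Ultralytics YOLOv8 segmentation API with generic pretrained weights (the study uses a custom-trained model), and the dstt_inpaint function is a named placeholder standing in for the actual DSTT video-inpainting model.

    # Sketch: detect obstacles per frame with YOLOv8 instance segmentation,
    # build binary masks, and hand the buffered frames and masks to an
    # inpainting step. Assumes the Ultralytics API; dstt_inpaint is a
    # hypothetical placeholder for the DSTT model used in the study.
    import cv2
    import numpy as np
    from ultralytics import YOLO

    # Generic pretrained segmentation weights; the study trains its own model.
    model = YOLO("yolov8n-seg.pt")

    def mask_obstacles(frame: np.ndarray) -> np.ndarray:
        """Run instance segmentation and return a binary obstacle mask."""
        result = model(frame, verbose=False)[0]
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        if result.masks is not None:
            for m in result.masks.data:  # one HxW mask tensor per instance
                m = cv2.resize(m.cpu().numpy(),
                               (frame.shape[1], frame.shape[0]))
                mask |= (m > 0.5).astype(np.uint8)
        return mask

    def dstt_inpaint(frames, masks):
        """Placeholder: a real implementation would feed the frame/mask
        sequence to DSTT, which attends over previous frames to fill the
        masked regions of the current frame."""
        raise NotImplementedError

    # Usage: buffer a short window of frames (obstacle-free frames would be
    # inserted at the start of the sequence, per the abstract), mask each
    # frame, then inpaint the masked regions.
    cap = cv2.VideoCapture(0)
    frames, masks = [], []
    while len(frames) < 10:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        masks.append(mask_obstacles(frame))
    cap.release()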