With over 130,000 hearing-impaired individuals in Taiwan, sign language serves as their primary mode of communication. Accurate hand pose estimation models are crucial for applications such as sign language translation and recognition. However, due to interactions and occlusions between the two hands, this task poses a significant challenge for monocular RGB images. This study therefore aims to improve hand pose estimation in two-hand sign language scenarios. We propose a hand pose estimation method that combines the Extract-and-Adaptation Network (EANet) with colored gloves, optimized for sign language images of gloved hands. We enrich per-finger information by rendering the training dataset with colored gloves, train the Transformer-based EANet on the rendered data, and then apply several image processing techniques to refine the predicted hand keypoints. Experimental results show that, on a colored-glove sign language dataset, our method detects both hands completely with 55% higher stability than MediaPipe, and it outperforms an EANet trained on the original dataset on the test set.
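One way colored gloves can add per-finger information is by making each finger separable by hue. As a minimal sketch of this idea (the specific glove palette, hue tolerance, and saturation/value thresholds below are illustrative assumptions, not taken from the thesis), a pixel can be assigned to a finger by comparing its hue against the rendered glove colors:

```python
import colorsys

# Hypothetical per-finger glove colors (RGB, 0-255); the actual palette
# used for rendering in the thesis is not specified here.
GLOVE_COLORS = {
    "thumb": (255, 0, 0),
    "index": (0, 255, 0),
    "middle": (0, 0, 255),
    "ring": (255, 255, 0),
    "pinky": (255, 0, 255),
}

def classify_pixel(rgb, tolerance=0.05):
    """Return the finger whose glove hue is closest to the pixel's hue,
    or None if no hue is within `tolerance` (fraction of the hue circle)."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    if s < 0.3 or v < 0.2:  # skip unsaturated or dark pixels (skin, shadow)
        return None
    best, best_dist = None, tolerance
    for finger, color in GLOVE_COLORS.items():
        fh, _, _ = colorsys.rgb_to_hsv(*(c / 255.0 for c in color))
        dist = min(abs(h - fh), 1.0 - abs(h - fh))  # circular hue distance
        if dist < best_dist:
            best, best_dist = finger, dist
    return best
```

Such per-pixel finger labels could then serve as one of the image processing cues for refining keypoint predictions, though the thesis's actual refinement pipeline may differ.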