This thesis presents the practical implementation and results of integrating large language models (LLMs) into real-world robotic arm control. LLMs provide advanced language understanding which, combined with an open-vocabulary object detection model, a grasp pose generation algorithm, and LLM-driven path planning, enables a robotic arm assistant operating in the physical world.

To enhance the visual perception of the LLM-based robotic arm assistant, the system uses a binocular depth camera to acquire real-time RGB and depth information of the environment. When the user issues a request, the system extracts the object names required by the task, detects the corresponding objects and their 3D coordinates with the open-vocabulary object detection model, and uses the resulting 3D point cloud data to generate feasible grasp poses for the robotic gripper. These results are fed back to the LLM, which judges the feasibility of the user's request and generates an executable motion path for the robotic arm. The user can inspect the LLM's feedback to confirm the plausibility and reliability of the generated plan; once confirmed, the path is transmitted to the robotic arm via ROS2 communication for execution, thereby completing the user's task. To further improve the reliability of the generated paths, the system runs a feedback debugging loop based on inverse kinematics and a re-planning model before executing any arm movement, regenerating the path whenever a waypoint fails the inverse-kinematics check.

The main contributions of this thesis lie in demonstrating the application of LLMs to human-robot collaborative manipulation: a communication scheme between language models and a physical robotic arm, LLM-based path planning and environmental understanding for real-world control, and generalized object detection combined with grasp pose generation. Experimental results show that the proposed system achieves a 95% task success rate on single-object grasping tasks, a 58.3% success rate on grasping tasks with ambiguous object names, and a 45% success rate on tasks with vague user intents, validating the feasibility and effectiveness of the LLM-driven robotic arm assistant in practical scenarios.
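As a rough illustration of the perception-to-planning pipeline summarized above, the following Python sketch traces the data flow from user request to motion plan. None of these component names come from the thesis; `llm`, `detector`, and `grasp_generator` are hypothetical placeholders injected as arguments so the flow stays explicit.

```python
# Hypothetical end-to-end sketch of the pipeline described in the abstract.
# All component interfaces here are assumptions, not the thesis's actual API.

def handle_request(request, rgb, depth, llm, detector, grasp_generator):
    # 1. The LLM extracts the object names the task requires.
    targets = llm.extract_object_names(request)
    # 2. The open-vocabulary detector localizes each target; the depth
    #    image turns 2D detections into 3D coordinates and point clouds.
    detections = detector.detect(rgb, depth, targets)
    # 3. Grasp poses are generated from each object's 3D point cloud.
    grasps = [grasp_generator.propose(d.point_cloud) for d in detections]
    # 4. Detections and grasps are fed back to the LLM, which judges the
    #    request's feasibility and emits an executable motion path.
    return llm.plan_path(request, detections, grasps)
```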
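The abstract states that confirmed paths are transmitted to the arm via ROS2. A minimal sketch of such a bridge is shown below, assuming the arm accepts a `trajectory_msgs/JointTrajectory` on a topic; the topic name, joint names, and 6-DOF arm are assumptions, and the thesis's actual interface (e.g. an action server) may differ.

```python
# Minimal sketch of sending a planned joint-space path to the arm over ROS2.
import rclpy
from rclpy.node import Node
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint
from builtin_interfaces.msg import Duration

def send_path(joint_waypoints, seconds_per_point=2):
    rclpy.init()
    node = Node('llm_arm_bridge')  # hypothetical node name
    pub = node.create_publisher(
        JointTrajectory, '/arm_controller/joint_trajectory', 10)  # assumed topic

    traj = JointTrajectory()
    traj.joint_names = [f'joint_{i}' for i in range(1, 7)]  # assumed 6-DOF arm
    for k, q in enumerate(joint_waypoints, start=1):
        point = JointTrajectoryPoint()
        point.positions = [float(v) for v in q]
        point.time_from_start = Duration(sec=k * seconds_per_point)
        traj.points.append(point)

    pub.publish(traj)
    rclpy.spin_once(node, timeout_sec=0.5)  # brief spin so the message goes out
    node.destroy_node()
    rclpy.shutdown()
```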
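Finally, the inverse-kinematics verification loop can be pictured as below. This is a sketch of the general technique only, assuming a hypothetical `solve_ik` solver that returns `None` for unreachable poses and a hypothetical `llm_replan` helper standing in for the thesis's re-planning model.

```python
# Sketch of the feedback debugging loop: verify every waypoint with IK,
# and ask the LLM to regenerate the path when any waypoint is unreachable.

def verify_and_replan(path, solve_ik, llm_replan, max_retries=3):
    """Return an IK-feasible path, re-planning around failed waypoints."""
    for _ in range(max_retries):
        failed = [i for i, pose in enumerate(path) if solve_ik(pose) is None]
        if not failed:
            return path  # every waypoint is reachable; safe to execute
        # Feed the indices of the unreachable waypoints back to the LLM
        # so the regenerated path can avoid them.
        path = llm_replan(path, failed)
    raise RuntimeError("no IK-feasible path found after re-planning")
```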