With the rapid advancement of technology, augmented reality (AR), virtual reality (VR), and mixed reality (MR) have become highly regarded fields. However, these technologies still face several challenges in remote rendering applications, including the integration of AI-driven voice interaction. For instance, a single server often proves inadequate in dynamic environments because it cannot meet the combined demands of high-complexity graphics rendering and real-time AI voice processing. As user load increases, the server's limited CPU and GPU resources quickly become overloaded, leading to performance degradation, increased latency, and potential system crashes.

To address these challenges, this paper proposes a multi-server architecture designed to handle AI voice interaction effectively within AR, VR, and MR environments. By distributing the workload across multiple servers and optimizing resource allocation, this architecture enhances overall system performance and improves the user experience. Furthermore, a server-side object-streaming camera design and local positioning assistance mitigate latency-induced localization errors, surpassing existing methods.