AI-QoS: AI Techniques with Response-Time Guarantees and Controllable Quality; Latency-Aware Inference Techniques with Controllable Quality and Response-Time Guarantees


Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99389

Title:	AI-QoS: AI Techniques with Response-Time Guarantees and Controllable Quality; Latency-Aware Inference Techniques with Controllable Quality and Response-Time Guarantees
Authors:	廖上華;Liao, Shang-Hua
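Contributors:	Executive Master of Computer Science and Information Engineering
Keywords:	real-time streaming; network latency; object segmentation inference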
Date:	2026-01-27
Issue Date:	2026-03-06 18:52:26 (UTC+8)
Publisher:	National Central University
Abstract:	The rapid growth of real-time AI applications and streaming inference services has made response time and service stability critical concerns in deployed AI inference systems. Conventional inference architectures are designed mainly around a fixed model and offline accuracy, so they cope poorly with the end-to-end latency that accumulates from network latency fluctuations, differences in model computational cost, and changing system load, which in turn degrades timeliness and the usability of inference results. To meet the needs of practical deployments, this work proposes a latency-aware, quality-controllable AI-QoS inference framework that dynamically adjusts its inference strategy based on real-time latency measurements and scene characteristics, balancing responsiveness against inference quality. An adaptive model scheduling mechanism selects the most suitable inference model under varying latency and system load conditions. Experimental results show that the proposed method effectively reduces the impact of end-to-end latency on inference quality and maintains stable, predictable inference service in dynamic environments, demonstrating its feasibility and practical value for real-time streaming inference and deployed AI applications.
Appears in Collections:	[Executive Master of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	192
