Abstract: With the rapid development of deep learning technology, model deployment involves cumbersome processes including format conversion, performance tuning, and hardware adaptation. Traditional methods require extensive manual intervention and are prone to deployment failures caused by insufficient experience; furthermore, the lack of effective real-time monitoring after deployment makes it difficult to detect performance anomalies promptly. This research proposes an AI deployment platform with automated model optimization evaluation and real-time inference resource monitoring. The platform integrates the TensorRT optimization engine, Triton Inference Server, the Prometheus monitoring system, and Grafana visualization tools to establish a complete end-to-end automated deployment workflow. The system adopts a microservices architecture and, through automated model conversion and evaluation mechanisms, can process multiple combinations of precision formats, batch sizes, and image resolutions in a single run. Combined with an intelligent recommendation system and visual decision-support tools, it helps users select the model configuration best suited to their actual requirements. In addition, a dual-layer monitoring architecture integrates hardware resource monitoring with inference performance tracking, forming a closed monitoring feedback loop.

The platform is validated experimentally using the YOLOv8n model on the COCO 2017 pose dataset, with an NVIDIA RTX 3090 GPU as the hardware environment. Three typical application scenarios (high-load latency optimization, VRAM-constrained resource balancing, and multi-objective comprehensive optimization) demonstrate the platform's practicality and effectiveness. Comparative analysis with existing mainstream solutions shows that the system offers significant technical advantages in deep optimization integration, batch automated processing, and quantitative decision support.

The core contributions of this research are: a complete automated workflow covering model conversion, performance testing, intelligent recommendation, and real-time monitoring; an intelligent recommendation decision mechanism based on multi-dimensional quantitative analysis; and a dual-layer monitoring system spanning hardware resources and inference performance. These innovations reduce the technical barriers and labor costs of deep learning model deployment, improve the reliability and efficiency of the deployment process, and provide important technical support for the industrialization and large-scale deployment of AI technology.
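To make the multi-dimensional quantitative recommendation concrete, the following is a minimal sketch of one plausible approach: each benchmarked configuration (precision format, batch size, resolution) is scored as a weighted sum of min-max-normalized metrics. The function name, metric names, weights, and sample numbers are all assumptions for illustration, not the platform's actual implementation.

```python
# Hypothetical sketch of a multi-dimensional weighted scoring scheme for
# ranking benchmark configurations; not the thesis platform's actual code.

def recommend(configs, weights):
    """Rank configurations by a weighted sum of normalized metrics.

    configs: list of dicts with 'name', 'latency_ms' (lower is better),
             'throughput_fps' (higher is better), 'vram_mb' (lower is better).
    weights: dict mapping each metric to its relative importance.
    """
    def normalize(values, lower_is_better):
        # Min-max normalize to [0, 1], where 1 is always "best".
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(hi - v) / span if lower_is_better else (v - lo) / span
                for v in values]

    lat = normalize([c["latency_ms"] for c in configs], lower_is_better=True)
    thr = normalize([c["throughput_fps"] for c in configs], lower_is_better=False)
    mem = normalize([c["vram_mb"] for c in configs], lower_is_better=True)

    scored = [(weights["latency"] * l + weights["throughput"] * t
               + weights["vram"] * m, c["name"])
              for c, l, t, m in zip(configs, lat, thr, mem)]
    return sorted(scored, reverse=True)

# Illustrative benchmark results for three precision/batch/resolution combos.
configs = [
    {"name": "fp16_b8_640",  "latency_ms": 4.1, "throughput_fps": 1950, "vram_mb": 2100},
    {"name": "fp32_b1_640",  "latency_ms": 6.8, "throughput_fps": 150,  "vram_mb": 1500},
    {"name": "int8_b16_640", "latency_ms": 3.2, "throughput_fps": 2600, "vram_mb": 1900},
]
# Weight profile for a high-load, latency-sensitive scenario.
ranking = recommend(configs, {"latency": 0.5, "throughput": 0.3, "vram": 0.2})
print(ranking[0][1])  # prints "int8_b16_640"
```

Changing the weight profile (e.g., emphasizing `vram` for a VRAM-constrained deployment) reorders the ranking, which mirrors how the three application scenarios in the abstract lead to different recommended configurations.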