

    Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/98212


    Title: Presto auto-scaling mechanism based on SQL query workload estimation (基於 SQL 查詢負載評估的 Presto auto-scaling 機制)
    Author: Lin, Yen-Hao (林彥豪)
    Contributor: In-service Master's Program, Department of Computer Science and Information Engineering
    Keywords: Presto; Auto Scaling; Machine Learning; Resource Management; Cloud Cost
    Date: 2025-08-18
    Upload time: 2025-10-17 12:29:57 (UTC+8)
    Publisher: National Central University
    Abstract: As data volumes grow rapidly in the era of big data, the demand for efficient data processing technologies has become increasingly urgent. Massively Parallel Processing (MPP) is a key architecture for handling large-scale data queries, and among MPP solutions, the open-source Presto SQL query engine has gained wide attention for its high performance. However, Presto's native static resource allocation strategy struggles to adapt to dynamically changing query workloads. It leads either to low resource utilization or to unnecessary waste, which in turn affects query performance and cost-efficiency. To address this issue, this study proposes a Presto Auto Scaling mechanism that estimates workload from the SQL query text. The mechanism uses TF-IDF vectorization to extract features from SQL statements and an XGBoost classification model to predict the range of resource usage. It combines Infrastructure as Code (IaC) practices with Apache Airflow automated workflows to adjust the number of Presto worker nodes. The mechanism was validated in an Oracle Cloud Infrastructure environment using the TPC-DS benchmark suite.
    Experimental results show that, compared to a static configuration with one coordinator and three workers, the Auto Scaling mechanism reduced the total execution time of the 99 TPC-DS queries by 14%, and 92% of the queries saw reduced latency. In particular, the latency of aggregate queries decreased by 20%. Because of the shorter execution times, overall cost-efficiency improved by nearly 5%. However, total cloud computing costs increased by 12%, mainly because the average number of nodes rose from 4 to 5.32. Despite the improvements in query performance, the experiments also observed higher memory usage in 69% of the queries, with an average increase of 15 to 20%, which indirectly kept execution times from reaching their expected lows. Preliminary analysis suggests that this may be due to the additional data exchange among workers after more Presto workers are added. These results indicate that, in dynamic resource allocation, balancing query performance against cost control remains a challenge.
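
    The classification step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it uses only the Python standard library, computes TF-IDF by hand, and swaps in a nearest-neighbour lookup by cosine similarity in place of the XGBoost classifier. The training queries, the class labels "low" and "high", and the worker counts in WORKERS_FOR_CLASS are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sql):
    """Lowercase a SQL string and split it into identifier/keyword tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())


def fit_idf(corpus):
    """Compute a smoothed inverse document frequency for every token."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for sql in corpus:
        doc_freq.update(set(tokenize(sql)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf(sql, idf):
    """Return an L2-normalised TF-IDF vector as a dict; unseen tokens are ignored."""
    counts = Counter(t for t in tokenize(sql) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Hypothetical labelled queries (TPC-DS-style table names, made-up labels).
TRAINING = [
    ("SELECT ss_item_sk FROM store_sales WHERE ss_quantity > 10", "low"),
    ("SELECT c_customer_id FROM customer WHERE c_birth_year = 1980", "low"),
    ("SELECT i_category, SUM(ss_net_paid) FROM store_sales "
     "JOIN item ON ss_item_sk = i_item_sk GROUP BY i_category", "high"),
    ("SELECT d_year, AVG(ws_sales_price) FROM web_sales "
     "JOIN date_dim ON ws_sold_date_sk = d_date_sk GROUP BY d_year", "high"),
]

# Hypothetical mapping from predicted resource class to worker count.
WORKERS_FOR_CLASS = {"low": 3, "high": 6}


def predict_class(sql, idf, training_vectors):
    """Stand-in classifier: label of the most similar training query."""
    query_vec = tfidf(sql, idf)
    best = max(training_vectors, key=lambda pair: cosine(query_vec, pair[0]))
    return best[1]


idf = fit_idf([sql for sql, _ in TRAINING])
training_vectors = [(tfidf(sql, idf), label) for sql, label in TRAINING]

new_query = ("SELECT i_brand, SUM(ss_net_paid) FROM store_sales "
             "JOIN item ON ss_item_sk = i_item_sk GROUP BY i_brand")
label = predict_class(new_query, idf, training_vectors)
target_workers = WORKERS_FOR_CLASS[label]
print(label, target_workers)
```

    In a setup closer to the one the abstract describes, the TF-IDF vectors would be passed to a trained XGBoost classifier, and the predicted class would drive an Airflow task that applies the IaC change to the worker count.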
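    Appears in collections: [In-service Master's Program, Department of Computer Science and Information Engineering] Theses and Dissertations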

    Files in this item:

    File        Description  Size  Format  Views
    index.html  -            0Kb   HTML    24


    All items in NCUIR are protected by original copyright.
