NCU Institutional Repository: Item 987654321/95450


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95450


    Title: Automated Data Pipeline Monitoring Mechanism Based on Apache Airflow Workflow Management
    Authors: 余忠訓;Yu, Zhong-Xun
    Contributors: Executive Master Program of Computer Science and Information Engineering
    Keywords: Workflow Management;Data Pipeline;Monitoring;Apache Airflow;Microsoft SQL Server;Stored Procedure
    Date: 2024-07-26
    Issue Date: 2024-10-09 16:52:01 (UTC+8)
    Publisher: National Central University
    Abstract: In the era of global big data, the ability to exploit big data has become one of the most important enterprise resources and a key source of competitive advantage. Before analysis, data must be collected from heterogeneous sources and taken through a chain of preprocessing, aggregation, transformation, and filtering steps; ETL (Extract, Transform, Load) is the standard approach for organizing this flow. As source heterogeneity and data volume grow, maintaining and managing the overall data flow becomes critical to reducing costs and improving productivity, which calls for an automated tool that makes it easy to orchestrate tasks and monitor processes. Apache Airflow, a popular open-source workflow management platform designed to automate, schedule, and monitor workflows, is often combined with other big data tools and frameworks to build and manage complex data pipelines, covering the individual steps of a data pipeline, ETL data processing, and integration with message-queue tools. At the same time, the need to deploy and develop systems quickly across different environments has driven the adoption of container technologies such as Docker, which reduce deployment effort for developers and the maintenance burden on operations staff. This study uses Apache Airflow to collect the metadata logs produced when scheduled Microsoft SQL Server jobs execute stored procedures for ETL work, parses those logs to extract the causes of stored-procedure failures, integrates the results with records from existing data sources, and sends alert emails to the developers and the operations team.
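    The monitoring flow described in the abstract (collect stored-procedure execution logs, parse out failure reasons, send an alert mail) maps naturally onto an Airflow DAG. The listing below is a minimal sketch, not the thesis implementation: it assumes Airflow 2.4+ with the apache-airflow-providers-microsoft-mssql package installed, an SMTP backend configured for EmailOperator, and hypothetical names for the connection id (mssql_etl), the log table (dbo.etl_sp_execution_log), its columns, and the recipient address.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.email import EmailOperator
    from airflow.operators.python import PythonOperator
    from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook


    def collect_failed_runs(**context):
        """Query the (assumed) ETL log table for stored-procedure runs that
        failed in the last day and push the parsed error reasons to XCom."""
        hook = MsSqlHook(mssql_conn_id="mssql_etl")  # assumed connection id
        rows = hook.get_records(
            """
            SELECT procedure_name, run_time, error_message   -- assumed schema
            FROM dbo.etl_sp_execution_log
            WHERE status = 'FAILED'
              AND run_time >= DATEADD(day, -1, GETDATE())
            """
        )
        failures = [
            {"procedure": r[0], "run_time": str(r[1]), "reason": r[2]}
            for r in rows
        ]
        context["ti"].xcom_push(key="failures", value=failures)
        return len(failures)


    with DAG(
        dag_id="sp_etl_monitoring",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        collect = PythonOperator(
            task_id="collect_failed_runs",
            python_callable=collect_failed_runs,
        )

        alert = EmailOperator(
            task_id="send_alert_mail",
            to=["etl-oncall@example.com"],  # placeholder recipients
            subject="[ETL] Stored procedure failures in the last 24 hours",
            html_content="{{ ti.xcom_pull(task_ids='collect_failed_runs', key='failures') }}",
        )

        collect >> alert

    In practice the alert task would be short-circuited when no failures are found and the message body formatted for readability; both are omitted here for brevity.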
    Appears in Collections: [Executive Master of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File          Description    Size    Format
    index.html                   0Kb     HTML


    All items in NCUIR are protected by copyright, with all rights reserved.
