Thesis 111552005 Details




Name 余忠訓 (Zhong-Xun Yu)   Department In-Service Master's Program, Department of Computer Science and Information Engineering
Thesis Title Automated Data Pipeline Monitoring Mechanism Based on Apache Airflow Workflow Management
(基於Apache Airflow工作流程管理之自動化資料管線監控機制)
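Related Theses
★ A Splay-Tree-Based Android Binder Driver
★ A Study on Applying Incremental Learning to Multi-Crop Identification
★ Detecting Novel Land Parcels in Aerial Photo Sheets Using Classification-Reconstruction Learning
★ A Dimension Estimation System to Assist Industrial Part Recognition
★ Fine-Grained Classification of Real Industrial Part Images Using Textureless 3D CAD Part Models Combined with Length Detection
★ A Dynamic Global Computing Platform Built on a Parallel Job System
★ Active Object Garbage Collection Using a Weighted Reference Counting Algorithm
★ A Maximum Likelihood Estimation Computing Architecture with Dynamic Load Balancing
★ A Study of Strategies for Dynamic P2P System Reorganization Using Multiple System Load Metrics
★ A Feature Extraction and Computation Monitoring Architecture for Cloud Applications Based on Hadoop
★ An Adaptive Computing Model for Large-Scale Dynamic Distributed Systems
★ A Cloud Service Platform Providing Elastic Virtual Data Centers
★ A Resource Management Center for a Cloud Elastic Virtual Machine Room Service Platform
★ A Dynamically Adaptive Computing Architecture for Auto-Provisioning Cloud Systems
★ Heuristic Scheduling Strategies for Linearly Dependent and Independent Jobs
★ An Efficient Distributed Hierarchical Clustering Algorithm for Large Datasets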
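  1. This electronic thesis is authorized for immediate open access.
  2. The full text of open-access theses is licensed only for users' personal, non-profit searching, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan). Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.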

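Abstract (Chinese, translated) In the era of global big data, the ability to use big data has become one of an enterprise's most important resources and a source of advantage and competitiveness. Before big data can be analyzed it must be processed: data is collected from different sources, its types are converted, and it then passes through a chain of preprocessing, aggregation, transformation, and filtering. Within this data flow, ETL (Extract, Transform, Load) is a common way of organizing data. As the heterogeneity and volume of data sources grow, maintaining and managing the overall data flow becomes key to reducing costs and improving productivity. The growth of big data has created a need for automated tools that make it easy to orchestrate tasks and monitor workflows. Apache Airflow is a popular open-source workflow management platform designed to automate, schedule, and monitor workflows. It is often combined with other big data tools and frameworks to build and manage complex data workflows, such as the individual steps of a data pipeline, and is used to build, schedule, manage, and monitor many types of workflows, including ETL processing and the integration of message queue tools. At the same time, the need to deploy and develop systems quickly across different environments has driven the adoption of container technologies such as Docker, which save developers deployment effort and reduce the environment maintenance work of operations staff. This study uses Apache Airflow to collect the metadata logs of stored procedures that Microsoft SQL Server runs on a schedule as ETL jobs, parses the logs to extract the causes of failed stored procedure executions, integrates the results with records from existing data sources, and sends alert emails to developers and the operations team.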
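The log-collection and parsing step described above can be illustrated with a minimal Python sketch; it is not the thesis's implementation. It assumes the job history has already been fetched from `msdb.dbo.sysjobhistory` joined with `msdb.dbo.sysjobs` (the query text is shown but not executed), and it assumes T-SQL step messages contain a marker of the form `[SQLSTATE xxxxx] (Error n)` preceded by an `Executed as user: ...` prefix. Those message patterns, the `StepFailure` record, and all helper names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Query text only (not executed here): step history joined with job names.
HISTORY_QUERY = """
SELECT j.name AS job_name, h.step_id, h.step_name, h.run_status,
       h.run_date, h.run_time, h.run_duration, h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.run_status = 0;
"""

# run_status codes as documented for msdb.dbo.sysjobhistory
RUN_STATUS = {0: "Failed", 1: "Succeeded", 2: "Retry", 3: "Canceled", 4: "In Progress"}

# ASSUMPTION: failed T-SQL step messages embed "[SQLSTATE xxxxx] (Error n)"
ERROR_MARKER = re.compile(r"\[SQLSTATE (?P<sqlstate>\w{5})\] \(Error (?P<error>\d+)\)")
# ASSUMPTION: messages begin with "Executed as user: <account>. "
USER_PREFIX = re.compile(r"^Executed as user: .*?\.\s+")


@dataclass
class StepFailure:  # hypothetical record type for one failed step
    job_name: str
    step_name: str
    started_at: datetime
    duration_seconds: int
    sqlstate: Optional[str]
    error_number: Optional[int]
    reason: str


def agent_datetime(run_date: int, run_time: int) -> datetime:
    """Combine run_date (yyyyMMdd) and run_time (HHmmss) integers into a datetime."""
    return datetime.strptime(f"{run_date:08d}{run_time:06d}", "%Y%m%d%H%M%S")


def duration_seconds(run_duration: int) -> int:
    """Convert an HHMMSS-encoded integer (e.g. 10203 = 1h 2m 3s) to seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def extract_failures(rows: list[dict]) -> list[StepFailure]:
    """Keep failed job steps and pull the error cause out of each message."""
    failures = []
    for row in rows:
        # step_id 0 is the job-outcome row; only individual failed steps are kept
        if row["step_id"] == 0 or RUN_STATUS.get(row["run_status"]) != "Failed":
            continue
        message = row["message"]
        marker = ERROR_MARKER.search(message)
        # The human-readable cause is the text before the first SQLSTATE marker
        reason = message[: marker.start()] if marker else message
        reason = USER_PREFIX.sub("", reason).strip()
        failures.append(StepFailure(
            job_name=row["job_name"],
            step_name=row["step_name"],
            started_at=agent_datetime(row["run_date"], row["run_time"]),
            duration_seconds=duration_seconds(row["run_duration"]),
            sqlstate=marker["sqlstate"] if marker else None,
            error_number=int(marker["error"]) if marker else None,
            reason=reason,
        ))
    return failures
```

Skipping the `step_id = 0` row avoids reporting the same failure twice, because SQL Server Agent records the overall job outcome as a separate history row next to the failed step.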
Abstract (English) In the era of global big data, the ability to use big data effectively has become a crucial resource that offers enterprises significant competitive advantages. Before analysis, data processing involves collecting, transforming, and preprocessing data from various sources. ETL (Extract, Transform, Load) methods organize this process, but growing data heterogeneity and volume complicate maintenance and management, which are vital for reducing costs and boosting productivity. Apache Airflow, a popular open-source workflow management platform, addresses these needs by automating, scheduling, and monitoring workflows, and it is often integrated with other big data tools to manage complex data pipelines. The demand for rapid system deployment and development has further driven the adoption of container technologies such as Docker, which make developers and operations engineers more efficient. This study uses Apache Airflow to collect the metadata logs of stored procedures that Microsoft SQL Server executes on a schedule as ETL jobs. It parses these logs to extract the causes of failed stored procedure executions, integrates this information with records from existing data sources, and sends alert emails to developers and the operations team.
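The alerting step, which combines parsed failures with existing records and emails developers and the operations team, can be sketched with Python's standard `email` and `smtplib` modules. This is a hedged sketch under stated assumptions, not the thesis's code: the `owners` mapping is a hypothetical stand-in for the "existing data source records", each failure is assumed to be a dict with `job_name`, `step_name`, `error_number`, and `reason`, and the addresses, SMTP host, and port are placeholders. SMTP authentication is omitted.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical placeholders; replace with real values in a deployment.
OPS_TEAM = "ops-team@example.com"
SENDER = "airflow-alerts@example.com"


def recipients_for(failures: list[dict], owners: dict[str, list[str]]) -> list[str]:
    """Developers who own the failed jobs, plus the operations team."""
    addresses = {addr for f in failures for addr in owners.get(f["job_name"], [])}
    addresses.add(OPS_TEAM)
    return sorted(addresses)


def build_alert(failures: list[dict], owners: dict[str, list[str]]) -> EmailMessage:
    """Compose one plain-text alert listing every failed step and its cause."""
    msg = EmailMessage()
    msg["Subject"] = f"[ETL alert] {len(failures)} failed stored procedure step(s)"
    msg["From"] = SENDER
    msg["To"] = ", ".join(recipients_for(failures, owners))
    lines = []
    for f in failures:
        # Fall back to a generic label when no SQL Server error number was parsed
        code = f"Error {f['error_number']}" if f.get("error_number") is not None else "Unknown error"
        lines.append(f"- {f['job_name']} / {f['step_name']}: {code}: {f['reason']}")
    msg.set_content("The following SQL Server Agent job steps failed:\n" + "\n".join(lines))
    return msg


def send_alert(msg: EmailMessage, host: str = "smtp.example.com", port: int = 587) -> None:
    """Send via SMTP after a STARTTLS upgrade; login is omitted (assumed trusted relay)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.send_message(msg)
```

Keeping message construction separate from sending lets the alert content be checked without an SMTP server; in Airflow, the send call would typically run in a downstream task that executes only when failures were found.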
Keywords ★ Workflow Management
★ Data Pipeline
★ Monitoring
★ Apache Airflow
★ Microsoft SQL Server
★ Stored Procedure
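Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Contributions
1.4 Thesis Organization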
Chapter 2 Background
2.1 Fundamentals
2.1.1 Apache Airflow
2.1.2 Docker
2.1.3 Microsoft SQL Server
2.1.4 SQL Server Agent
2.1.5 Stored Procedure
2.1.6 SMTP
2.2 Research on Airflow-Based ETL Pipeline Technologies
2.3 Monitoring ETL Pipelines with Commercial Software
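Chapter 3 System and Workflow Design
3.1 System Architecture
3.2 Virtual Machine Environment Setup
3.3 SQL Server Database Design
3.4 Airflow Workflow Management Design
3.5 Data Pipeline Monitoring Results
Chapter 4 Case Study
4.1 Comparison Before and After Adopting Data Pipeline Monitoring
Chapter 5 Conclusions and Future Research Directions
References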
Advisor 王尉任 (Wei-Jen Wang)   Approval Date 2024-7-26

For questions about this thesis, please contact the Promotion Services Section, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.