dc.description.abstract | According to international research reports, businesses can enhance their competitiveness by actively promoting digital transformation. In the telecommunications industry, where the author works, digital transformation helps increase the customer base and reduce unit costs. Over the past two years, the company has vigorously promoted technologies such as cloud computing (Azure), AI, and big data (the Databricks platform). Our department's specific approach is to migrate all data-warehouse data to the cloud. Compared with the on-premises data warehouse architecture (Teradata), which carries high machine rental and license fees and lacks flexible capacity expansion, the cloud architecture offers easy deployment and on-demand scalability.
During the migration, new ETL jobs must be developed on the selected PaaS platform (Databricks), and job scheduling must be redesigned around cloud characteristics. An estimated several thousand ETL jobs will run daily, all of which must complete within the company's established KPI requirements. The ultimate goal is to optimize cost-effectiveness, which makes cloud job scheduling a highly challenging topic. Given the platform's characteristics, the scheduler must find the best trade-off across multiple factors, such as resource allocation (VMs) and job priority.
To address this problem, this study evaluated different method combinations through experiments and comparisons, measuring the performance of each combination under various scenarios in order to find the most suitable method for optimizing job scheduling. The experiments were divided into four parts:
Experiment 1: A custom CNN validated on different datasets with different matrix sizes.
Experiment 2: Three different DNNs (a custom CNN, ResNet-18, and MobileNet) evaluated on the same dataset.
Experiment 3: ResNet-18 validated on different datasets with different matrix sizes.
Experiment 4: Dueling DQN added, validated on different datasets with different matrix sizes.
By using resource-related feature values of jobs and clusters, this approach reduces the complexity of the state space, making the deep reinforcement learning algorithm in the second stage more efficient and stable. The DDDQN (Dueling Double Deep Q-Network) algorithm is a reinforcement learning method in which the agent learns successful strategies end to end, directly from high-dimensional sensory inputs, by combining deep convolutional networks with a reinforcement learning architecture. Traditional job scheduling problems are often optimized with heuristic algorithms or rule-based methods, which require manually designed decision rules and cannot adapt to an ever-changing production environment. | en_US |
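As a point of clarification for the dueling architecture mentioned above: a dueling network splits the Q-value estimate into a state-value stream V(s) and an advantage stream A(s, a), then recombines them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The following is a minimal sketch of that aggregation step only, not the study's actual implementation; the numeric values and the three-action scenario are illustrative assumptions.

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine a state-value estimate V(s) with per-action advantages A(s, a)
    using the dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage makes V and A identifiable, which is the
    core idea of the Dueling DQN architecture.
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Toy example: one state, three actions (hypothetically, assigning a job
# to one of three VM clusters).
q = dueling_q_values(value=2.0, advantages=[1.0, -1.0, 0.0])
print(q)  # -> [3. 1. 2.]
```

In a full DDDQN, the two streams are produced by separate heads of a neural network, and the Double DQN rule decouples action selection from action evaluation when computing the learning target; the sketch above shows only the stream-aggregation formula.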