On the Design of RL Algorithms and Termination Strategies for Focused Crawling: A Case Study for Event Source Page Discovery


Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/93312

Title:	On the Design of RL Algorithms and Termination Strategies for Focused Crawling: A Case Study for Event Source Page Discovery (original title: 探究強化學習與停止策略於活動來源頁面探勘之設計)
Author:	Yeh, Ting (葉庭)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Reinforcement learning; web mining; multitask learning
Date:	2023-07-26
Upload time:	2024-09-19 16:53:19 (UTC+8)
Publisher:	National Central University
Abstract:	This study develops an intelligent web crawler that collects information from event source pages. The goal is to save users' time when searching for events in a browser and to provide structured event information, meeting the modern demand for finding distinctive local events. Our previous work used a reinforcement-learning policy gradient method to discover event source pages. However, its two-stage training process had two problems: the first stage could only be trained with a fixed number of steps per episode, and fine-tuning in the second stage did not improve performance significantly. To address these problems, we control episode termination with a variable number of steps from the initial training phase onward. This makes training more flexible, allowing it to adapt to different scenarios and changes in the environment, with the aim of improving the model's performance. To this end, we design an asset-control stopping strategy and apply several reinforcement learning algorithms. We also define the original two-stage training framework more rigorously, expanding it into four training strategies.
By comparing against our previous work, we examine whether the new stopping strategy reduces click cost, which reinforcement learning algorithm is most suitable, and which training strategy best fits the task. The results show that the new stopping strategy achieves lower click cost and higher performance with the DQN algorithm: click cost decreased from 1.4% to 1.227%, and performance improved from 72% to 78.2%. After comparing the training strategies, we conclude that using labeled data together with a reward function that provides the correct answers is better suited to our task.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	80

All items in NCUIR are protected by the original copyright.
