結合自我注意力模塊的多尺度特徵融合網路用於場景文字偵測;Multi-Scale Feature Fusion Network Combined with Self-Attention Module for Scene Text Detection

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/89831

Title:	結合自我注意力模塊的多尺度特徵融合網路用於場景文字偵測;Multi-Scale Feature Fusion Network Combined with Self-Attention Module for Scene Text Detection
Author:	何立群;Ho, Li-Chun
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Self-Attention Module;Multi-Scale Network;Scene Text Detection
Date:	2022-07-28
Upload time:	2022-10-04 12:01:27 (UTC+8)
Publisher:	National Central University
Abstract:	Scene text detection has seen breakthroughs in recent years and has many applications, such as document text detection and license plate recognition in parking lots. However, detecting arbitrarily shaped scene text, such as the text on signboards and billboards, remains difficult: many methods cannot fully delineate curved text, nor can they effectively separate adjacent text instances. We therefore propose a more effective model that fuses and exploits features better and detects scene text of arbitrary shape. The model predicts the central region of each text instance, and a post-processing step expands the predicted probability map to recover the full text region. We propose a Multi-Scale Feature Fusion Network to extract and fuse features more effectively; it includes Multi-Scale Attention Modules (MSAMs) combined with Self-Attention Modules (SAMs) to refine features, and a Self-Attention Head (SAH) that predicts the text probability map. Experiments confirm the effectiveness of the method, which achieves an F-score of 87.4 on the Total-Text dataset.
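The abstract describes predicting each text instance's central region and then expanding the predicted map in post-processing. This record does not give the thesis's actual procedure, so the following is only a generic sketch of that idea: threshold the probability map, label each connected central region, then grow every region outward breadth-first so that adjacent instances keep separate labels. The function names `label_kernels` and `expand_kernels`, the threshold of 0.5, and the fixed `steps` count are all assumptions made for illustration.

```python
from collections import deque

# 4-connected neighbourhood offsets (an illustrative choice).
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_kernels(prob, threshold=0.5):
    """Label 4-connected regions of cells whose probability >= threshold.

    Returns (labels, count); label 0 means background.
    The threshold value is an assumption, not taken from the thesis.
    """
    h, w = len(prob), len(prob[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if prob[y][x] < threshold or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and prob[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def expand_kernels(labels, steps=1):
    """Grow every labelled region outward by `steps` pixels.

    A background pixel takes the label of the first region that reaches it,
    so two neighbouring regions never merge into one.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    frontier = deque((y, x) for y in range(h) for x in range(w) if out[y][x])
    for _ in range(steps):
        next_frontier = deque()
        while frontier:
            cy, cx = frontier.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and not out[ny][nx]:
                    out[ny][nx] = out[cy][cx]
                    next_frontier.append((ny, nx))
        frontier = next_frontier
    return out


if __name__ == "__main__":
    # Toy probability map with two separate central regions.
    prob = [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.0, 0.0, 0.0, 0.8, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    labels, count = label_kernels(prob)
    for row in expand_kernels(labels, steps=1):
        print(row)
```

In a real detector the expansion distance would typically depend on each region's size rather than a fixed step count; the fixed `steps` here only keeps the sketch short.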
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	45

All items in NCUIR are protected by their original copyright.
