Global and Local context and Coarse to Fine Semantic Segmentation


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81080

Title:	Global and Local context and Coarse to Fine Semantic Segmentation (基於深度學習之結合全局及局部資訊和修復分割細節的語義分割方法)
Authors:	邱亦成;Chiu, Yi-Cheng
Contributors:	Department of Computer Science and Information Engineering
Keywords:	deep learning;semantic segmentation;convolutional neural network
Date:	2019-07-15
Issue Date:	2019-09-03 15:33:30 (UTC+8)
Publisher:	National Central University
Abstract:	Image semantic segmentation is a very active topic in computer vision and artificial intelligence. Producing ground-truth training data for image segmentation is time- and labor-intensive, so one goal of this thesis is to train high-accuracy segmentation models that reduce the cost of producing such data. Recent deep-learning-based semantic segmentation methods usually downsample the input, both to run in real time on the road and to fit within GPU memory limits, and this causes details in the scene to be lost. In this thesis we examine the methods proposed by well-known semantic segmentation architectures, from autoencoders to attention models, and analyze their contributions, advantages, and disadvantages. We also modify their network architectures and propose the JCF architecture, which consists of two modules. One module extracts detailed information from the high-resolution image, and the two feature maps are then combined through channel weights to refine the original segmentation result from coarse to fine. Our final proposed network architecture, GLNet, combines global attention information with local multi-scale context information. This helps the model understand the relationships between objects across different scenes and reduces classification errors. Through a channel weight module, GLNet also brings in information from the earlier layers of the convolutional neural network to repair the boundaries and details of segmented objects. In our experiments, the proposed architecture improves on several well-known existing methods.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

