Indoor Scene Image Classification System Combining Semantic Segmentation Features and Attention Module

Name

黃健銘 (Jian-Ming Huang)

Department

Department of Computer Science and Information Engineering

Thesis Title

Indoor Scene Image Classification System Combining Semantic Segmentation Features and Attention Module
(結合語義分割特徵與注意力模型之室內場景分類系統)

File

Browse the thesis in the system (available after 2025-7-14)

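Abstract (Chinese)

Scene recognition is an important part of computer vision. Machine learning methods now far outperform traditional approaches. However, classifying directly with a neural network often loses information about the relationships among objects, spatial layout, and background, which degrades classification performance. Capturing this relational information and combining it effectively, as features, with the original image for classification is therefore a key challenge in scene classification today.
The method proposed in this thesis performs semantic segmentation on the image, extracts features from the semantic segmentation image and from the original image, each with a neural network model, fuses the semantic segmentation features with the original-image features using an attention module, and finally performs classification.
Experimental results show that the method achieves the best accuracy on our collected hotel indoor scene dataset. On the public 15-Scene dataset, it attains better classification accuracy than the methods from other papers it is compared with. Thus, semantic segmentation captures information about the relationships among objects, spatial layout, and background, and fusing features with an attention module yields better results in scene recognition.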

Abstract (English)

Scene recognition is an important part of computer vision. Current machine learning methods perform much better than traditional processing methods. However, using neural networks directly for classification often loses information about objects, spatial layout, and background, resulting in poor classification. Therefore, an important challenge in scene classification is to capture the information about objects, spatial layout, and background, and to use an effective method to merge these features to classify scenes.
The method proposed in this thesis performs semantic segmentation on the image. A neural network model extracts features from the semantic segmentation image and from the original image, respectively. An attention module then fuses the semantic segmentation features with the original-image features. Finally, images are classified according to the fused features.
The experimental results show that our method achieves the best result on the Hotel Indoor Scene dataset. Furthermore, on the public 15-Scene dataset, our method outperforms the existing methods it is compared with. Therefore, semantic segmentation can capture information about objects, spatial layout, and background, and using the attention module for feature fusion can achieve better accuracy in scene recognition.
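The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the attention module, so the sigmoid channel gate below is an assumption, as are all function names, feature sizes, and the randomly initialized weights. Each branch is assumed to have already produced a pooled feature vector; the segmentation features produce a weight in (0, 1) for each original-image feature channel, and the gated features go to a softmax classifier.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(rgb_feat, seg_feat, gate_w, gate_b):
    """Hypothetical attention: gate each original-image channel by a
    weight in (0, 1) computed from the segmentation features."""
    gate = [sigmoid(g) for g in linear(gate_w, gate_b, seg_feat)]
    return [r * g for r, g in zip(rgb_feat, gate)]


def classify(rgb_feat, seg_feat, params):
    """Fuse the two branches, then return class probabilities."""
    fused = attention_fuse(rgb_feat, seg_feat, params["gate_w"], params["gate_b"])
    return softmax(linear(params["cls_w"], params["cls_b"], fused))


def random_params(rgb_dim, seg_dim, num_classes, seed=0):
    """Untrained placeholder weights, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "gate_w": mat(rgb_dim, seg_dim),
        "gate_b": [0.0] * rgb_dim,
        "cls_w": mat(num_classes, rgb_dim),
        "cls_b": [0.0] * num_classes,
    }


# Toy pooled features standing in for the outputs of the two branches.
params = random_params(rgb_dim=8, seg_dim=4, num_classes=3)
rgb_feat = [0.5, 1.2, 0.0, 0.3, 0.9, 0.1, 0.7, 0.4]
seg_feat = [0.2, 0.0, 0.6, 0.2]
probs = classify(rgb_feat, seg_feat, params)
```

In a trained system the gate and classifier weights would be learned jointly with the two feature extractors. Gating is only one possible way to let segmentation features reweight original-image features, and the thesis itself may use a different formulation.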

Keywords (Chinese)

★ Scene recognition
★ Semantic segmentation
★ Attention module
★ Feature fusion

Keywords (English)

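Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Image Semantic Segmentation
2.1.1 UPerNet
2.1.2 Mask R-CNN
2.2 Scene Object Extraction
2.3 Neural Network Architectures for Feature Extraction
Chapter 3 Methodology
3.1 Collection of the Hotel Indoor Scene Database
3.2 System Architecture and Workflow
3.3 Original-Image Feature Branch
3.4 Segmentation Feature Branch
3.4.1 Semantic Segmentation Preprocessing
3.4.1.1 Semantic Segmentation Using Mask R-CNN
3.4.1.2 Semantic Segmentation Using UPerNet
3.4.2 Object Segmentation Preprocessing
3.4.2.1 Object Segmentation Using Mask R-CNN
3.4.2.2 Object Segmentation Using UPerNet
3.5 Feature Fusion
3.5.1 Fusion of Original-Image Features and Semantic Segmentation Features
3.5.2 Fusion of Original-Image Features and Object Segmentation Features
3.6 System Interface and Functions
Chapter 4 Experimental Results
4.1 Datasets
4.2 Experimental Environment and Parameter Settings
4.3 Experimental Data and Analysis
4.3.1 Without Feature Fusion
4.3.2 Feature Fusion on the Hotel Indoor Scene Dataset
4.3.2.1 Fusion with Semantic Segmentation Features
4.3.2.2 Fusion with Object Segmentation Features
4.3.2.3 Fusion with Both Semantic and Object Segmentation Features
4.3.3 Feature Fusion on the 15-Scene Dataset
4.3.3.1 Fusion with Semantic Segmentation Features
4.3.3.2 Fusion with Object Segmentation Features
4.3.3.3 Fusion with Both Semantic and Object Segmentation Features
4.3.4 Experimental Results
4.3.4.1 Results on the Hotel Indoor Scene Dataset
4.3.4.2 Results on the 15-Scene Dataset
4.3.4.3 Program Execution Time
Chapter 5 Conclusions and Future Work
References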

Advisor

鄭旭詠

Approval Date

2020-7-21
