場景辨識是電腦視覺中重要的一個環節,現今機器學習的方法效能遠遠高於傳統處理的方式,然而,直接使用神經網路進行分類往往會遺失物體、空間佈局、和背景之間關聯的資訊,導致分類效果不佳。因此抓取出物體、空間佈局、和背景之間關聯的資訊,並使用有效的方式將這些資訊、特徵與原圖結合進行分類,是目前場景分類中重要的挑戰。 本論文提出的方法,對影像做語義分割,並將語義分割影像與原圖影像分別使用神經網路模型提取特徵,將語義分割特徵使用注意力模型與原圖特徵進行特徵融合,最後進行分類、辨識。 實驗結果證明,在我們收集的旅館室內場景資料集中,準確率能達到最好的效果。在公開15-Scene資料集中,比較其他論文方法,我們方法的效果可以取得更好的分類準確度。因此,透過使用語義分割的方式,能夠抓取到物體、空間佈局和背景之間關聯的資訊,並使用注意力模型進行特徵融合,能在場景辨識中取得更好的辨識效果。 ;Scene recognition is an important part of computer vision. The efficiency of current machine learning methods is much better than traditional processing methods. However, using neural networks directly for classification often loses more information of objects, spatial layout, and background. Resulting in poor classification. Therefore, it is an important challenge in scene classification to capture the information of objects, spatial layout, and background, and use an effective method to merge these features to classify scene. The method proposed in this paper performs semantic segmentation on the image. Use Neural network model to extract the features of the semantic segmentation image and original image respectively. And then, use the attention module to fuse the semantic segmentation features with original image features. Finally, according to these fused features to classify images. The experiment results show that our method can achieve the best result on the Hotel Indoor Scene dataset. Furthermore, in the public 15-Scene dataset, our method can outperform existing methods. Therefore, by using semantic segmentation, the information of objects, spatial layout and background can be captured. Using the attention module to do feature fusion can achieve better accuracy in scene recognition.