Master's/Doctoral Thesis 110522168: Detailed Record




Author: CHIA-WEI CHEN (陳家偉)    Department: Computer Science and Information Engineering
Thesis Title: Channel Spatial Attention-based Transformer for Image Super-Resolution
Related Theses
★ Dynamic Overlay Construction for Mobile Target Detection in Wireless Sensor Networks
★ A Simple Detour Strategy for Vehicle Navigation
★ Improving Localization Using Transmitter-Side Voltage
★ Constructing a Virtual Backbone in Vehicular Networks Using Vehicle Classification
★ Why Topology-based Broadcast Algorithms Do Not Work Well in Heterogeneous Wireless Networks?
★ Efficient Wireless Sensor Networks for Mobile Targets
★ A Distributed Cut-Vertex-Based Topology Control Method for Wireless Ad Hoc Networks
★ A Review of Existing Web Frameworks
★ A Distributed Algorithm for Partitioning Sensor Networks into Greedy Blocks
★ Range-free Distance Measurement in Wireless Networks
★ Inferring Floor Plan from Trajectories
★ An Indoor Collaborative Pedestrian Dead Reckoning System
★ Dynamic Content Adjustment in Mobile Ad Hoc Networks
★ An Image-based Localization System
★ Distributed Data Compression and Collection Algorithms for Large-scale Wireless Sensor Networks
★ Collision Analysis in Vehicular WiFi Networks
Files: full text available in the repository system after 2026-1-24.
Abstract (Chinese) With the ever-growing demand for audio-visual media, the field of super-resolution has become increasingly important. Transformer models in particular have attracted wide attention in computer vision for their excellent performance, and a growing body of research applies them to this field. However, we found that although Transformers can address the problem of limited feature learning by adding attention of different mechanisms, some textures and structures may still be lost during training. To preserve the initial features and structures as much as possible, we propose a method that integrates Residual Connection, Attention Mechanism, and Upscaling Technique. To validate the performance of our method, we conducted multiple experiments on five different datasets and compared it with existing advanced super-resolution models. The experimental results show that our method outperforms the current state-of-the-art models in this field.
Abstract (English) As the demand for audio-visual media continues to grow, the significance of the super-resolution field is increasingly recognized. In particular, Transformer models have garnered widespread attention in computer vision due to their exceptional performance, leading to their growing application in this area. However, we observed that although Transformers address the issue of limited feature learning through various attention mechanisms, some textures and structures may still be lost during the learning process. To preserve the initial features and structures as much as possible, we propose a system, named Integrated Attention Transformer (IAT), that integrates Residual Connection, Attention Mechanism, and Upscaling Technique. To confirm the efficacy of IAT, we conducted experiments on five different datasets and compared the results with current state-of-the-art (SOTA) super-resolution models. The results show that the proposed IAT surpasses the current SOTA models.
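The abstract above describes combining a residual connection, a channel-spatial attention mechanism, and an upscaling step so that initial textures and structures are preserved. As a rough illustration only (this is not the thesis's actual IAT implementation; the function names, the sigmoid gating choices, and the tensor shapes below are all assumptions), a minimal NumPy sketch of such a block feeding a pixel-shuffle reconstruction step might look like:

```python
import numpy as np

def channel_attention(feat):
    # Global average pooling over spatial dims gives a per-channel gate in (0, 1).
    # feat: (C, H, W)
    pooled = feat.mean(axis=(1, 2))            # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))       # sigmoid
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Per-pixel gate computed from the channel-wise mean map.
    pooled = feat.mean(axis=0)                 # (H, W)
    gate = 1.0 / (1.0 + np.exp(-pooled))
    return feat * gate[None, :, :]

def attention_block(feat):
    # Channel-spatial attention wrapped in a residual connection, so the block
    # can fall back to the identity and pass the input features through intact.
    return feat + spatial_attention(channel_attention(feat))

def pixel_shuffle(feat, r):
    # Rearrange (C*r^2, H, W) -> (C, H*r, W*r): the standard sub-pixel
    # upscaling step used by many super-resolution reconstruction heads.
    c2, h, w = feat.shape
    c = c2 // (r * r)
    out = feat.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2)
    return out.reshape(c, h * r, w * r)

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 8, 8))    # 12 channels = 3 * 2^2 for a 2x upscale
y = pixel_shuffle(attention_block(x), r=2)
print(y.shape)  # (3, 16, 16)
```

The residual connection is what lets the block preserve its input: with an all-zero attention output the block reduces to the identity, which mirrors the abstract's goal of retaining initial features and structures through deep stacks of attention layers.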
Keywords (Chinese): ★ 超解析度 (Super-Resolution)
Keywords (English): ★ Super Resolution ★ Transformer
Table of Contents
1 Introduction 1
2 Related Work 4
2.1 CNN-based Super Resolution . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Attention-based Super Resolution . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Transformer-based Super Resolution . . . . . . . . . . . . . . . . . . . . . 6
2.4 GAN-based Super Resolution . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Preliminary 9
3.1 Image Restoration Using Swin Transformer . . . . . . . . . . . . . . . . . . 9
3.2 Hybrid Attention Transformer . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1 Hybrid Attention Block . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2.2 Overlapping Cross-Attention Block . . . . . . . . . . . . . . . . . . 12
3.3 Upscaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Channel-Spatial Attention Mechanism . . . . . . . . . . . . . . . . . . . . 14
4 Design 16
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3 Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.4 Proposed System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.4.1 Shallow Feature Extraction . . . . . . . . . . . . . . . . . . . . . . 19
4.4.2 Deep Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . 19
4.4.3 Image Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Performance 24
5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3 Environmental Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.4 Experimental Results and Analysis . . . . . . . . . . . . . . . . . . . . . . 27
5.5 Ablation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.5.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6 Conclusion 33
Advisor: Min-Te Sun (孫敏德)    Approval Date: 2024-1-25
