Multiscale Deformable Convolution Alignment Network for Video Super Resolution

Name

李易翰 (Yi-Han Lee)

Department

Department of Computer Science and Information Engineering

Thesis Title

Multiscale Deformable Convolution Alignment Network for Video Super Resolution
(多尺度可變形卷積對齊網路應用於影片超解析)

Browse the thesis in the system (open after 2025-7-6)

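Abstract (Chinese)

With advances in technology, devices that capture high-resolution images and high-resolution displays are now readily available, and many video files stored in the past have a resolution that is very low compared with the displays in common use today. Video super-resolution algorithms are needed to raise the resolution of such videos. Video super-resolution can also be applied to network transmission: the resolution of a video is reduced before transmission, and once transmission is complete a video super-resolution algorithm restores it, which saves bandwidth.
This thesis proposes a video super-resolution algorithm that performs image alignment with multiscale deformable convolution. Following the idea of multiscale models, it uses branches at different resolutions to predict the offsets of the deformable convolution, which strengthens the alignment module, and it uses an SE block to integrate the image features produced in the feature propagation stage, helping the model identify the features that are important for reconstructing the image. The model is trained and tested on the REDS and Vimeo-90K datasets. On REDS4, the test set provided with REDS, it exceeds BasicVSR++ by 0.07 dB in PSNR, and visually it generates clearer, sharper textures.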
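
As a rough illustration of the SE block (Squeeze-and-Excitation) mentioned in the abstract, the sketch below applies channel attention to a small stack of feature maps using plain Python lists. It is not the thesis implementation: the function name `se_block`, the weight matrices `w1` and `w2`, and the toy inputs are all assumptions made for the example, and a real model would use learned weights inside a deep-learning framework.

```python
import math


def se_block(features, w1, w2):
    """Hypothetical SE block over C feature maps, each an H x W nested list.

    w1 (R x C) and w2 (C x R) stand in for the two learned fully connected
    layers; here they are plain lists supplied by the caller.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every channel by its gate.
    scaled = [[[v * g for v in r] for r in ch] for ch, g in zip(features, gates)]
    return scaled, gates


# Toy input: two 2x2 channels and made-up weights (illustrative only).
toy_features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_w1 = [[1.0, 0.0]]      # 2 channels -> 1 hidden unit
toy_w2 = [[0.0], [1.0]]    # 1 hidden unit -> 2 channel gates
scaled, gates = se_block(toy_features, toy_w1, toy_w2)
print(gates)
```

Each gate lies between 0 and 1, so channels the gate considers less useful are attenuated before reconstruction while the others pass through nearly unchanged.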

Abstract (English)

With advances in technology, high-resolution capture devices and displays are now easy to obtain. Videos recorded in the past have a low resolution compared with current monitors, which causes display problems such as distorted images. Video super-resolution techniques help reconstruct such low-resolution videos as high-resolution videos. This thesis therefore proposes a video super-resolution algorithm that adopts multi-scale deformable convolution for image alignment in order to generate videos with improved visual quality. To enhance the alignment module, a multi-scale model with branches at different resolutions is used to predict the offsets of the deformable convolution. In addition, an SE block integrates the image features generated in the feature propagation stage, helping the model find the features that matter most for reconstruction.
Experiments on the REDS and Vimeo-90K datasets show that the proposed algorithm generates high-resolution videos with better visual quality. It achieves the best performance among the compared methods, with a PSNR 0.07 dB higher than the other three methods, and the comparison and ablation experiments demonstrate its effectiveness.
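
For readers unfamiliar with the metric behind the 0.07 dB figure, the following minimal sketch computes PSNR between a reference frame and a restored frame. The function name `psnr`, the nested-list image format, and the 8-bit peak value of 255 are assumptions for illustration; the abstract does not say which channels or value range the thesis evaluation uses.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized 2-D frames."""
    flat_ref = [p for row in reference for p in row]
    flat_out = [p for row in restored for p in row]
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 frames used only to exercise the function.
ground_truth = [[0, 0], [0, 0]]
worst_case = [[255, 255], [255, 255]]
print(psnr(ground_truth, worst_case))    # maximal error
print(psnr(ground_truth, ground_truth))  # identical frames
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.07 dB corresponds to an MSE ratio of 10^(-0.007), that is, roughly 1.6% lower mean squared error.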

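Keywords (Chinese)

★ 影片超解析 (Video super resolution)
★ 可變形卷積 (Deformable convolution)
★ 多尺度模型 (Multiscale model)
★ 注意力機制 (Attention mechanism)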

Keywords (English)

★ Video super resolution
★ Multiscale model
★ Deformable convolution
★ Attention mechanism

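Table of Contents

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Video Super Resolution
2.2 Deep-Learning-Based Upsampling Methods
2.2.1 Transposed Convolution
2.2.2 Pixel Shuffle
2.3 Deformable Convolution
2.4 BasicVSR
2.5 BasicVSR++
2.6 Multiscale Module
2.6.1 Parallel Multi-Branch Structure
2.6.2 Serial Multi-Branch Structure
2.7 Attention Mechanism
2.7.1 Channel Attention
2.8 Spatial Attention
Chapter 3 Proposed Method
3.1 System Architecture
3.2 Symbol Definitions
3.3 Feature Extraction
3.4 Bidirectional Recurrent Feature Propagation
3.5 Optical Flow Prediction Module
3.6 Flow-Guided Multiscale Deformable Convolution Alignment
3.7 Image Reconstruction and Upsampling
3.8 Model Training Details
Chapter 4 Experimental Results
4.1 Datasets
4.1.1 REalistic and Dynamic Scenes dataset
4.1.2 Vid4
4.1.3 Vimeo-90K
4.2 Evaluation Metrics
4.2.1 PSNR
4.2.2 SSIM
4.3 Experimental Environment and Settings
4.4 Effect of Different Multiscale Offset Prediction Methods on the Model
4.5 Effect of the Number of FMSD Branches on Performance
4.6 Effect of the Attention Mechanism on the Model
4.7 Performance Comparison under Different Object Motion Speeds
4.8 Feature Map Visualization Comparison between FMSD and BasicVSR++
4.9 Discussion of Results
4.10 Failure Cases
Chapter 5 Conclusion and Future Work
References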

Advisors

范國清 (Kuo-Chin Fan), 高巧汶 (Chiao-Wen Kao)

Approval Date

2022-8-30