Thesis 110522057: Detailed Record




Author: Yu-Hui Zhou (周裕惠)    Department: Computer Science and Information Engineering
Thesis Title: SILP: Enhancing Skin Lesion Classification using Swin Transformer with Spatial Interaction and Local Perception Modules
Related Theses
★ Dynamic Overlay Construction for Mobile Target Detection in Wireless Sensor Networks
★ A Simple Detour Strategy for Vehicle Navigation
★ Improving Localization Using Transmitter-side Voltage
★ Constructing a Virtual Backbone in Vehicular Networks Using Vehicle Classification
★ Why Topology-based Broadcast Algorithms Do Not Work Well in Heterogeneous Wireless Networks?
★ Efficient Wireless Sensor Networks for Mobile Targets
★ An Articulation-Point-based Distributed Topology Control Method for Wireless Ad Hoc Networks
★ A Review of Existing Web Frameworks
★ A Distributed Algorithm for Partitioning Sensor Networks into Greedy Blocks
★ Range-free Distance Measurement in Wireless Networks
★ Inferring Floor Plan from Trajectories
★ An Indoor Collaborative Pedestrian Dead Reckoning System
★ Dynamic Content Adjustment In Mobile Ad Hoc Networks
★ An Image-based Localization System
★ Distributed Data Compression and Collection Algorithms for Large-scale Wireless Sensor Networks
★ Collision Analysis in Vehicular WiFi Networks
Access Rights
  1. The author has agreed to make this electronic thesis available to the public immediately.
  2. The released full text is licensed to users only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

摘要(中) (translated) Owing to increased ultraviolet radiation and global environmental change, the number of patients with skin lesions is growing worldwide. In some regions, comparatively limited medical resources make the diagnosis of skin conditions a major challenge, and a lesion that is not treated correctly may progress to skin cancer. There is therefore an urgent need for an efficient, accurate, and easy-to-use system for identifying suspicious lesions. Although many skin lesion classification models already exist, there is still room for improvement in accuracy and other evaluation metrics. To raise classification accuracy, this study proposes a new classification system named SILP, which introduces two modules into the image classifier: a Local Perception Module and a Spatial Interaction Module. In addition, the activation function is modified to improve training time and accuracy. SILP is evaluated on two public skin lesion datasets, and the experimental results show that it not only exceeds the state-of-the-art skin lesion classification model in accuracy but also performs strongly on the other evaluation metrics.
摘要(英) Because of the harmful effects of ultraviolet rays and global environmental factors, the number of patients with skin lesions is increasing. If left untreated, skin lesions may lead to skin cancer. However, limited access to specialized medical care remains a challenge in certain regions. Therefore, there is an urgent need for an efficient, accurate, and accessible tool to identify suspicious lesions. Although there are many classification models for skin lesions, there is still room for improvement in terms of accuracy. To enhance the accuracy of skin lesion classification, a novel system named SILP is proposed in this study. There are two modules in SILP: the Local Perception Module and the Spatial Interaction Module. Additionally, we have modified the activation function to improve both training time and accuracy. SILP, along with several other models, has been tested on two public skin lesion datasets. The results demonstrate that our proposed system outperforms the state-of-the-art skin lesion classification model, not only in terms of accuracy but also in various other evaluation metrics.
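The abstract sketches the overall design: a Swin Transformer backbone extended with a Local Perception Module and a Spatial Interaction Module, together with a change of activation function (the table of contents points to the Sigmoid-weighted Linear Unit, SiLU(x) = x * sigmoid(x)). The thesis text itself is not reproduced in this record, so the following PyTorch fragment is only a minimal sketch of how such modules are commonly attached to windowed transformer tokens; the class names, the depthwise-convolution local branch, and the sigmoid spatial gate are illustrative assumptions, not the author's actual implementation.

# Hedged sketch, illustrative only. The real SILP modules are defined in the
# thesis (Sections 4.4.2.1-4.4.2.3); the internals below are assumptions.
import torch
import torch.nn as nn

class LocalPerceptionModule(nn.Module):
    # Assumed form: a depthwise-convolution branch that injects local context
    # into the token sequence of a windowed transformer block.
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.act = nn.SiLU()  # Sigmoid-weighted Linear Unit, per the thesis TOC

    def forward(self, x, h, w):
        # x: (B, N, C) tokens; reshape to a feature map, convolve, reshape back
        b, n, c = x.shape
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        feat = self.act(self.dwconv(feat))
        return x + feat.flatten(2).transpose(1, 2)  # residual connection

class SpatialInteractionModule(nn.Module):
    # Assumed form: a lightweight spatial gate that reweights token positions
    # before they are passed on to the next stage.
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, x, h, w):
        b, n, c = x.shape
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        gate = torch.sigmoid(self.proj(feat))       # (B, 1, H, W) spatial weights
        return (feat * gate).flatten(2).transpose(1, 2)

# Usage sketch: apply both modules to a 7x7 window of 96-dimensional tokens.
tokens = torch.randn(2, 49, 96)
tokens = LocalPerceptionModule(96)(tokens, 7, 7)
tokens = SpatialInteractionModule(96)(tokens, 7, 7)
print(tokens.shape)  # torch.Size([2, 49, 96])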
關鍵字(中) ★ skin cancer
★ medical imaging
★ image classification
★ vision transformer
關鍵字(英) ★ skin lesion
★ medical imaging
★ image classification
★ vision transformer
Contents
1 Introduction
2 Related Work
  2.1 Image classification
    2.1.1 Convolutional Neural Networks
      2.1.1.1 Traditional Convolutional Neural Networks
      2.1.1.2 Efficient Deep Neural Networks
      2.1.1.3 Attention-based Models
    2.1.2 Vision Transformer
  2.2 Skin Lesion Classification
3 Preliminary
  3.1 Data Augmentation
    3.1.1 Mixup
    3.1.2 Intra-class Augmentation
  3.2 Convolutional Neural Networks
    3.2.1 Convolutional Neural Networks
    3.2.2 Dilated Convolution
    3.2.3 Residual Learning
    3.2.4 Layer Normalization
  3.3 Vision Transformer
    3.3.1 Attention Mechanism
    3.3.2 Swin Transformer
  3.4 Sigmoid-weighted Linear Unit
  3.5 Spatial Interaction Module
4 Design
  4.1 Motivation
  4.2 Problem Statement
  4.3 Research Challenges
  4.4 Proposed System Architecture
    4.4.1 Data Augmentation
    4.4.2 Model
      4.4.2.1 Local Perception Module
      4.4.2.2 Spatial Interaction Module
      4.4.2.3 Multilayer Perceptron in Swin Transformer
5 Performance
  5.1 Datasets
  5.2 Evaluation Metrics
  5.3 Experimental Setup
  5.4 Experimental Results and Analysis
    5.4.1 In HAM10000 Dataset
    5.4.2 In ISIC2017 Dataset
  5.5 Ablation Studies
    5.5.1 Performance
6 Conclusion
Advisor: Min-Te Sun (孫敏德)    Date of Approval: 2023-07-13
