References |
[1] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of Neural Information Processing Systems (NIPS), Harrahs and Harveys, Lake Tahoe, NV, Dec.3-8, 2012, pp.1106-1114.
[2] C.-H. Huang, H.-Y. Wu, and Y.-L. Lin, “HarDNet-MSEG: a simple encoder-decoder polyp segmentation neural network that achieves over 0.9 Mean Dice and 86 FPS,” arXiv:2101.07172.
[3] P. Chao, C.-Y. Kao, Y.-S. Ruan, C.-H. Huang, and Y.-L. Lin, “HarDNet: a low memory traffic network,” arXiv:1909.00948.
[4] S. Liu, D. Huang, and Y. Wang, “Receptive field block net for accurate and fast object detection,” arXiv:1711.07767.
[5] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” arXiv:1608.06993.
[6] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: a survey,” arXiv:2001.05566v5.
[7] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” arXiv:1411.4038v2.
[8] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” arXiv:1505.04597v1.
[9] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: a nested U-Net architecture for medical image segmentation,” arXiv:1807.10165v1.
[10] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: redesigning skip connections to exploit multiscale features in image segmentation,” arXiv:1912.05074v2.
[11] F. I. Diakogiannis, F. Waldner, P. Caccetta, and C. Wu, “ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data,” ISPRS Journal of Photogrammetry and Remote Sensing, vol.162, pp.94-114, 2020.
[12] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, “DoubleU-net: a deep convolutional neural network for medical image segmentation,” arXiv:2006.04868.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385.
[14] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” arXiv:1511.00561v3.
[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556v6.
[16] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” arXiv:1412.7062v4.
[17] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” arXiv:1606.00915v2.
[18] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv:1706.05587v3.
[19] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” arXiv:1802.02611v3.
[20] F. Chollet, “Xception: deep learning with depthwise separable convolutions,” arXiv:1610.02357v3.
[21] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv:1704.04861.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” arXiv:1703.06870v3.
[23] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” arXiv:1506.01497v3.
[24] R. Girshick, “Fast R-CNN,” arXiv:1504.08083v2.
[25] Z. Cai and N. Vasconcelos, “Cascade R-CNN: high quality object detection and instance segmentation,” arXiv:1906.09756.
[26] K. Chen, J. Pang, J. Wang, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Shi, W. Ouyang, C. C. Loy, and D. Lin, “Hybrid task cascade for instance segmentation,” arXiv:1901.07518.
[27] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: real-time instance segmentation,” arXiv:1904.02689.
[28] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” arXiv:1708.02002.
[29] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” arXiv:1612.03144.
[30] J. Snell, K. Swersky, and R. S. Zemel, “Prototypical networks for few-shot learning,” arXiv:1703.05175.
[31] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT++: better real-time instance segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol.44, no.2, pp.1108-1121, 2022.
[32] H. Liu, R. A. R. Soto, F. Xiao, and Y. J. Lee, “YolactEdge: real-time instance segmentation on the edge,” arXiv:2012.12259.
[33] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” arXiv:1703.06211.
[34] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842.
[35] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception architecture for computer vision,” arXiv:1512.00567.
[36] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” arXiv:1602.07261.
[37] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv:1409.0473.
[38] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” arXiv:1709.01507v4.
[39] S. Woo, J. Park, J.-Y. Lee, and I. Kweon, “CBAM: convolutional block attention module,” arXiv:1807.06521v2.
[40] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, “Dual attention network for scene segmentation,” arXiv:1809.02983v4.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv:1706.03762v5.
[42] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv:1805.08318v2.
[43] H. Li, P. Xiong, J. An, and L. Wang, “Pyramid attention network for semantic segmentation,” arXiv:1805.10180v3.
[44] Y. Hu, G. Wen, M. Luo, D. Dai, J. Ma, and Z. Yu, “Competitive inner-imaging squeeze and excitation for residual network,” arXiv:1807.08920v4.
[45] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: efficient channel attention for deep convolutional neural networks,” arXiv:1910.03151v4.
[46] B. Niu, W. Wen, W. Ren, X. Zhang, L. Yang, S. Wang, K. Zhang, X. Cao, and H. Shen, “Single image super-resolution via a holistic attention network,” arXiv:2008.08767v1.
[47] J.-B. Cordonnier, A. Loukas, and M. Jaggi, “Multi-head attention: collaborate instead of concatenate,” arXiv:2006.16362.
[48] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: transformers for image recognition at scale,” arXiv:2010.11929.
[49] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: hierarchical vision transformer using shifted windows,” arXiv:2103.14030.
[50] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167v3.
[51] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. of International Conference on Machine Learning (ICML), Haifa, Israel, Jun.21-24, 2010, pp.807-814.
[52] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv:1603.07285v2.
[53] Z. Zhang and M. R. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” in Proc. of Neural Information Processing Systems (NIPS), Palais des Congrès de Montréal, Montréal, Canada, Dec.2-8, 2018, pp.8778-8788.
[54] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: fully convolutional neural networks for volumetric medical image segmentation,” arXiv:1606.04797v1.
[55] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980v9. |