References
[1] L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool, "Pose guided person image generation," Advances in Neural Information Processing Systems 30, 2017, pp. 406-416.
[2] P. Esser, E. Sutter, and B. Ommer, "A variational U-Net for conditional appearance and shape generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8857-8866.
[3] Y. Ren, X. Yu, J. Chen, T. H. Li, and G. Li, "Deep image spatial transformation for person image generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7690-7699.
[4] Y. Men, Y. Mao, Y. Jiang, W.-Y. Ma, and Z. Lian, "Controllable person image synthesis with attribute-decomposed GAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5084-5093.
[5] A. K. Bhunia, S. Khan, H. Cholakkal, R. M. Anwer, J. Laaksonen, M. Shah, and F. S. Khan, "Person image synthesis via denoising diffusion model," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5968-5976.
[6] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in Proceedings of the International Conference on Machine Learning, PMLR, 2015.
[7] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780–8794.
[8] A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models," in Proceedings of the 39th International Conference on Machine Learning, vol. 162, PMLR, 2022, pp. 16784-16804. Available: https://proceedings.mlr.press/v162/nichol22a.html
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention Is All You Need," Advances in Neural Information Processing Systems, vol. 30, Curran Associates, Inc., 2017.
[11] P. Esser, R. Rombach, and B. Ommer, "Taming Transformers for High-Resolution Image Synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12873-12883.
[12] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 586-595, doi: 10.1109/CVPR.2018.00068.
[13] P. Isola, J. Zhu, T. Zhou, and A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 5967-5976, doi: 10.1109/CVPR.2017
[14] J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models," Advances in Neural Information Processing Systems, Curran Associates, Inc., vol. 33, 2020, pp. 6840–6851.
[15] T. Brooks, A. Holynski, and A. A. Efros, "InstructPix2Pix: Learning To Follow Image Editing Instructions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18392-18402.
[16] Y. Shi, C. Xue, J. Liew, J. Pan, H. Yan, W. Zhang, V. Tan, and S. Bai, "DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing," 2023, arXiv:2306.14435 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2306.14435
[17] L. Zhang, A. Rao, and M. Agrawala, "Adding Conditional Control to Text-to-Image Diffusion Models," in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2023, pp. 3836-3847.
[18] A. K. Bhunia, S. Khan, H. Cholakkal, R. M. Anwer, J. Laaksonen, M. Shah, and F. S. Khan, "Person Image Synthesis via Denoising Diffusion Model," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 5968-5976.
[19] J. Karras, A. Holynski, T.-C. Wang, and I. Kemelmacher-Shlizerman, "DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22623-22633, doi: 10.1109/ICCV51070.2023.02073.
[20] X. Han, X. Zhu, J. Deng, Y.-Z. Song, and T. Xiang, "Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion," in Proceedings of the IEEE/CVF International Conference on Computer Vision, October 2023, pp. 22768-22777.
[21] Y. Ren, X. Fan, G. Li, S. Liu, and T. H. Li, "Neural Texture Extraction and Distribution for Controllable Person Image Synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13525-13534.
[22] J. Ho and T. Salimans, "Classifier-Free Diffusion Guidance," 2022, arXiv:2207.12598 [cs.LG]. [Online]. Available: https://arxiv.org/abs/2207.12598
[23] C. Schuhmann et al., "LAION-5B: An open large-scale dataset for training next generation image-text models," Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., Curran Associates, Inc., vol. 35, 2022, pp. 25278-25294.
[24] J. Song, C. Meng, and S. Ermon, "Denoising Diffusion Implicit Models," in Proceedings of the International Conference on Learning Representations, 2021.
[25] H.-Y. Cheng, C. C. Yu, and C.-L. Lin, "Generating Dance Videos using Pose Transfer Generative Adversarial Network with Multiple Scale Region Extractor and Learnable Region Normalization," IEEE MultiMedia, vol. 29, no. 1, Mar. 2022.
[26] J. Canny, "A Computational Approach to Edge Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, 1986, doi: 10.1109/TPAMI.1986.4767851.
[27] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1623-1637, 2022, doi: 10.1109/TPAMI.2020.3019967.
[28] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7291-7299.
[29] S. Y. Cheong, A. Mustafa, and A. Gilbert, "UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct. 2023, pp. 4173-4182.
[30] S. Elfwing, E. Uchibe, and K. Doya, "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning," CoRR, vol. abs/1702.03118, 2017. [Online]. Available: http://arxiv.org/abs/1702.03118
[31] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," in International Conference on Learning Representations (ICLR), 2021.