Master's/Doctoral Thesis 105522086: Detailed Record




Author: 林凱君 (Kai-Chun Lin)    Department: Computer Science and Information Engineering
Thesis Title: Multi-Scale Attention Model Based Object Detection (基於多尺度注意力模型之物件偵測)
Related Theses:
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Pre-processing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Trend Prediction for Financial Products
★ Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin Chinese
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ Deep Learning-Based Trend Prediction for Exchange-Traded Funds
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Stroke Surgery Survival
Files: Full text not available for online browsing (access permanently restricted)
Abstract (Chinese): In recent years, deep learning and machine learning have attracted wide attention. Among these methods, the Convolutional Neural Network (CNN) has achieved breakthrough performance in image recognition compared with traditional classification approaches. Object detection, one of the core tasks in image recognition, has many real-world applications, including pedestrian detection, face recognition, unmanned stores, and self-driving cars. One representative object detection architecture is SSD (Single Shot MultiBox Detector), which combines features from multiple scales to effectively improve accuracy. This thesis exploits the strengths of two networks, a multi-scale network and a feature pyramid network, and further introduces an attention mechanism into the network; the proposed architecture is trained end-to-end.
The proposed method builds on the multi-scale feature pyramid model (FPNSSD) and adds an attention mechanism. The Residual Attention Network has already realized attention for image classification and considerably improves classification accuracy. This thesis therefore incorporates the attention mechanism into FPNSSD while inheriting the original network's multi-scale detection capability, so that the key features of small objects can be captured more reliably and the detection accuracy on small objects improves.
In the experiments, we evaluate the model on the VOC2012 test set. The results show that the network with the attention mechanism achieves higher accuracy on small objects such as birds and bottles.
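The residual-attention re-weighting described above can be illustrated with a minimal sketch (not the thesis code; the channel widths, kernel sizes, and the depth of the mask branch are illustrative assumptions). In the spirit of Wang et al.'s Residual Attention Network, a soft mask branch produces per-pixel weights in [0, 1] and re-weights the trunk features as (1 + M(x)) * T(x):

```python
# Minimal residual-attention block sketch (PyTorch); hyperparameters are assumptions.
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: ordinary feature transform.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Soft mask branch: shallow encoder-decoder ending in a sigmoid,
        # so every spatial location gets a weight between 0 and 1.
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes the input spatial size is even so pooling/upsampling round-trips cleanly.
        t = self.trunk(x)
        m = self.mask(x)
        # Residual attention: keep the trunk signal and add the attended part.
        return (1 + m) * t
```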
Abstract (English): In recent years, deep learning has played an important role in artificial intelligence, and the Convolutional Neural Network (CNN) in particular has achieved breakthrough performance in image classification compared with traditional methods. Object detection is a popular topic in image processing with many applications in daily life, including face detection, pedestrian detection for self-driving cars, and product detection for self-service stores. Among the many published object detection methods, SSD (Single Shot MultiBox Detector) combines predictions from multiple feature maps of different resolutions to naturally handle objects of various sizes. This thesis combines the advantages of two networks, a multi-scale network and a feature pyramid network, and adds an attention mechanism to the resulting network, which can be trained end-to-end.
In this work, we build on the FPNSSD network and add an attention mechanism to the multi-scale network. The attention mechanism lets the deep network learn which regions of a feature map are important and assign them greater weight. Because attention has shown better performance in classification and segmentation, we add it to the multi-scale network in the hope of improving small-object detection.
In the experiments on the VOC2012 challenge, FPNSSD with attention achieves better bounding-box and classification performance on small objects such as birds and bottles.
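As a rough illustration of how such attention might be attached to the multi-scale pyramid, the following sketch applies one attention block per FPN level before shared SSD-style classification and box heads. The level count, the 256-channel width, the anchor count, and the reuse of the ResidualAttentionBlock sketched above are assumptions for illustration, not the exact FPNSSD configuration used in the thesis:

```python
# Sketch of attention-augmented multi-scale detection heads (PyTorch); assumes the
# ResidualAttentionBlock defined in the previous sketch is available in scope.
from typing import List, Tuple
import torch
import torch.nn as nn

class AttentionFPNSSDHead(nn.Module):
    def __init__(self, num_levels: int = 5, channels: int = 256,
                 num_anchors: int = 9, num_classes: int = 21):  # 21 = 20 VOC classes + background
        super().__init__()
        # One attention block per pyramid level.
        self.attn = nn.ModuleList(
            ResidualAttentionBlock(channels) for _ in range(num_levels)
        )
        # Shared SSD-style heads applied to every (attended) pyramid level.
        self.cls_head = nn.Conv2d(channels, num_anchors * num_classes, 3, padding=1)
        self.box_head = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)

    def forward(self, pyramid: List[torch.Tensor]) -> Tuple[List[torch.Tensor], List[torch.Tensor]]:
        cls_outs, box_outs = [], []
        for feat, attn in zip(pyramid, self.attn):
            feat = attn(feat)                 # re-weight each scale with attention
            cls_outs.append(self.cls_head(feat))
            box_outs.append(self.box_head(feat))
        return cls_outs, box_outs
```

In a TorchCV-style pipeline, the per-level outputs would typically be flattened, matched to default boxes, and optimized end-to-end with the usual SSD MultiBox loss; the whole model therefore remains trainable end-to-end, as stated in the abstract.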
Keywords:
★ Object Detection
★ Attention Model
★ Deep Learning
★ Convolutional Neural Network
Table of Contents
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Motivation and Objectives
1.3 Methodology and Chapter Overview
Chapter 2 Deep Learning
2.1 Introduction to Artificial Neural Networks and Their Development
2.2 Perceptrons and Neural Networks
2.3 Backpropagation Neural Networks
2.4 Deep Neural Networks
2.5 Common Convolutional Neural Network Architectures
Chapter 3 Object Detection Analysis
3.1 Introduction to Object Detection
3.2 Two-Stage Detectors
3.3 One-Stage Detectors
3.4 Multi-Scale Object Detection
Chapter 4 Proposed Architecture
4.1 Multi-Scale Feature Pyramid Object Detection Model
4.2 Attention Model
4.3 Multi-Scale Attention Detection Model
Chapter 5 Experiments
5.1 Experimental Setup
5.2 Architecture Experiments
5.3 Experimental Results
Chapter 6 Conclusions and Future Work
Chapter 7 References
References:
1. Hinton, G.E., S. Osindero, and Y.-W. Teh, A fast learning algorithm for deep belief nets. Neural Computation, 2006. 18(7): p. 1527-1554.
2. Mikolov, T., et al., Recurrent neural network based language model. in INTERSPEECH. 2010.
3. LeCun, Y., et al., Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. 86(11): p. 2278-2324.
4. Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1. 2012, Curran Associates Inc.: Lake Tahoe, Nevada. p. 1097-1105.
5. Deng, J., et al., ImageNet: A large-scale hierarchical image database. in 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009.
6. Simonyan, K. and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, 2014. abs/1409.1556.
7. Szegedy, C., et al., Going Deeper with Convolutions. CoRR, 2014. abs/1409.4842.
8. He, K., et al., Deep Residual Learning for Image Recognition. ArXiv e-prints, 2015. 1512.
9. Girshick, R., et al., Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv e-prints, 2013. 1311.
10. Girshick, R., Fast R-CNN. ArXiv e-prints, 2015. 1504.
11. Ren, S., et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. ArXiv e-prints, 2015. 1506.
12. Lowe, D.G., Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004. 60(2): p. 91-110.
13. Dalal, N. and B. Triggs, Histograms of oriented gradients for human detection. in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). 2005.
14. Liu, W., et al., SSD: Single Shot MultiBox Detector. CoRR, 2015. abs/1512.02325.
15. Lin, T.-Y., et al., Feature Pyramid Networks for Object Detection. CoRR, 2016. abs/1612.03144.
16. TorchCV: a PyTorch vision library mimics ChainerCV. https://github.com/kuangliu/torchcv.
17. Everingham, M., et al., The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010. 88(2): p. 303-338.
18. Wang, F., et al., Residual Attention Network for Image Classification. CoRR, 2017. abs/1704.06904.
19. Hebb, D.O., The Organization of Behavior: A Neuropsychological Theory. 1949, New York: Wiley.
20. Rosenblatt, F., The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 1958. 65(6): p. 386-408.
21. Minsky, M. and S. Papert, Perceptrons. 1969.
22. Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning representations by back-propagating errors. Nature, 1986. 323: p. 533.
23. Werbos, P.J., Backpropagation Through Time: What It Does and How to Do It. Proceedings of the IEEE, 1990. 78(10): p. 1550-1560.
24. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780.
25. Long, J., E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation. ArXiv e-prints, 2014. 1411.
26. Introduction to Different Activation Functions for Deep Learning. https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092.
27. Huang, G., Z. Liu, and K.Q. Weinberger, Densely Connected Convolutional Networks. CoRR, 2016. abs/1608.06993.
28. Szegedy, C., S. Ioffe, and V. Vanhoucke, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR, 2016. abs/1602.07261.
29. Ioffe, S. and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR, 2015. abs/1502.03167.
30. Szegedy, C., et al., Rethinking the Inception Architecture for Computer Vision. CoRR, 2015. abs/1512.00567.
31. Lin, M., Q. Chen, and S. Yan, Network In Network. CoRR, 2013. abs/1312.4400.
32. Yu, F. and V. Koltun, Multi-Scale Context Aggregation by Dilated Convolutions. CoRR, 2015. abs/1511.07122.
33. Dai, J., et al., Deformable Convolutional Networks. CoRR, 2017. abs/1703.06211.
34. Jaderberg, M., et al., Spatial Transformer Networks. ArXiv e-prints, 2015. 1506.
35. Farfade, S.S., M.J. Saberian, and L.-J. Li, Multi-view Face Detection Using Deep Convolutional Neural Networks. CoRR, 2015. abs/1502.02766.
36. Dollár, P., et al., Fast Feature Pyramids for Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014. 36(8): p. 1532-1545.
37. Lienhart, R. and J. Maydt, An extended set of Haar-like features for rapid object detection. in Proceedings of the International Conference on Image Processing. 2002.
38. Maćkiewicz, A. and W. Ratajczak, Principal components analysis (PCA). Computers & Geosciences, 1993. 19(3): p. 303-342.
39. Felzenszwalb, P.F., et al., Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010. 32(9): p. 1627-1645.
40. Cortes, C. and V. Vapnik, Support-Vector Networks. Machine Learning, 1995. 20(3): p. 273-297.
41. Sermanet, P., et al., OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. CoRR, 2013. abs/1312.6229.
42. Huang, J., et al., Speed/accuracy trade-offs for modern convolutional object detectors. CoRR, 2016. abs/1611.10012.
43. Uijlings, J.R.R., et al., Selective Search for Object Recognition. International Journal of Computer Vision, 2013. 104(2): p. 154-171.
44. He, K., et al., Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ArXiv e-prints, 2014. 1406.
45. Dai, J., et al., R-FCN: Object Detection via Region-based Fully Convolutional Networks. CoRR, 2016. abs/1605.06409.
46. He, K., et al., Mask R-CNN. ArXiv e-prints, 2017. 1703.
47. Redmon, J., et al., You Only Look Once: Unified, Real-Time Object Detection. CoRR, 2015. abs/1506.02640.
48. Redmon, J. and A. Farhadi, YOLOv3: An Incremental Improvement. CoRR, 2018. abs/1804.02767.
49. Redmon, J. and A. Farhadi, YOLO9000: Better, Faster, Stronger. CoRR, 2016. abs/1612.08242.
50. Lin, T.-Y., et al., Focal Loss for Dense Object Detection. CoRR, 2017. abs/1708.02002.
51. Fu, C.-Y., et al., DSSD: Deconvolutional Single Shot Detector. CoRR, 2017. abs/1701.06659.
52. Li, Z. and F. Zhou, FSSD: Feature Fusion Single Shot Multibox Detector. CoRR, 2017. abs/1712.00960.
53. Shrivastava, A., A. Gupta, and R.B. Girshick, Training Region-based Object Detectors with Online Hard Example Mining. CoRR, 2016. abs/1604.03540.
54. Zhang, K., et al., Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. CoRR, 2016. abs/1604.02878.
55. Kong, T., et al., HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. CoRR, 2016. abs/1604.00600.
56. Liu, S., et al., Path Aggregation Network for Instance Segmentation. CoRR, 2018. abs/1803.01534.
57. Singh, B. and L.S. Davis, An Analysis of Scale Invariance in Object Detection - SNIP. CoRR, 2017. abs/1711.08189.
58. Nallapati, R., B. Xiang, and B. Zhou, Sequence-to-Sequence RNNs for Text Summarization. CoRR, 2016. abs/1602.06023.
59. Bahdanau, D., K. Cho, and Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, 2014. abs/1409.0473.
60. Chung, J., et al., Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR, 2014. abs/1412.3555.
61. Xu, K., et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. CoRR, 2015. abs/1502.03044.
62. Chen, L.-C., et al., Attention to Scale: Scale-aware Semantic Image Segmentation. CoRR, 2015. abs/1511.03339.
63. Zhao, B., et al., Diversified Visual Attention Networks for Fine-Grained Object Classification. IEEE Transactions on Multimedia, 2017. 19(6): p. 1245-1256.
64. Hu, J., L. Shen, and G. Sun, Squeeze-and-Excitation Networks. CoRR, 2017. abs/1709.01507.
65. Stollenga, M.F., et al., Deep Networks with Internal Selective Attention through Feedback Connections. CoRR, 2014. abs/1407.3068.
66. Chen, L., et al., SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning. CoRR, 2016. abs/1611.05594.
67. Lin, T.-Y., et al., Microsoft COCO: Common Objects in Context. 2014. Cham: Springer International Publishing.
Advisor: 王家慶    Date of Approval: 2018-08-07