Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks


Author

Chien-Fang Chiu (邱千芳)

Department

Department of Communication Engineering

Thesis Title

Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks
(基於時空域摺積神經網路之抽菸動作辨識)



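The author has agreed to make this electronic thesis openly available immediately.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.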

Abstract (Chinese)

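Many countries and regions around the world have banned smoking entirely in indoor public places and workplaces, and Taiwan is no exception. Yet people are still often seen smoking at hospital entrances and in corners of school campuses. Even a non-smoker who stands next to a smoker inhales the smoke, which is known as secondhand smoke. Secondhand smoke harms the human body in many ways: besides raising the risk of diseases such as cancer, heart disease, stroke, and respiratory illness, it may also damage brain function. We aim to use deep learning techniques to identify people who smoke illegally.
This study, "Smoking Action Recognition Based on Spatial-Temporal Convolutional Neural Networks," proposes a system for smoking action recognition. It uses data balancing and data augmentation to improve performance, and combines the GoogLeNet convolutional neural network with the video segmentation architecture of Temporal Segment Networks to form a spatial-domain convolutional neural network with temporal structure (the spatial-temporal neural network of the title), yielding a system that recognizes smoking videos effectively. On the original HMDB51 smoking videos, recognition accuracy reaches 100%. On the set extended with everyday smoking videos from ActivityNet (HMDB51 + ActivityNet smoking), it reaches 99.16%. On the selected AVA movie smoking clips it also reaches 91.667%, so the system can effectively distinguish smoking videos.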
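The segment-based architecture described above can be illustrated with a minimal sketch. It is not the thesis code: the function names, the default of three segments, and the use of plain averaging as the consensus step are all assumptions made for illustration, and the per-segment class scores stand in for the output of a CNN such as GoogLeNet.

```python
import random
from typing import List, Optional, Sequence


def sample_segment_indices(num_frames: int, num_segments: int = 3,
                           train: bool = False,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Pick one frame index from each of `num_segments` equal-length segments."""
    if num_frames < num_segments:
        raise ValueError("video has fewer frames than segments")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Integer bounds: [start, end) covers the k-th slice of the video.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if train:
            indices.append(rng.randrange(start, end))  # random frame per segment
        else:
            indices.append((start + end - 1) // 2)     # middle frame per segment
    return indices


def segmental_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-segment class scores into one video-level score vector."""
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [sum(scores[c] for scores in segment_scores) / num_segments
            for c in range(num_classes)]


def predict(segment_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged score."""
    video_scores = segmental_consensus(segment_scores)
    return max(range(len(video_scores)), key=video_scores.__getitem__)
```

In use, a video would be split by `sample_segment_indices`, each sampled frame scored by the spatial network, and the scores merged by `predict`. Sampling one frame per segment keeps the cost low while still covering the whole clip, which is the temporal structure the abstract refers to.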

Abstract (English)

Cigarette smoking increases the risk of death from all causes in men and women. A person who stands next to a smoker still inhales the smoke, which is called passive smoking. Consequently, smoking is prohibited in many enclosed public areas such as government buildings, educational facilities, hospitals, enclosed sports facilities, and buses. However, people still often smoke even in places where it is strictly prohibited, such as hospitals and elementary school campuses. The objective of this work is to develop a smoking action recognition system based on deep learning that allows smoking behavior to be discovered quickly.
In this work, we propose a system that recognizes smoking actions. It applies data balancing and data augmentation on top of GoogLeNet and the Temporal Segment Networks (TSN) architecture to achieve effective smoking action recognition. In our experiments, the spatial CNN is more powerful than the temporal CNN for smoking actions. The experimental results show that the accuracy reaches 100% on the HMDB51 test set. With the additional ActivityNet smoking videos, the accuracy reaches 99.16%. On additional, unrelated movie smoking clips, the accuracy is still as high as 91.67%.
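This record does not say how the data balancing was carried out. As one possible illustration only, the sketch below balances classes by random oversampling, duplicating items from the smaller classes until every class matches the largest one. The function name, the `(item, label)` sample format, and the choice of oversampling are hypothetical, not taken from the thesis.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (video identifier, class label); assumed format


def balance_by_oversampling(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label: Dict[Hashable, List[str]] = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    target = max(len(items) for items in by_label.values())
    balanced: List[Sample] = []
    for label, items in by_label.items():
        # Keep every original item, then top up with random repeats.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((item, label) for item in items + extra)

    rng.shuffle(balanced)  # avoid feeding the network one class at a time
    return balanced
```

Oversampling keeps all original videos and needs no extra data. In practice the repeated clips would typically be paired with augmentation such as cropping or flipping, so that the duplicates are not pixel-identical.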

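Keywords (Chinese)

★ Smoking action recognition (抽菸動作辨識)
★ Video classification (視訊分類)
★ Convolutional neural networks (摺積神經網路)
★ Deep learning (深度學習)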

Keywords (English)

★ Smoking action recognition
★ Video Classification
★ Convolutional neural networks
★ Deep learning

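Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Thesis Organization
Chapter 2 Neural Networks and Deep Learning
2-1 Neural Networks
2-1-1 Development of Neural Networks
2-1-2 Back-Propagation Neural Networks
2-2 Deep Learning
2-2-1 Deep Neural Networks
2-2-2 Convolutional Neural Networks (CNN)
2-2-3 Batch Normalization
2-3 Developments in Action Recognition
2-3-1 Two-Stream Networks
2-3-2 Temporal Segment Networks (TSN)
Chapter 3 Proposed Method and Related Techniques
3-1 Video Frame Extraction and Data Preprocessing
3-2 Training Phase
3-3 Testing Phase
Chapter 4 Experimental Results and Analysis
4-1 Experimental Environment
4-2 Parameters and Data Selection
4-2-1 Experiments on Pre-trained Model and Network Input Selection
4-2-2 Choice of Data Augmentation
4-2-3 Experimental Results after Data Balancing
4-2-4 Comparison and Analysis of Experimental Results
Chapter 5 Conclusions and Future Work
References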


Advisor

Pao-Chi Chang (張寶基)

Approval Date

2018-7-25
