Master's/Doctoral Thesis 110522607: Detailed Record




Author: Aditya Permana (佩馬納)    Department: Computer Science and Information Engineering
Thesis Title: A Hybrid Classification Method for Action Recognition and Correction – Learning Erhu as An Example
(以二胡學習為例之動作識別與糾正的混合分類方法)
  1. Access rights for this electronic thesis: the author has agreed to immediate open access.
  2. The open-access electronic full text is licensed to users for personal, non-commercial retrieval, reading, and printing for academic research purposes only.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Action recognition is one application of the deep learning method, currently used in fields as broad as information technology, sports, and the arts. The erhu is a stringed instrument that originated in China. When playing this instrument, there are rules for how to position the player's body and hold the instrument correctly, so a system is needed that detects each erhu player's movements to meet these requirements. This study therefore discusses action recognition on video using three methods: 3D-CNN, YOLOv3, and GCN. The 3D-CNN method is built on a CNN base; CNN is a method commonly used for image processing, and 3D-CNN has proven effective at capturing motion information from consecutive video frames. To improve the ability to capture the information stored in each movement, an LSTM layer is combined with the 3D-CNN model. LSTM is an advanced form of RNN, a sequential network, and it can handle the vanishing gradient problem faced by RNNs. Another image-processing method used in this study is YOLOv3, an object detector with a relatively good accuracy level that can detect objects in real time. To maximize YOLOv3's performance, this study combines it with a GCN, so that body key points can make classification easier for the YOLOv3 method. A GCN performs spatial convolution by merging the features of locally neighboring nodes on the graph. This study uses RGB video as the dataset, and preprocessing and feature extraction cover three main parts: the body, the erhu pole, and the bow. Two approaches are proposed for preprocessing and feature extraction. The first applies a segmentation process to the input video using the Mask R-CNN method. The second uses body landmarks for preprocessing and feature extraction on the body segment, while the erhu-pole and bow segments use the Hough Lines algorithm. The three main parts are then divided into several sections according to the classes that have been defined. For the classification process, this study proposes two deep learning algorithms, and all deep learning methods are combined with traditional image-processing algorithms. The combined pipeline produces an error message as output for each movement displayed by the erhu player.
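The Hough Lines step used on the erhu-pole and bow segments can be sketched in a few lines of NumPy. This is a minimal illustration of the classic (rho, theta) voting scheme, not the thesis's actual implementation; the function name `hough_peak` and the resolution settings are the editor's assumptions.

```python
import numpy as np

def hough_peak(edge_img, theta_step_deg=1.0):
    """Vote every edge pixel into a (rho, theta) accumulator and
    return the strongest straight line as (rho, theta in degrees)."""
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))        # max possible |rho|
    thetas = np.deg2rad(np.arange(-90.0, 90.0, theta_step_deg))
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edge_img)              # edge-pixel coordinates
    for x, y in zip(xs, ys):
        # rho = x*cos(theta) + y*sin(theta); shift by +diag for indexing
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[r, np.arange(len(thetas))] += 1
    r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
    return r_idx - diag, np.rad2deg(thetas[t_idx])

# A vertical "bow" drawn at column 5 of a 200x20 frame:
frame = np.zeros((200, 20), dtype=np.uint8)
frame[:, 5] = 1
rho, theta = hough_peak(frame)   # recovers rho=5, theta=0.0
```

In practice an edge detector (e.g. Canny) would produce `edge_img` from a video frame, and the peak angle would tell whether the bow or pole deviates from its expected orientation.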
Abstract (English) Action recognition is one application of the deep learning method, currently used in fields as broad as information technology, sports, and the arts. The erhu is a stringed instrument originating from China. In playing this instrument, there are rules on how to position the player's body and hold the instrument correctly, so a system is needed that detects every erhu player's movement to meet these needs. This study therefore discusses action recognition on video using three methods: 3D-CNN, YOLOv3, and GCN. The 3D-CNN method is built on a CNN base; CNN is a method commonly used for image processing, and 3D-CNN has proven effective at capturing motion information from consecutive video frames. To improve the ability to capture the information stored in each movement, an LSTM layer is combined with the 3D-CNN model. LSTM is an advanced form of RNN, a sequential network, and is capable of handling the vanishing gradient problem faced by RNNs. Another image-processing method used in this study is YOLOv3, an object detector with a relatively good accuracy level that can detect objects in real time. To maximize the performance of YOLOv3, this study combines it with a GCN, so that body key points can make classification easier for the YOLOv3 method. A GCN performs spatial convolution by merging the features of locally neighboring nodes on the graph. This research uses RGB video as the dataset, and preprocessing and feature extraction cover three main parts: the body, the erhu pole, and the bow. Two approaches are proposed for preprocessing and feature extraction. The first applies a segmentation process to the input video using the Mask R-CNN method. The second uses body landmarks for preprocessing and feature extraction on the body segment, while the erhu-pole and bow segments use the Hough Lines algorithm. The three main parts are then divided into several sections according to the classes that have been defined. For the classification process, this study proposes two deep learning algorithms, and all deep learning methods are combined with traditional image-processing algorithms. The combined pipeline produces an error message as output for every movement displayed by the erhu player.
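The GCN's neighbor-merging step described above follows the standard symmetric-normalized propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The sketch below illustrates one such layer on a toy three-node skeleton chain; the graph, features, and weights are the editor's illustrative assumptions, not the thesis's trained model.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: each node's new feature is a
    degree-normalized merge of its own and its neighbors' features,
    followed by a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Toy 3-node chain, e.g. shoulder - elbow - wrist body key points
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],                       # 2-D feature per key point
              [0.0, 1.0],
              [1.0, 1.0]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                     # learnable weight matrix
H_next = gcn_layer(A, H, W)                     # shape (3, 4)
```

Stacking a few such layers lets information propagate along the skeleton, which is how the body key points can support the YOLOv3-based classification.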
Keywords (Chinese): 動作識別 (Action Recognition), CNN, 3D-CNN, LSTM, YOLOv3, GCN
Keywords (English): Action Recognition, CNN, 3D-CNN, LSTM, YOLOv3, GCN
TABLE OF CONTENTS
摘要 (Abstract in Chinese)
ABSTRACT
ACKNOWLEDGEMENT
Table of Contents
List of Figures
List of Tables
INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Chapter Outline
1.4 Research Limitations
LITERATURE REVIEW
2.1 Previous Work
2.2 Mask R-CNN Segmentation
2.3 Traditional Algorithm
2.4 Convolutional Neural Network
2.5 The Hough Transform
2.6 Three-Dimensional Convolutional Neural Network (3D-CNN)
2.7 Long Short-Term Memory (LSTM)
2.8 YOLOv3
2.9 Graph Convolutional Network (GCN)
2.10 Categorical Cross-Entropy Loss
RESEARCH METHOD
3.1 Research Architecture
3.2 Dataset
3.3 Segmentation Process
3.4 Data Preprocessing
3.5 Defining Classification
3.6 Experiment Design
3.7 Evaluation Metrics
3.8 Experiment Settings
RESULT AND DISCUSSION
4.1 Experiment Results
4.2 Experiment Evaluation
4.3 Discussion
CONCLUSION AND SUGGESTION
5.1 Conclusion
5.2 Suggestions
REFERENCES
Advisors: Timothy K. Shih (施國琛), Aina Musdholifah, Anny Kartika Sari    Date of Approval: 2022-07-25
