A Hybrid Classification Method for Action Recognition and Correction – Learning Erhu as An Example

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/89793

Title:	A Hybrid Classification Method for Action Recognition and Correction – Learning Erhu as An Example (original title: 以二胡學習為例之動作識別與糾正的混合分類方法)
Author:	Permana, Aditya (佩馬納)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Action Recognition, CNN, 3D-CNN, LSTM, YOLOv3, GCN
Date:	2022-07-25
Date uploaded:	2022-10-04 11:59:54 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	Action recognition is one application of deep learning and is now used across a widening range of fields, including information technology, sports, and the arts. The erhu is a stringed instrument originating from China. Playing it involves rules for positioning the player's body and holding the instrument correctly, so a system is needed that detects each erhu player's movements and checks them against these rules. This study therefore discusses action recognition on video using three methods: 3D-CNN, YOLOv3, and GCN. 3D-CNN builds on the CNN, a method commonly used for image processing, and has been shown to be effective at capturing motion information from consecutive video frames. To better capture the information contained in each movement, an LSTM layer is added to the 3D-CNN model. LSTM is an advanced form of RNN, a sequential network, and can handle the vanishing gradient problem that RNNs face. The other image-processing method used in this study is YOLOv3, an object detector with relatively good accuracy that can detect objects in real time.
To maximize the performance of YOLOv3, this study combines it with GCN so that body key points can make classification easier. GCN performs spatial convolution by aggregating features from each node's local neighbors on the graph. This research uses RGB video as its dataset. Preprocessing and feature extraction cover three main parts: the body, the erhu pole, and the bow. Two approaches to preprocessing and feature extraction are proposed. The first segments the input video using Mask R-CNN. The second uses body landmarks for preprocessing and feature extraction on the body segment, while the erhu and bow segments use the Hough Lines algorithm. The three main parts are then divided into several sections according to the defined classes. For classification, this study proposes two deep learning algorithms and combines all of the deep learning methods with traditional image-processing algorithms. The combined pipeline outputs an error message for each movement performed by the erhu player.
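The neighbor aggregation that the abstract attributes to GCN can be illustrated with a minimal sketch. It assumes the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not say which GCN variant the thesis uses, so this may differ from the author's model. The function name `gcn_layer`, the three-joint skeleton, the keypoint values, and the identity weights are all hypothetical.

```python
import math

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so every node keeps a share of its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees of the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization keeps feature scale comparable across nodes.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f_in, f_out = len(features[0]), len(weights[0])
    # Aggregate each node's features with those of its local neighbors.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n))
            for c in range(f_in)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f_in)))
             for o in range(f_out)] for i in range(n)]

# Hypothetical skeleton: shoulder - elbow - wrist, connected as a chain.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
# Hypothetical normalized (x, y) keypoint coordinates, one row per joint.
keypoints = [[0.2, 0.3],
             [0.4, 0.5],
             [0.6, 0.4]]
# Identity weights (an assumption) so the output shows pure aggregation.
identity = [[1.0, 0.0],
            [0.0, 1.0]]

smoothed = gcn_layer(adjacency, keypoints, identity)
```

Each row of `smoothed` mixes a joint's own coordinates with those of its directly connected joints. In a trained model the weight matrix would be learned rather than fixed to the identity.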
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	22

All items in NCUIR are protected by original copyright.
