Violin Bowing Action Recognition based on Multiple Modalities by Deep Learning-Based Sensing Fusion (利用深度學習模型融合多重感測器之小提琴弓法動作辨識)

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/83813

Title:	Violin Bowing Action Recognition based on Multiple Modalities by Deep Learning-Based Sensing Fusion (利用深度學習模型融合多重感測器之小提琴弓法動作辨識)
Author:	Liu, Bao-Yun (劉寶云)
Contributor:	Department of Communication Engineering
Keywords:	Action recognition; Kinect; Depth camera; Inertial sensor; Deep learning; Multiple modalities; LSTM; CNN; Violin
Date:	2020-07-23
Upload time:	2020-09-02 17:09:30 (UTC+8)
Publisher:	National Central University
Abstract:	With the rise of artificial intelligence, human action recognition (HAR) based on deep learning has become an important research topic, and it is a popular research area in computer vision and pattern recognition. This thesis addresses the recognition of violin bowing actions. Multimedia art performances often require considerable manpower and time, with repeated tests and rehearsals, before the sound and light effects of the environment match the performers. If a machine can recognize the actions a performer makes during a performance, the system can then be used to trigger sound and light effects and to support other follow-up applications. We propose using multiple devices for action recognition: a Kinect camera and a Myo armband inertial measurement unit (IMU), which capture depth images and inertial data respectively. After preprocessing and data augmentation, the depth image data are fed to a 3D convolutional network and the inertial data to a long short-term memory (LSTM) network for training. Finally, decision fusion fuses the features learned by the different models and produces the final classification result. Data recorded by different devices have their own strengths and weaknesses, so an appropriate combination of devices can compensate for the shortcomings of a single device's data. Applied to our self-recorded Vap multi-device violin action database, the system achieves good recognition accuracy.
Appears in collections:	[Graduate Institute of Communication Engineering] Theses and Dissertations
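
The decision fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the thesis uses, so the weighted averaging of per-class probabilities below is an assumption, not the author's method. The function names, the three-class setup, the logit values, and the `depth_weight` parameter are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(depth_probs: Sequence[float],
                   imu_probs: Sequence[float],
                   depth_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors.

    Assumption: a simple late-fusion rule; the thesis may use a different one.
    """
    if len(depth_probs) != len(imu_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= depth_weight <= 1.0:
        raise ValueError("depth_weight must lie in [0, 1]")
    w = depth_weight
    return [w * d + (1.0 - w) * i for d, i in zip(depth_probs, imu_probs)]


def predict(fused_probs: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused_probs)), key=lambda k: fused_probs[k])


# Hypothetical outputs for one clip over three made-up bowing classes.
depth_logits = [2.0, 1.0, 0.1]   # stand-in for the 3D-CNN branch (depth images)
imu_logits = [0.5, 2.5, 0.3]     # stand-in for the LSTM branch (inertial data)

fused = fuse_decisions(softmax(depth_logits), softmax(imu_logits))
print(predict(fused), [round(p, 3) for p in fused])
```

In this sketch each branch is trained and evaluated on its own, and only the class probabilities are combined, so one modality can override the other when it is more confident. Setting `depth_weight` away from 0.5 would favor one device over the other.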
