Master's/Doctoral Thesis 105522078: Detailed Record




Name: 林廷亮 (Ting-Liang Lin)    Department: Computer Science and Information Engineering
Thesis Title: 基於多軸信號機器學習之虛擬打擊樂器設計
(The Design of a Virtual Percussion Instrument Based on Multi-Axis Signals Using Machine Learning)
Related Theses
★ A grouping mechanism based on social relationships on the edX online discussion board
★ A 3D-visualized Facebook interaction system built with Kinect
★ An assessment system for smart classrooms built with Kinect
★ An intelligent metropolitan route-planning mechanism for mobile device applications
★ Dynamic texture transfer based on analysis of key-momentum correlations
★ A seam-carving system that preserves straight-line structures in images
★ A community recommendation mechanism based on an open online community learning environment
★ System design of an interactive situated learning environment for English as a foreign language
★ An emotional color transfer mechanism based on skin-color preservation
★ A gesture recognition framework for virtual keyboards
★ Error analysis of the fractional-power grey generating prediction model and development of a computer toolbox
★ Real-time human skeleton motion construction using inertial sensors
★ Real-time 3D modeling based on multiple cameras
★ A grouping mechanism for genetic algorithms based on complementarity and social network analysis
★ A virtual musical instrument performance system with real-time hand tracking
★ A real-time virtual musical instrument performance system based on neural networks
  1. The electronic full text of this thesis is approved for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese) In this era, science and technology advance far faster than people imagine, and interaction between humans and machines is no longer limited to the keyboard, mouse, and screen. Virtual interfaces and audiovisual effects that we once thought existed only in movies no longer depend on the actors' imagination; instead, today's virtual reality technology can give users an immersive experience. Cameras and sensors such as virtual reality headsets and gloves bring users an entirely new experience. Technological progress is driven by people's needs, and in today's society entertainment has become an indispensable part of life.
Kinect is a depth camera developed by Microsoft that can track the human skeleton by capturing depth information. By recognizing the positions of human joints, users can interact with the machine through body movements, and we used Kinect's depth-sensing capability to realize the design of a virtual percussion instrument.
Due to hardware limitations, the Kinect classifier cannot identify occluded limbs or subtle, fast movements. For example, when playing a percussion instrument we often use fine finger movements to change the striking position, yet the camera cannot capture such subtle motion. We therefore address these problems by mounting a six-axis inertial sensor on the mallet to obtain acceleration and angular acceleration data.
The rapid development of machine learning in recent years has made it impossible to ignore, and it has achieved considerable success in many fields. In this thesis, we tackle the task of complex gesture recognition with machine learning methods.
We collect the three-axis acceleration and angular acceleration of hand movements through the virtual instrument we designed, and use these data as the basis for gesture recognition. While playing the virtual instrument, the user can input a gesture by writing in the air, and the recognition result triggers a command such as changing the pitch or tone. We believe that combining multi-axis signals with machine learning not only compensates for the camera's inherent shortcomings but also expands people's imagination of human-computer interaction.
Abstract (English) In this era, the progress of science and technology is much faster than people think, and the interaction between humans and machines is no longer limited to the keyboard, mouse, and screen.
Virtual interfaces and audiovisual effects that we once thought could only be seen in movies no longer depend on the imagination of actors.
Instead, today's virtual reality technology can provide users with immersive experiences.
Cameras and sensors such as virtual reality headsets and gloves bring users a whole new experience.

The advancement of science and technology comes from people's needs.
In today's society, entertainment has become an indispensable part of life.
Kinect is a depth camera developed by Microsoft that can track the body's skeleton by capturing depth information.
By recognizing the positions of human joints, users can interact with the machine through limb movements, and we have realized the design of a virtual percussion instrument using this depth information.
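
As a rough illustration of this positioning step, the sketch below maps the x-coordinate of a tracked hand joint onto one of several virtual bars. The bar count, bar width, and coordinate layout are illustrative assumptions of this sketch, not values taken from the thesis.

```python
# Hypothetical sketch of hand positioning: map the x-coordinate of a
# Kinect-tracked hand joint (meters, camera frame) to a virtual bar index.
# Bar count, bar width, and layout origin are illustrative assumptions.
from typing import Optional

NUM_BARS = 8           # number of virtual xylophone bars (assumed)
BAR_WIDTH = 0.10       # width of one bar in meters (assumed)
LAYOUT_LEFT_X = -0.40  # x-coordinate of the leftmost bar edge (assumed)

def hand_to_bar(hand_x: float) -> Optional[int]:
    """Return the index of the bar under the hand, or None if off the layout."""
    offset = hand_x - LAYOUT_LEFT_X
    if offset < 0 or offset >= NUM_BARS * BAR_WIDTH:
        return None
    return int(offset // BAR_WIDTH)

print(hand_to_bar(-0.05))  # a hand at x = -0.05 m falls on bar 3
```
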
Due to hardware limitations, the Kinect classifier cannot identify occluded limbs or subtle, fast movements.
For example, when playing a percussion instrument, we often use subtle finger movements, which the camera cannot capture, to change the tapping position.
Therefore, we solve these problems by installing a six-axis inertial sensor on the mallet to obtain acceleration and angular acceleration data.
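
Per the table of contents (Section 3.2), strikes are detected as trigger points with a support vector machine. The following is a minimal sketch of that idea: short windows of the mallet's acceleration magnitude are classified as strike or idle with scikit-learn's SVC. The window length, the magnitude feature, and the RBF kernel are assumptions for illustration, and the random arrays merely stand in for recorded training data.

```python
# Hedged sketch of trigger-point detection with an SVM: classify short
# windows of mallet accelerometer magnitude as strike vs. idle.
import numpy as np
from sklearn.svm import SVC

WINDOW = 16  # samples per window (assumed)

def magnitude_windows(acc_xyz: np.ndarray) -> np.ndarray:
    """Slice an (N, 3) accelerometer stream into (M, WINDOW) magnitude windows."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    n = len(mag) // WINDOW
    return mag[: n * WINDOW].reshape(n, WINDOW)

# Placeholder training data; real windows would come from recorded sessions,
# with labels 1 = strike and 0 = idle.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, WINDOW))
y_train = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf")  # RBF kernel is a common default, assumed here
clf.fit(X_train, y_train)

# At run time, each incoming window is classified; a positive window
# marks a candidate trigger point for sounding a note.
live = magnitude_windows(rng.normal(size=(64, 3)))
print(clf.predict(live))
```
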

The rapid development of machine learning in recent years has made it impossible to ignore, and it has achieved considerable success in many different fields.

In this thesis, we collect the triaxial accelerations and angular accelerations of hand movements through the virtual instruments we designed, and use these signals as the dataset for gesture recognition.
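
A minimal sketch of such a classifier, assuming a Keras LSTM over fixed-length six-axis windows as in Section 3.4, is shown below; the sequence length, layer sizes, and number of gesture classes are illustrative, not the thesis's reported configuration.

```python
# Minimal sketch of an LSTM gesture classifier over six-axis sequences.
# All hyperparameters are assumptions; random arrays stand in for data.
import numpy as np
from tensorflow import keras

TIMESTEPS = 100   # samples per gesture (assumed)
CHANNELS = 6      # 3-axis acceleration + 3-axis angular signal
NUM_GESTURES = 5  # number of command gestures (assumed)

model = keras.Sequential([
    keras.layers.Input(shape=(TIMESTEPS, CHANNELS)),
    keras.layers.LSTM(64),                                   # sequence encoder
    keras.layers.Dense(NUM_GESTURES, activation="softmax"),  # class scores
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for recorded air-written gestures.
X = np.random.randn(320, TIMESTEPS, CHANNELS).astype("float32")
y = np.random.randint(0, NUM_GESTURES, size=320)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]).argmax())  # predicted gesture class
```
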

While playing a virtual musical instrument, the user can input a gesture by writing in the air, and a command such as changing the pitch or tone is executed according to the recognition result.
In our opinion, the combination of multi-axis signals and machine learning not only compensates for the inherent limitations of the camera, but also expands people's imagination of human-computer interaction.
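
Per the table of contents (Sections 4.1.4 to 4.1.6), detected strikes are ultimately rendered as MIDI messages fed to a VST sound generator. A minimal sketch of that last step, assuming the `mido` library and a hypothetical bar-to-note mapping, might look like this:

```python
# Sketch of note triggering over MIDI: a detected strike on a bar becomes
# a note-on message sent to a software synthesizer such as a VST host.
# The mido library and the C-major mapping are this sketch's assumptions.
import mido

BAR_TO_NOTE = [60, 62, 64, 65, 67, 69, 71, 72]  # middle-C major scale (assumed)

def strike(port, bar: int, velocity: int = 96) -> None:
    """Send a note-on for the struck bar; velocity could follow strike force."""
    port.send(mido.Message("note_on", note=BAR_TO_NOTE[bar], velocity=velocity))
    # A real performance loop would delay the matching note-off until the
    # note's duration has elapsed; it is sent immediately here for brevity.
    port.send(mido.Message("note_off", note=BAR_TO_NOTE[bar], velocity=0))

with mido.open_output() as out:  # default system MIDI output port
    strike(out, bar=3)           # sound the fourth bar
```
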
Keywords ★ Machine learning
★ Multi-axis signals
★ Virtual instrument
★ Gesture recognition
★ Kinect
Table of Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 Thesis Organization
2 Related Work
2.1 Applications by cameras and sensors
2.2 Tracking and recognizing technologies
2.3 Applications by motion sensors
2.4 Recognition by Neural Network
2.5 Artificial Neural Network
2.5.1 Recurrent neural network and LSTM network
3 Proposed Method
3.1 Environment
3.2 Trigger point detection
3.2.1 Accelerometer Signals Processing
3.2.2 Support vector machine
3.3 Background removal and ROI detecting
3.4 Gesture recognition with LSTM
3.4.1 The proposed framework architecture
4 Application
4.1 Virtual Xylophone implementation
4.1.1 User Interface
4.1.2 Hand Positioning
4.1.3 Signal preprocessing
4.1.4 Note triggering
4.1.5 MIDI message
4.1.6 VST and Sound Generator
4.2 Multi-complexity motion gesture recognition
4.2.1 Data collection and Training
5 Experiment Results
5.1 Experiment environment
5.2 Trigger point detection using SVM
5.2.1 Classification performance of SVM
5.3 Recognition of motion gestures using LSTM models
5.4 Evaluation of instrument performance
6 Conclusion
6.1 Performance of virtual instrument
6.2 Future Work
6.2.1 Different kinds of features
6.2.2 Attempts at different Neural Network models
References
Advisor: 施國琛 (Timothy K. Shih)    Date of Approval: 2018-07-20