NCU Institutional Repository (National Central University) — theses, past exam papers, journal articles, and research projects: Item 987654321/98178


    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98178


    Title: 以深度學習預測二胡裝飾音之研究: 原始錄製資料的應用分析 (A Study on Predicting Erhu Ornamentation Using Deep Learning and Original Recording Audio Data)
    Author: 張祥洲 (ZHANG, XIANG-ZHOU)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Pulse-code modulation (PCM); Erhu ornamentation; TimesFM
    Date: 2025-06-27
    Uploaded: 2025-10-17 12:27:30 (UTC+8)
    Publisher: National Central University
    Abstract: This study investigates the audio prediction capabilities of the deep learning model TimesFM in the context of ornamentation techniques in Erhu performance. Current research in music generation primarily focuses on melody generation, accompaniment synthesis, style transfer, music structure modeling, and vocoder technologies, while detailed modeling of performance-level ornamentations—such as glissando, vibrato, and appoggiatura—has received comparatively less attention. To address this gap, this study constructed a dataset encompassing five common types of Erhu ornamentations. The recorded audio was converted into PCM data, then compressed into time series of suitable length via different sampling rates, and finally segmented into monophonic units to establish structured input-output pairs. The model was trained to predict and generate ornamented audio expressions given a specified base tone as input. The TimesFM model was employed for both training and inference, using the median quantile (quantile = 0.5) as the primary output and Huber loss as the loss function. Experimental results indicate that the generated audio exhibits recognizable timbral coherence and stylistic features of the ornamentations. This study demonstrates the feasibility of modeling ornamentations in traditional instruments using time series based on raw PCM audio data, and expands the application potential of music generation technologies in capturing performance-level details.
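The preprocessing pipeline described in the abstract (recorded audio → PCM samples → downsampled time series → fixed-length single-note segments) and the Huber loss used in training can be sketched as follows. This is a minimal illustrative sketch, not code from the thesis: all function names, the decimation-based downsampling, the segment lengths, and the synthetic 440 Hz base tone are assumptions for demonstration only.

```python
import numpy as np

def downsample_pcm(pcm: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Compress a PCM signal into a shorter time series by naive decimation
    (keep every src_rate // dst_rate-th sample; a real pipeline would
    low-pass filter first to avoid aliasing)."""
    step = src_rate // dst_rate
    return pcm[::step]

def split_notes(series: np.ndarray, note_len: int) -> list[np.ndarray]:
    """Cut a time series into fixed-length single-note segments,
    discarding any trailing remainder."""
    n = len(series) // note_len
    return [series[i * note_len:(i + 1) * note_len] for i in range(n)]

def huber_loss(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = pred - target
    quad = 0.5 * err ** 2
    lin = delta * (np.abs(err) - 0.5 * delta)
    return float(np.mean(np.where(np.abs(err) <= delta, quad, lin)))

# Toy example: one second of a synthetic 440 Hz "base tone" at 44.1 kHz,
# compressed to a 4410-sample series and cut into ten 0.1-second segments.
rate = 44100
t = np.arange(rate) / rate
pcm = np.sin(2 * np.pi * 440 * t)
series = downsample_pcm(pcm, rate, 4410)
notes = split_notes(series, 441)
print(len(notes), huber_loss(notes[0], notes[0]))
```

In the study's setup, each base-tone segment would serve as the model input and the corresponding ornamented recording as the target; TimesFM's median-quantile (0.5) forecast is then compared against the target under this loss.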
    Appears in collections: [Graduate Institute of Computer Science and Information Engineering] Master's and doctoral theses

    Files in this item:

    File: index.html | Size: 0 Kb | Format: HTML | Views: 7


    All items in NCUIR are protected by copyright.

