NCU Institutional Repository — theses, past exams, journal articles, and research projects: Item 987654321/98178


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98178


    Title: A Study on Predicting Erhu Ornamentation Using Deep Learning and Original Recording Audio Data
    Authors: ZHANG, XIANG-ZHOU (張祥洲)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Pulse-code modulation (PCM); Erhu ornamentation; TimesFM
    Date: 2025-06-27
    Issue Date: 2025-10-17 12:27:30 (UTC+8)
    Publisher: National Central University
    Abstract: This study investigates the audio prediction capabilities of the deep learning model TimesFM in the context of ornamentation techniques in Erhu performance. Current research in music generation focuses primarily on melody generation, accompaniment synthesis, style transfer, music structure modeling, and vocoder technologies, while detailed modeling of performance-level ornamentations, such as glissando, vibrato, and appoggiatura, has received comparatively little attention. To address this gap, this study constructed a dataset covering five common types of Erhu ornamentation. The recorded audio was converted into PCM data, compressed into time series of suitable length via different sampling rates, and then segmented into monophonic units to form structured input-output pairs. The model was trained to predict and generate the ornamented audio expression corresponding to a specified base tone given as input. The TimesFM model was employed for both training and inference, using the median quantile (quantile = 0.5) as the primary output and Huber loss as the loss function. Experimental results indicate that the generated audio exhibits recognizable timbral coherence and stylistic features of the ornamentations. This study demonstrates the feasibility of modeling ornamentations of traditional instruments with time series built from raw PCM audio data, and extends the application potential of music generation technologies to capturing performance-level details.
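    The preprocessing pipeline described in the abstract (recorded audio → PCM samples → downsampled time series → fixed-length single-note windows, evaluated with a Huber loss) can be sketched as follows. This is a minimal illustration, not the thesis code: the decimation factor, the 1024-sample window length, and the synthetic sine-tone input are assumptions standing in for the actual recordings and hyperparameters.

    ```python
    import numpy as np

    def downsample(pcm: np.ndarray, factor: int) -> np.ndarray:
        """Reduce the sample rate by keeping every `factor`-th sample (plain decimation)."""
        return pcm[::factor]

    def segment_notes(series: np.ndarray, note_len: int) -> np.ndarray:
        """Split a time series into fixed-length single-note windows, dropping the tail."""
        n = len(series) // note_len
        return series[: n * note_len].reshape(n, note_len)

    def huber_loss(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> float:
        """Huber loss: quadratic for small errors, linear beyond `delta`."""
        err = pred - target
        small = np.abs(err) <= delta
        return float(np.mean(np.where(small, 0.5 * err**2,
                                      delta * (np.abs(err) - 0.5 * delta))))

    # Synthetic stand-in for one recorded Erhu tone: 1 s of a 440 Hz sine
    # at 44.1 kHz, quantized to the 16-bit PCM range.
    sr = 44100
    t = np.arange(sr) / sr
    pcm = (np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

    # Normalize, downsample to a shorter series, and cut into note windows
    # (input-output pairs for the forecaster would be drawn from such windows).
    series = downsample(pcm.astype(np.float32) / 32768.0, factor=4)
    notes = segment_notes(series, note_len=1024)  # hypothetical window length
    loss = huber_loss(notes[0], notes[1])
    print(notes.shape, loss >= 0.0)
    ```

    In the study itself, TimesFM consumes such windows as univariate time series and its median-quantile (0.5) forecast is taken as the predicted ornamented waveform; the sketch above only mirrors the data shaping and the loss, not the model call.
    
    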
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0 KB, HTML) — View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

