Virtual bass system with neural network based transient/stationary audio separation and multiband compressor

Name

Xuan-Yu Chou (周軒宇)

Department

Department of Electrical Engineering

Thesis Title

Virtual bass system with neural network based transient/stationary audio separation and multiband compressor
(具有基於神經網路的音訊瞬態/穩態分離和多頻段壓縮器之虛擬低音系統)

Full Text

Available for browsing in the system after 2026-3-1

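Abstract (translated from Chinese)

As mobile multimedia devices become smaller and thinner, the size of the loudspeakers they can accommodate is strictly limited. Playback sound quality, especially bass, is therefore easily sacrificed. In this thesis, we propose a neural-network-based virtual bass enhancement system to address this problem. In addition, the extra harmonics generated during virtual bass enhancement can cause arithmetic overflow and hence clipping distortion, so we add a multiband compressor at the end of the system to reduce the clipping distortion caused by virtual bass enhancement. Virtual bass enhancement methods fall into two main categories: the nonlinear device (NLD) and the phase vocoder (PV). An NLD produces virtual bass by generating harmonics directly in the time domain through a nonlinear element such as a multiplication loop, whereas a PV first transforms the signal to the frequency domain and uses spectral shifting to generate higher-order harmonics. Because of these design characteristics, the NLD is better suited to transient signals such as drums and percussion, while the PV is better suited to stationary signals such as vocals. We therefore first use a neural network to separate the input audio signal into transient and stationary components and apply virtual bass enhancement to each separately; combining these techniques, we propose a complete virtual bass enhancement system. Finally, subjective listening tests against other virtual bass systems verify that our system delivers stronger bass perception and lower distortion.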
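
The time-domain harmonic generation attributed to the NLD above can be illustrated with a small sketch. The polynomial y = x + x^2 + x^3 is an assumed stand-in, not the nonlinearity used in the thesis; the names `polynomial_nld` and `harmonic_amplitude`, the 50 Hz test tone, and the 8 kHz sample rate are likewise illustrative choices.

```python
import cmath
import math


def polynomial_nld(samples, coeffs=(0.0, 1.0, 1.0, 1.0)):
    """Apply y = sum(coeffs[k] * x**k) to every sample (hypothetical NLD)."""
    return [sum(c * s ** k for k, c in enumerate(coeffs)) for s in samples]


def harmonic_amplitude(signal, cycles):
    """Amplitude of the sinusoid that completes `cycles` whole periods
    over the length of `signal`, measured with a single-bin DFT."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * cycles * i / n)
              for i, s in enumerate(signal))
    return 2.0 * abs(acc) / n


# Assumed test setup: a 50 Hz unit sine sampled at 8 kHz for exactly 10 periods.
SAMPLE_RATE = 8000
F0 = 50
CYCLES = 10
N = CYCLES * SAMPLE_RATE // F0  # 1600 samples

x = [math.sin(2.0 * math.pi * F0 * i / SAMPLE_RATE) for i in range(N)]
y = polynomial_nld(x)

# Compare the harmonic content before and after the nonlinearity.
for h in (1, 2, 3, 4):
    before = harmonic_amplitude(x, h * CYCLES)
    after = harmonic_amplitude(y, h * CYCLES)
    print(f"harmonic {h} ({h * F0} Hz): in={before:.3f} out={after:.3f}")
```

For a unit sine input, the squared term adds a component at twice the fundamental and the cubed term adds one at three times the fundamental, while the input itself contains neither. Harmonics of this kind are what a virtual bass system relies on to suggest a low pitch that a small loudspeaker cannot reproduce directly.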

Abstract (English)

The reduced size and thickness of mobile multimedia devices strictly limit the size of the loudspeakers that can be fitted. As a result, playback sound quality, especially bass, is easily sacrificed. In this thesis, we propose a neural network-based virtual bass system to address this problem. The additional harmonics generated by virtual bass enhancement may cause arithmetic overflow and thus clipping distortion, so we add a multiband compressor at the end of the system to reduce this clipping. Virtual bass enhancement follows two main approaches: the non-linear device (NLD) and the phase vocoder (PV). An NLD achieves virtual bass by generating harmonics directly in the time domain through a non-linear device such as a multiplication loop. A PV first converts the signal to the frequency domain and uses spectrum shifting to generate higher harmonics. Owing to these design characteristics, NLDs are more suitable for transient signals such as drums and percussion, while PVs are more suitable for stationary signals such as vocals. Therefore, we first use a neural network to split the input audio signal into transient and stationary components and apply virtual bass enhancement to each separately. Together, these techniques form a complete virtual bass enhancement system. Finally, subjective listening tests against other virtual bass systems show that our system achieves higher bass perception and lower distortion.
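
The clipping problem and the compressor stage described above can be sketched for a single band. This is a minimal hard-knee static gain curve applied sample by sample, without the attack/release smoothing or crossover filters that a practical multiband compressor would include. The threshold, ratio, and function names are assumptions for illustration, not values or code from the thesis.

```python
import math


def compressor_gain_db(level_db, threshold_db=-6.0, ratio=4.0):
    """Static hard-knee gain in dB: 0 below the threshold; above it, the
    overshoot is reduced so the output rises 1 dB per `ratio` dB of input."""
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


def compress(samples, threshold_db=-6.0, ratio=4.0):
    """Apply the static gain curve to each sample of one band."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag == 0.0:
            out.append(0.0)  # log10(0) is undefined; silence stays silent
            continue
        level_db = 20.0 * math.log10(mag)
        gain = 10.0 ** (compressor_gain_db(level_db, threshold_db, ratio) / 20.0)
        out.append(s * gain)
    return out


# Hypothetical band signal: added harmonics have pushed two samples past
# full scale (|x| > 1), which would clip in fixed-point arithmetic.
band = [0.25, -0.5, 1.5, -1.2]
limited = compress(band)
print([round(v, 3) for v in limited])
```

With these assumed settings, samples below the threshold pass through unchanged, while the two samples that exceeded full scale are brought back under it. In a multiband design, one such gain computer would run on each frequency band after the crossover, so that heavy low-frequency content does not pull down the level of the other bands.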

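Keywords (translated from Chinese)

★ Virtual bass system
★ Audio transient/stationary separation
★ Deep neural network
★ Nonlinear device
★ Phase vocoder
★ Multiband compressor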

Keywords (English)

★ Virtual bass system
★ Stationary-transient source separation
★ Deep neural networks
★ Nonlinear device
★ Phase vocoder
★ Multiband compressor

Table of Contents

Abstract (Chinese)
Abstract (English)
1. Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
2. Literature Review
2.1 Non-Linear Device (NLD)
2.2 Phase Vocoder (PV)
2.3 Hybrid Virtual Bass System
2.4 Multiband Compressor (MBC)
3. Hybrid Virtual Bass System Architecture Design
3.1 Neural-Network-Based Transient/Stationary Audio Separation
3.2 Non-Linear Device (NLD) Method for Enhancing Transient Signals
3.3 Phase Vocoder (PV) Method for Enhancing Stationary Signals
3.4 Multiband Compressor (MBC)
4. Experimental Results and Discussion
4.1 Subjective Listening Test
5. Conclusion
References

Advisor

Tsung-Han Tsai (蔡宗漢)

Approval Date

2023-4-21
