Chord Transformation and Performance Analysis for Compressed Audio

Name

Tai-Ming Chang (張戴明)

Department

Department of Communication Engineering

Thesis Title

Chord Transformation and Performance Analysis for Compressed Audio
(音檔壓縮資訊之和弦特徵轉換效能分析)

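This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid breaking the law.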

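Abstract (Chinese, translated)

With the explosive growth in music album production, managing massive amounts of music data and retrieving music information quickly have become important issues. For a massive music database, extracting the key frequency parameters directly from compressed music files to represent musical features greatly improves retrieval speed. In this thesis, we analyze, for Advanced Audio Coding (AAC) files, the differences in frequency representation between the discrete Fourier transform (DFT) and the modified discrete cosine transform (MDCT), take the resulting frequency resolution into account to select an appropriate frequency range, and propose a chroma feature transformation method in the AAC compressed domain. When AAC compressed information is used directly for chroma transformation, the long/short window switching mechanism gives different windows different frequency resolutions, and ignoring this switching severely degrades the accuracy of the feature mapping. Performing chroma feature transformation on AAC files that use long/short window switching is therefore a challenge. For short windows, which have poorer frequency resolution, we propose a peak-competition method that merges eight consecutive short windows to strengthen the tonal information. For frame segmentation, we propose a simple dynamic segmentation method to replace computationally complex beat tracking. Furthermore, to handle AAC files with different sampling rates, we propose a dynamic frequency selection mechanism that automatically selects the frequency range for each sampling rate and window type. Experimental results on the Covers80 dataset show that the proposed method improves Top-1 music search accuracy by 10% over previous compressed-domain work, and that its retrieval performance is close to that of current waveform-domain techniques. In addition, the proposed dynamic frequency selection method yields stable and reasonably accurate retrieval for AAC files at different sampling rates.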
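The abstract's point that long and short windows have different frequency resolutions, and that the usable frequency range depends on the sampling rate, can be made concrete with a little arithmetic. The sketch below is not the thesis's dynamic frequency selection mechanism, whose details are not given on this page. It assumes the standard AAC window sizes (1024 MDCT coefficients for a long window, 128 for a short one) and uses a hypothetical criterion: a frequency is treated as usable once two notes a semitone apart are at least one MDCT bin apart. The function names are illustrative only.

```python
import math

# Standard AAC window sizes, counted in MDCT coefficients per window.
LONG_BINS = 1024
SHORT_BINS = 128
SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)


def bin_spacing_hz(sample_rate: float, num_bins: int) -> float:
    """Spacing between adjacent MDCT bins; the bins cover 0..sample_rate/2."""
    return sample_rate / (2.0 * num_bins)


def min_resolvable_hz(sample_rate: float, num_bins: int) -> float:
    """Hypothetical criterion (an assumption, not taken from the thesis):
    the lowest frequency at which two notes one semitone apart are
    separated by at least one bin."""
    return bin_spacing_hz(sample_rate, num_bins) / (SEMITONE_RATIO - 1.0)


if __name__ == "__main__":
    for rate in (22050, 32000, 44100, 48000):
        long_lo = min_resolvable_hz(rate, LONG_BINS)
        short_lo = min_resolvable_hz(rate, SHORT_BINS)
        print(f"{rate} Hz: long window >= {long_lo:.0f} Hz, "
              f"short window >= {short_lo:.0f} Hz")
```

Under this criterion the short window's lower bound is always eight times that of the long window at the same sampling rate, which is one way to see why short-window frames need separate treatment.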

Abstract (English)

With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed on a large database. In this study, we focus on Advanced Audio Coding (AAC) files: we analyze the disparity in frequency representation between the discrete Fourier transform (DFT) and the modified discrete cosine transform (MDCT), use the frequency resolution to select an appropriate frequency range, and develop a direct chroma feature-transformation method in the AAC transform domain. An added challenge of using AAC files directly is long/short window switching; ignoring it may result in inaccurate frequency mapping and inefficient information retrieval. For short windows in particular, we propose a peak-competition method that combines eight subframes and enhances the pitch information while excluding ambiguous frequency components. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. In addition, we propose a dynamic frequency selection method to handle AAC files with various sampling rates. Our experimental results on the Covers80 dataset show that the proposed method increases Top-1 search accuracy by approximately 10% over previously described transform-domain methods and performs nearly as well as state-of-the-art waveform-domain approaches. Furthermore, the proposed dynamic frequency selection method shows stable performance for a comprehensive AAC retrieval system.
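To illustrate what a chroma transformation in the transform domain involves, the following minimal sketch folds the energy of one window of MDCT coefficients into a 12-bin chroma vector. It is an illustration under stated assumptions, not the method developed in the thesis: it omits peak competition and subframe combination, the `f_min` and `f_max` defaults are hypothetical, and the function names are invented for this example. It relies only on the standard facts that MDCT bin `k` of an `N`-coefficient window is centered at `(k + 0.5) * fs / (2N)` and that MIDI note 69 is A4 at 440 Hz.

```python
import math


def mdct_bin_frequency(k: int, sample_rate: float, num_bins: int) -> float:
    """Center frequency of MDCT bin k for a window with num_bins coefficients."""
    return (k + 0.5) * sample_rate / (2.0 * num_bins)


def pitch_class(freq_hz: float) -> int:
    """Nearest equal-tempered pitch class (0 = C, ..., 9 = A, 11 = B)."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    return int(round(midi)) % 12


def chroma_from_mdct(coeffs, sample_rate, f_min=100.0, f_max=5000.0):
    """Fold squared MDCT coefficients into a normalized 12-bin chroma vector.

    f_min and f_max are hypothetical defaults chosen for this sketch.
    """
    num_bins = len(coeffs)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        f = mdct_bin_frequency(k, sample_rate, num_bins)
        if f_min <= f <= f_max:
            chroma[pitch_class(f)] += c * c
    total = sum(chroma)
    return [v / total for v in chroma] if total > 0 else chroma


if __name__ == "__main__":
    # One long window (1024 coefficients) with energy only in the bin near A4.
    frame = [0.0] * 1024
    frame[20] = 1.0
    print(chroma_from_mdct(frame, 44100))
```

The same function accepts a 128-coefficient short window, but each bin then spans a much wider band, so several pitch classes can collapse into one bin at low frequencies; that is the situation the abstract's short-window processing addresses.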

Keywords (Chinese, translated)

★ Advanced Audio Coding
★ compressed domain
★ discrete cosine transform
★ chroma feature
★ music retrieval system

Keywords (English)

★ AAC
★ transform domain
★ chroma feature
★ MDCT
★ music information retrieval

Table of Contents

Abstract
List of Figures
List of Tables
1. Introduction
1.1 Motivation of the Research
1.2 Contributions
1.3 Organization of the Dissertation
2. Content-based Music Information Retrieval
2.1 Overview
2.2 Feature Representation
2.3 Tonal and Temporal Alignment
2.4 Music Identification
3. Modern Audio Compression Techniques
3.1 Introduction
3.2 Psychoacoustic Model
3.3 MPEG-1 Layer III
3.4 Advanced Audio Coding
4. Enhanced Direct Chord Transformation in the AAC Transform Domain
4.1 Introduction
4.2 Analysis of AAC Characteristics
4.2.1 Overview of AAC Decoding
4.2.2 Frequency Resolution and Time Resolution
4.2.3 Impact of MDCT and DFT
4.3 Feature Extraction
4.3.1 Long Window Frame Processing
4.3.2 Short Window Frame Processing
4.3.2.1 Subframe Combination
4.3.2.2 Interpolation
4.4 Frequency of Notes: Analysis and Selection
4.5 Complexity Analysis
4.6 Evaluation Functions
4.7 Dynamic Segmentation
4.8 Similarity Measurement
4.8.1 Optimal Transposition Index and Similarity Matrix
4.8.2 Dynamic Programming Local Alignment
5. Experimental Evaluation
5.1 Comparison of Waveform-Domain Music Retrieval Methods
5.1.1 Music Datasets and Assessment Methodology
5.1.2 Frequency Range and Segmentation
5.1.3 Chroma Transformation Evaluation for SWF
5.1.4 Results
5.2 Diverse Sampling Rate of AAC File for MIR
5.2.1 Music Dataset
5.2.2 Adaptive Frequency Selection
5.2.3 Results
6. Conclusion
6.1 Summary
6.2 Future Extension
References

Advisor

Pao-Chi Chang (張寶基)

Approval Date

2014-7-22
