Locating Satisfaction in Vocal Dialogue


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/91926

Title:	Locating Satisfaction in Vocal Dialogue (於對話中定位特定發音之研究 – 以滿意為例)
Authors:	蘇筱凌;Su, Hsiao-Ling
Contributors:	Department of Business Administration
Keywords:	Keyword search; Customer satisfaction; Mel-frequency cepstral coefficients; Cross-attention mechanism; Speech recognition
Date:	2023-07-13
Issue Date:	2023-10-04 14:49:59 (UTC+8)
Publisher:	National Central University
Abstract:	For a business, understanding how satisfied customers are with its products or services is crucial to raising repurchase rates and willingness to recommend. An efficient speech recognition method that can accurately analyze customer service recordings is therefore urgently needed. However, locating the points at which a customer expresses satisfaction within a long speech signal is a challenging task. This research combines keyword search with a cross-attention mechanism to locate such target utterances. It uses a dataset of specific utterances spoken by different speakers together with a dataset of industry telephone interview recordings. By analyzing and cross-matching these recordings, the goal is to find the positions in a long speech signal where positive or negative satisfaction is expressed. The data first undergo preprocessing and sound feature extraction and are then fed into a cross-attention model, which computes attention scores between the two sets of feature vectors and locates the satisfaction utterance at the position with the highest attention score. The experimental results show that the number of filter banks and the shift stride are important factors affecting the hit ratio. Across the parameter settings tested, the best configuration was 30 filter banks with a shift stride of 10, which achieved an HR@5 of 95.08%, an HR@3 of 84.15%, and an HR@1 of 60.11%.
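The scoring and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes plain scaled dot-product attention with no learned projections, assumes that candidate positions are fixed-width windows taken every `stride` frames, and assumes that HR@k is the fraction of test cases whose true position appears among the k highest-scoring candidates. All function names are hypothetical.

```python
import math

def cross_attention_weights(query, keys):
    """Average softmax attention that the query frames place on each key frame.

    Assumption: plain scaled dot-product attention, no learned projections.
    """
    dim = len(query[0])
    avg = [0.0] * len(keys)
    for q in query:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        norm = sum(exps)
        for i, e in enumerate(exps):
            avg[i] += e / norm / len(query)
    return avg

def top_k_windows(query, signal, stride, k):
    """Return start frames of the k windows holding the most attention mass."""
    weights = cross_attention_weights(query, signal)
    width = len(query)
    starts = range(0, len(signal) - width + 1, stride)
    scored = [(sum(weights[s:s + width]), s) for s in starts]
    # Highest score first; ties are broken by the earlier start frame.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored[:k]]

def hit_ratio_at_k(ranked_lists, true_starts, k):
    """Fraction of cases whose true start is among the top-k ranked starts."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_starts) if truth in ranked[:k]
    )
    return hits / len(true_starts)

# Toy example: a two-frame keyword embedded at frames 4-5 of a silent signal.
query = [[3.0, 0.0], [0.0, 3.0]]
signal = [[0.0, 0.0]] * 4 + query + [[0.0, 0.0]] * 4
print(top_k_windows(query, signal, stride=2, k=1))
```

In this toy setup the feature frames are hand-written two-dimensional vectors. In practice they would come from a feature extractor such as the Mel-frequency cepstral coefficients named in the keywords, where the number of filter banks sets the feature resolution and the stride sets how densely candidate positions are sampled.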
Appears in Collections:	[Graduate Institute of Business Administration] Electronic Thesis & Dissertation

