NCU Institutional Repository (中大機構典藏) — theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/93462

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93462


    Title: Speaker Diarization System Optimized by Dimensionality Reduction and Clustering Algorithm (基於降維計算與群聚演算法優化語者分離系統)
    Author: Zhuang, Han-Di (莊涵迪)
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Deep Learning; Speaker Diarization; Dimension Reduction; Clustering Algorithm
    Date: 2023-08-11
    Upload time: 2024-09-19 17:03:03 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, the rise of deep learning has brought a series of technological advances, particularly in speech, where it achieves results that traditional methods and manual effort cannot. With the growing demand for speech services, speaker diarization has received increasing attention. The main task of a speaker diarization system is to label "who" spoke "when" in a conversation; these labels help other deep-learning speech models, such as speech recognition, chatbots, and speech enhancement, separate the target speaker's voice from other speakers before carrying out their downstream tasks. The most common approach is to use a speaker feature extraction network, such as x-vectors or i-vectors, to extract embeddings for the speakers in a conversation, and then apply a clustering algorithm to the input speech segments to obtain the diarization result. However, common clustering algorithms such as K-means and spectral clustering (SC) run into similar bottlenecks: in more complex scenarios we would like to use higher-dimensional feature vectors to preserve complete speaker information, but clustering high-dimensional vectors can lead to poor classification results and long computation times. To overcome this bottleneck, we use ECAPA-TDNN as the speaker feature extractor and apply UMAP to reduce the dimensionality of the feature vectors. UMAP has performed well in many classification-related fields; it preserves the structure of the original high-dimensional data and lets users choose the target dimensionality, which has made it a popular dimensionality reduction algorithm in recent years. We then cluster the reduced embeddings with the Leiden and HDBSCAN algorithms; HDBSCAN combined with UMAP achieves excellent diarization results, reaching a diarization error rate (DER) of 2.61% on the AMI Meeting dataset. Our method improves the stability of clustering in high dimensions through dimensionality reduction and improves classification accuracy through HDBSCAN. The proposed approach is general and can be applied to speaker diarization tasks built on different speaker feature extraction networks and datasets.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses
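
    The abstract describes a pipeline of ECAPA-TDNN speaker embeddings, UMAP dimensionality reduction, and HDBSCAN clustering. A minimal Python sketch of that flow is shown below, assuming the speechbrain, umap-learn, and hdbscan packages; the hyperparameters, the diarize() helper, and the pretrained checkpoint name are illustrative assumptions rather than settings taken from the thesis.

        import numpy as np
        import torch
        import torchaudio
        import umap                 # pip install umap-learn
        import hdbscan              # pip install hdbscan
        from speechbrain.pretrained import EncoderClassifier   # pip install speechbrain

        # Pretrained ECAPA-TDNN speaker encoder (public SpeechBrain checkpoint,
        # expects 16 kHz mono audio).
        encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

        def diarize(segment_paths, n_components=8, min_cluster_size=4):
            """Label each pre-segmented speech snippet with a cluster (speaker) id.

            Voice activity detection / segmentation is assumed to have happened
            upstream; HDBSCAN returns -1 for segments it leaves unassigned.
            """
            # 1) High-dimensional ECAPA-TDNN embeddings (192-dim for this checkpoint).
            embeddings = []
            for path in segment_paths:
                waveform, _sr = torchaudio.load(path)
                waveform = waveform.mean(dim=0, keepdim=True)   # mix down to mono
                with torch.no_grad():
                    emb = encoder.encode_batch(waveform)        # shape (1, 1, 192)
                embeddings.append(emb.squeeze().cpu().numpy())
            X = np.stack(embeddings)

            # 2) UMAP reduction to a lower-dimensional space where clustering is
            #    cheaper and more stable (the abstract's motivation for this step).
            X_low = umap.UMAP(n_components=n_components, metric="cosine").fit_transform(X)

            # 3) Density-based clustering; the number of speakers need not be
            #    fixed in advance.
            labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(X_low)
            return labels

        # Example usage (hypothetical file names):
        # labels = diarize(["seg_000.wav", "seg_001.wav", "seg_002.wav"])

    A density-based clusterer such as HDBSCAN does not require the number of speakers in advance, which is one practical reason to pair it with UMAP-reduced embeddings in this kind of pipeline.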

    Files in this item:

    File         Description    Size    Format    Views
    index.html                  0 KB    HTML      15


    All items in NCUIR are protected by copyright, with all rights reserved.
