Doctoral/Master's Thesis 92542011: Full Metadata Record

DC Field: Value [Language]
DC.contributor: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
DC.creator: 吳毓傑 [zh_TW]
DC.creator: Yu-Chieh Wu [en_US]
dc.date.accessioned: 2007-11-16T07:39:07Z
dc.date.available: 2007-11-16T07:39:07Z
dc.date.issued: 2007
dc.identifier.uri: http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=92542011
dc.contributor.department: 資訊工程學系 (Department of Computer Science and Information Engineering) [zh_TW]
DC.description: 國立中央大學 [zh_TW]
DC.description: National Central University [en_US]
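dc.description.abstract: This thesis proposes an integrated framework for Chinese syntactic dependency analysis that also covers Chinese word segmentation and part-of-speech (POS) tagging. We first discuss word segmentation and POS tagging and examine how to cast them as a conventional sequence labeling (classification) task. To train a sequence labeler, we study several widely used, well-performing methods. In our experiments the best of them, CRF, outperforms the other labelers, but it is slow and its cost grows quadratically with the number of classes, which makes Chinese POS tagging impractical in real use. To overcome this, we propose a two-pass model that combines CRF and SVM, pairing the high accuracy of CRF with the speed and accuracy of SVM to offset CRF's efficiency problem. Experiments show that our method is clearly better than the other methods (including CRF, 96.2 vs. 95.9 in F-measure). On the widely recognized Chinese word segmentation corpus (SIGHAN-3), our method also achieves nearly the best result. With two-pass Chinese tagging, the words in a text and their POS tags can be segmented and labeled. We therefore use the words produced by this tagger for the next stage, dependency parsing. To further improve parsing results, we also integrate top-level and bottom-level syntactic relations. We also compare against the dependency parsing methods currently regarded as the best. The results show that our method is not only more accurate than the others but also requires far less training and testing time. In addition, we propose an approximate K-best search method to improve the overall parsing and word segmentation results. Its advantage is that the training modules need not be modified; at test time all candidate outputs are considered together to decide the final parse. [zh_TW]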
dc.description.abstract: This thesis proposes a unified Chinese dependency parsing framework that includes word segmentation and part-of-speech (POS) tagging. We first discuss Chinese word segmentation and POS tagging. We then recast Chinese POS tagging as a sequential chunk-labeling problem and treat it like conventional sequential chunk-labeling tasks. To train a sequential labeler, we investigate several classification algorithms. The best of them, CRF, is more accurate than the other approaches but also slower, which makes POS tagging intractable. To circumvent this, we propose a two-pass sequential chunk-labeling model that combines CRF with SVM. Experimental results show that the two-pass learner outperforms the single-pass methods (96.2 vs. 95.9 in F-measure). On the well-known SIGHAN benchmark corpus, our method also shows very competitive performance. With two-pass Chinese POS tagging, words can be automatically segmented and labeled with their part-of-speech tags. We therefore use the auto-segmented words for dependency parsing. To improve performance, our parser integrates both top-down and bottom-up syntactic information. We also compare it with current state-of-the-art dependency parsers. Experimental results show that our method is not only more accurate but also requires much less training and testing time than the other approaches. In addition, we design an approximate K-best reranking method to improve both the overall dependency parse and the word segmentation results. Its advantage is that the modules can be trained independently while the global parse is still taken into account through K-best selection. [en_US]
DC.subject: 詞性標註 [zh_TW]
DC.subject: 中文斷詞 [zh_TW]
DC.subject: 中文詞相依性剖析 [zh_TW]
DC.subject: 機器學習 [zh_TW]
DC.subject: part-of-speech tagging [en_US]
DC.subject: word segmentation [en_US]
DC.subject: Chinese dependency parsing [en_US]
DC.subject: machine learning [en_US]
DC.title: 結合頂層與底層句法資訊之中文詞相依性分析 [zh_TW]
dc.language.iso: zh-TW [zh-TW]
DC.title: Incorporating top-level and bottom-level information for Chinese word dependency analysis [en_US]
DC.type: 博碩士論文 [zh_TW]
DC.type: thesis [en_US]
DC.publisher: National Central University [en_US]
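
The abstract describes recasting Chinese word segmentation and POS tagging as a sequential chunk-labeling problem. A minimal Python sketch of one common way to do this follows, assuming a character-level B-/I- labeling scheme. The scheme, the function names, and the sample tags (PN, VV, NN) are illustrative assumptions and are not taken from the thesis.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, POS tag)


def words_to_char_labels(tagged_words: List[TaggedWord]) -> Tuple[List[str], List[str]]:
    """Flatten (word, POS) pairs into one label per character.

    The first character of a word gets "B-<POS>", the rest get "I-<POS>",
    so a single label sequence encodes both word boundaries and POS tags.
    """
    chars: List[str] = []
    labels: List[str] = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(("B-" if i == 0 else "I-") + pos)
    return chars, labels


def char_labels_to_words(chars: List[str], labels: List[str]) -> List[TaggedWord]:
    """Rebuild (word, POS) pairs from per-character labels."""
    words: List[TaggedWord] = []
    for ch, label in zip(chars, labels):
        prefix, pos = label.split("-", 1)
        # Start a new word on "B-", or when an "I-" label does not
        # continue the previous word's POS tag (a malformed sequence).
        if prefix == "B" or not words or words[-1][1] != pos:
            words.append((ch, pos))
        else:
            word, _ = words[-1]
            words[-1] = (word + ch, pos)
    return words


if __name__ == "__main__":
    sample = [("我們", "PN"), ("研究", "VV"), ("中文", "NN")]
    chars, labels = words_to_char_labels(sample)
    print(list(zip(chars, labels)))
    print(char_labels_to_words(chars, labels))
```

Under this scheme the label set is roughly twice the size of the POS tag set, so any sequence labeler trained on it has to handle a large number of classes.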
