形聲字發音規則探勘; Pronunciation Rules Discovery for Picto-Phonetic Chinese Characters

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/48490

Title:	形聲字發音規則探勘; Pronunciation Rules Discovery for Picto-Phonetic Chinese Characters
Author:	林書彥; Shu-Yen Lin
Contributor:	Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Keywords:	形聲字; 關聯規則; picto-phonetic; association rule mining
Date:	2011-07-28
Upload time:	2012-01-05 14:56:12 (UTC+8)
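Abstract:	With the rise of the Chinese market, more than 40 million non-native speakers worldwide are learning Chinese. In Taiwan, social change has raised the number of foreign and mainland Chinese spouses from 230,000 in 2002 to 440,000 today; of these, more than 146,000 are foreign spouses and about 90,000 have obtained citizenship. These trends show that the demand for Chinese language learning, and its importance, are growing. However, the forms and pronunciations of Chinese characters are intricate, and beginners struggle to find a way into them, pronunciation above all. Learning Chinese as a second language is in fact considerably harder than learning English as a second language: character forms and tones are more complex than in alphabetic scripts, and learners must link form, sound, and meaning at the same time, which demands heavy memorization when no suitable association is available.

Chinese characters are formed in six ways, collectively called the Six Writings (六書): pictograms (象形), simple ideograms (指事), compound ideograms (會意), picto-phonetic compounds (形聲), derivative cognates (轉注), and phonetic loans (假借). Picto-phonetic characters are the largest group, accounting for at least 80 percent. In a picto-phonetic character the semantic component indicates meaning and the phonetic component indicates sound, so the pronunciation and meaning of an unfamiliar character can be inferred from its components. This is the rule of thumb known as 「有邊讀邊，沒邊念中間」 ("read the side component; if there is none, read the middle"). The difficulty in mining pronunciation rules for picto-phonetic characters is that the phonetic component only indicates a similar pronunciation, and the rules governing the variation have not been studied. For example, 泡, 抱, and 飽 are all pronounced similarly to 包, but how the pronunciation of 包 changes into the pronunciations of these three characters remains to be investigated.

Because picto-phonetic characters make up such a large share of Chinese characters, and the phonetic component plays a central role in them, the first stage of this thesis builds a picto-phonetic annotation system in which the phonetic components of 14,598 picto-phonetic characters are labeled by hand. Manual labeling is slow, so we propose three methods for determining the phonetic component automatically. Of these, the probability distribution comparison method reaches 98 percent accuracy, and we then use it to rank components by pronunciation strength so that the most important components can be learned first. The second stage identifies features of commonly used characters (such as stroke count and radical) and applies association rule mining (Apriori) to discover pronunciation rules for picto-phonetic characters. The rules are then filtered and consolidated from the perspective of a beginner, keeping only those that are easy to memorize. Our goal is a character-learning strategy centered on teaching phonetic components that improves the learning curve, so that teaching one character no longer means learning only one character: combined with the pronunciation association rules, a learner picks up several characters at once and benefits from the strengths of e-learning.

With the rise of the Chinese market, more than 40 million foreigners around the world are learning Chinese. In Taiwan, owing to changes in the social structure, the number of foreign and mainland Chinese spouses has grown from 230 thousand in 2002 to 440 thousand now. Among these people, about 140 thousand are foreign spouses, and 90 thousand have already acquired Taiwanese nationality. These figures show that the demand for learning Chinese is becoming more important. However, the forms and pronunciations of Chinese characters are complicated, so they are not easy for beginners to learn, pronunciation in particular. In fact, learning Chinese as a second language is harder than learning English as a second language. Learners of Chinese have to rely on Chinese phonetic symbols or another spelling system to read. Besides the sound of a character, they also have to study its form and meaning together. These constraints slow the progress of learning, which is why some experts proposed Latinizing Chinese at the beginning of the 20th century.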
Chinese characters fall into six categories, collectively called the Six Writings: pictograms, simple ideograms, compound ideograms, picto-phonetic (phono-semantic) compounds, derivative cognates, and rebus (phonetic loan) characters. A picto-phonetic character is composed of a phonetic element and a semantic element. Therefore, even if you have not seen a character before, you can make a reasonable guess at its pronunciation and meaning from its phonetic and semantic components. This is the so-called 「有邊讀邊，沒邊念中間」 rule. In fact, about 80 percent of Chinese characters are picto-phonetic. However, the main difficulty with this approach is that the phonetic component only indicates a similar pronunciation, and how the component shapes the pronunciation of the whole character has not yet been studied. For example, the three characters 泡, 抱, and 飽 are pronounced similarly to 包, but how the pronunciation of 包 turns into the pronunciations of these three characters remains to be investigated. The main objective of this thesis is to find the important phonetic components that help learners acquire character pronunciation, together with the pronunciation rules of picto-phonetic characters. The thesis is divided into two parts. The first part uses the proposed "probability distribution comparison" method to find the major phonetic components of Chinese characters and rank them by significance. With this ordering and the rules, beginners can learn step by step and acquire a large number of additional characters and their pronunciations in a short time, which makes learning character pronunciation more efficient. The second part identifies traits of commonly used Chinese characters (such as stroke count and radical) and uses association rule mining to discover pronunciation rules for picto-phonetic characters.
Moreover, we filter and organize the pronunciation rules from a beginner's point of view, retaining only those that are easy to memorize. Our goal is a learning strategy focused on teaching phonetic components, in order to improve the learning curve. When learning a new Chinese character together with the association rules, students learn not only that one character but also many related characters. In the long run, they can make the most of e-learning.
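The abstract names Apriori association rule mining over character features but gives no implementation details, so the following is only a minimal sketch of the general technique, not the thesis's actual pipeline. The feature names (`phonetic=`, `radical=`, `initial=`, `final=`, `tone=`), the support and confidence thresholds, and the ten hand-entered characters (the 包 examples from the abstract plus a few others added for contrast) are all assumptions made for illustration.

```python
from itertools import combinations

# Toy, hand-entered data (an assumption for illustration, not the thesis dataset):
# character -> (phonetic component, radical, Pinyin initial, Pinyin final, tone)
CHARS = {
    "泡": ("包", "氵", "p", "ao", 4),
    "抱": ("包", "扌", "b", "ao", 4),
    "飽": ("包", "飠", "b", "ao", 3),
    "跑": ("包", "足", "p", "ao", 3),
    "炮": ("包", "火", "p", "ao", 4),
    "袍": ("包", "衤", "p", "ao", 2),
    "清": ("青", "氵", "q", "ing", 1),
    "請": ("青", "言", "q", "ing", 3),
    "情": ("青", "忄", "q", "ing", 2),
    "晴": ("青", "日", "q", "ing", 2),
}
FIELDS = ("phonetic", "radical", "initial", "final", "tone")

# One transaction per character: a set of "feature=value" items.
TRANSACTIONS = [
    frozenset(f"{name}={value}" for name, value in zip(FIELDS, entry))
    for entry in CHARS.values()
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        # Count each candidate and keep the frequent ones at this level.
        level = {}
        for cand in current:
            support = sum(cand <= t for t in transactions) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def mine_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) with one consequent item."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


if __name__ == "__main__":
    found = mine_rules(apriori(TRANSACTIONS, min_support=0.3), min_confidence=0.9)
    for antecedent, consequent, support, confidence in sorted(
        found, key=lambda r: (-r[3], -r[2], sorted(r[0]), r[1])
    ):
        lhs = ", ".join(sorted(antecedent))
        print(f"{lhs} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

On this toy data, the rule from the 包 component to the final "ao" holds for every character, while the rule from 包 to the initial "p" holds for only four of the six and is dropped at the 0.9 confidence threshold. That mirrors the abstract's point that a phonetic component signals a similar rather than identical pronunciation.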
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	649

All items in NCUIR are protected by their original copyright.
