Name 張保擏 (Hendy Sulistio)   Department Department of Information Management
(A Similarity-based Method to Retrieve Bilingual Documents from the Theses and Dissertation Database)
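Abstract (Chinese, translated) The volume of electronic documents has grown enormously, and network technology lets users share information and knowledge independently. Documents are also written in many different languages. This phenomenon leads us to develop a method that can retrieve documents precisely and overcome the language barrier.
In this research, we develop a similarity-based method for retrieving bilingual scientific documents from a theses and dissertations system. We compute the similarity of bilingual documents (Chinese and English). Building a retrieval system that can overcome the language barrier is a challenging task.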
Abstract (English) The quantity of electronic documents has grown tremendously, and internet technology enables users to share information and knowledge independently. The languages in which documents are written also vary. This phenomenon leads us to develop a methodology that can retrieve documents precisely and overcome the language barrier.
In this research, we develop a similarity-based methodology to retrieve bilingual scientific documents from a Theses and Dissertation System. We compute the similarity of bilingual documents (Chinese and English). Integrating the ability to overcome the language barrier into a retrieval system is a challenging task.
Every scientific document in our research is divided into four fields: Title, Keyword, Abstract, and Cited Reference. We use a different technique to compute the similarity of each field. The results show that our methodology is able to retrieve bilingual documents accurately.
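The field-wise comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the technique used for each field, so everything below is an assumption made for illustration: cosine similarity over term counts for Title and Abstract, Jaccard overlap for Keyword and Cited Reference, equal field weights, and the hypothetical names `cosine`, `jaccard`, and `document_similarity`. It also assumes both documents have already been translated into the same language.

```python
import math
from collections import Counter

# The four fields named in the abstract.
FIELDS = ("title", "keyword", "abstract", "reference")


def tokens(text):
    """Lowercase whitespace tokenizer (an assumption; real Chinese text needs segmentation)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between the term-count vectors of two strings."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two lists of items (keywords or cited references)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def document_similarity(doc_a, doc_b, weights=None):
    """Weighted sum of per-field similarities; equal weights are an assumption."""
    weights = weights or {f: 0.25 for f in FIELDS}
    scores = {
        "title": cosine(doc_a["title"], doc_b["title"]),
        "keyword": jaccard(doc_a["keyword"], doc_b["keyword"]),
        "abstract": cosine(doc_a["abstract"], doc_b["abstract"]),
        "reference": jaccard(doc_a["reference"], doc_b["reference"]),
    }
    return sum(weights[f] * scores[f] for f in FIELDS)
```

Here each document is a dictionary with string values for `title` and `abstract` and list values for `keyword` and `reference`. Keeping the fields separate lets each one use a measure suited to its content, which is the general idea the abstract describes; the specific measures and weights in the thesis may differ.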
Keywords (Chinese)   Keywords (English) ★ Bilingual
★ Similarity-based
★ Text Mining
Table of Contents Chapter 1 Introduction
1.1. Research Background
1.2. Research Motivation
1.3. Research Purpose
1.4. Research Flow
1.5. Thesis Structure
Chapter 2 Literature Review
2.1. Document Preprocessing
2.1.1 Document Preprocessing (Chinese Documents)
2.1.2 Document Preprocessing (English Documents)
2.2. Translation of Documents
2.2.1 Statistical Machine Translation
2.2.2 Bilingual Comparable Text Corpora
2.3. Document Matching
2.4. Related Technology and Method
2.4.1 Information Retrieval
Chapter 3 Methodology
3.1. Translation Process
3.2. Similarity Based Method
Chapter 4 Experiment
4.1. Experiment Environment and Data
4.2. Experiment Design
4.3. Experiment Result
Chapter 5 Conclusion
5.1. Discussion
5.2. Future Research
References
Advisor 陳彥良 (Yen-Liang Chen)   Approval date 2009-7-21
