NCU Institutional Repository (中大機構典藏): theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/63241


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/63241


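    Title: 易延伸的語言模型之設計及其在數位語言學習之應用 (Design of an Easily Extensible Language Model and Its Application to Digital Language Learning); A New Breed of Machine Tractable Language Model for Digital Language Learning
    Authors: 衛友賢; 陳孟彰
    Contributor: Graduate Institute of Learning and Instruction, National Central University (國立中央大學學習與教學研究所)
    Keywords: Language; Science education
    Date: 2013-12-01
    Upload time: 2014-03-17 14:24:29 (UTC+8)
    Publisher: National Science Council, Executive Yuan (行政院國家科學委員會)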
    Abstract: Research period: 10208~10307 (ROC calendar: August 2013 to July 2014). The proposed project addresses a central limitation of digital language learning: the lack of sophisticated digital knowledge resources to support it. We focus on a domain of language knowledge that we have targeted for the past seven years, multiword expressions (MWEs), and propose to construct a corpus-derived lexico-grammatical knowledgebase of such expressions for English, called StringNet. We target MWEs because of the persistent and widely acknowledged challenges they pose both for computational linguistics (Sag et al. 2002; Baldwin et al. 2007; Zhang et al. 2006; inter alia) and for second language education (Wray 2002; Pawley and Syder 1984; Lewis 2002; Nattinger and DeCarrico 1992; inter alia). An important characteristic of our research and of the design of StringNet is that they address these two aspects of MWEs (computational and educational) within one coherent framework, as two sides of the same coin. Current computational language models are based on n-grams, that is, sequences of n adjacent words (pairs, triples, 4-grams and so on). N-grams are flat: they represent word combinations along the syntagmatic dimension only. For StringNet we have created the novel notion of the hybrid n-gram, which introduces the paradigmatic dimension to the language model by allowing part-of-speech categories to occur within n-grams alongside words. Thus StringNet contains not only ‘consider himself lucky’ but also the more general ‘consider [pnx] lucky’, where the reflexive pronoun category [pnx] shows the substitutability of the middle slot. StringNet then cross-indexes all the hybrid n-grams that it extracts from the British National Corpus (BNC), so that ‘consider [pnx] lucky’ is indexed to ‘consider himself lucky’ and to ‘consider herself lucky’, indicating the superordinate/subordinate relation holding between them.
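The hybrid n-gram idea described above can be illustrated with a minimal Python sketch. This is not the StringNet implementation: the function names `hybrid_ngrams` and `build_index`, the toy two-line corpus, and the tags other than [pnx] are assumptions made for illustration only. Each slot of a tagged n-gram is realized either as its word or as its part-of-speech category, and every pattern containing at least one category slot is indexed to the fully lexical n-grams it subsumes.

```python
from collections import defaultdict
from itertools import product


def hybrid_ngrams(tagged):
    """Return every hybrid n-gram of one tagged n-gram.

    Each slot is realized either as the word itself or as its
    bracketed part-of-speech category, giving 2**n patterns.
    """
    options = [(word, f"[{tag}]") for word, tag in tagged]
    return [tuple(choice) for choice in product(*options)]


def build_index(tagged_ngrams):
    """Map each pattern with at least one POS slot to the lexical n-grams it covers."""
    index = defaultdict(set)
    for tagged in tagged_ngrams:
        lexical = tuple(word for word, _ in tagged)
        for pattern in hybrid_ngrams(tagged):
            if pattern != lexical:  # skip the fully lexical form itself
                index[pattern].add(lexical)
    return index


# Toy corpus; the tags "verb" and "adj" are illustrative placeholders.
corpus = [
    [("consider", "verb"), ("himself", "pnx"), ("lucky", "adj")],
    [("consider", "verb"), ("herself", "pnx"), ("lucky", "adj")],
]

index = build_index(corpus)
children = index[("consider", "[pnx]", "lucky")]
print(sorted(" ".join(c) for c in children))
# ['consider herself lucky', 'consider himself lucky']
```

In this sketch the index only links abstract patterns to fully lexical strings; links between patterns at intermediate levels of abstraction, frequency thresholds, and pruning are left out.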
Unlike other language models, then, StringNet is not a list but a cross-indexed web of lexical patterns ranging from specific to abstract (from ‘it’s the thought that counts’ to ‘it’s the [noun] that [verb]’). StringNet Navigator will provide a web interface allowing users not only to submit query words but also to navigate through the relations among the patterns returned as search results. A test-of-concept version of StringNet has been successfully created and its results published, and it has been acknowledged for ‘advancing the field’ in this difficult area. The online beta of this test-of-concept version has received queries from 30-40 countries every month in the year since it was made available. The project PI (Wible) has been invited to contribute an article to the SSCI journal Annual Review of Applied Linguistics on the theme of this project: MWEs and digital language learning. The present three-year project proposes to create a mature version of StringNet based on the successful test-of-concept beta and to develop and implement a range of applications to support second language education. In the first year, the mature version of StringNet will be extracted from the full 100,000,000-word British National Corpus (BNC), rather than the sampled 6,000,000-word version used for the test-of-concept version. In subsequent years, corpus resources for StringNet will be expanded beyond BNC to Google Books, Wikipedia and other clean corpora.
With respect to applications, the project will (1) develop APIs that will make StringNet accessible from any website by means of plug-ins; (2) produce StringNet Builder, a web service that can generate a StringNet knowledgebase for any corpus that a user submits; (3) extract domain-specific lexical patterns from e-textbooks for particular fields and apply the results to digitally supported English for Academic Purposes; (4) create Query Doctor, a tool that uses edit distance techniques and the knowledge structures of StringNet to detect and correct errors in multiword queries submitted to Google or other search engines, thus addressing the dangerous and common practice of using Google as an error checker; (5) develop word similarity measures that distinguish confusable words for learners and teachers; and (6) create exercise wizards that generate discovery exercises and cloze exams from StringNet search results. The breakthroughs represented by StringNet’s novel knowledge structures will provide fertile territory for cutting-edge research for the coming decade and beyond.
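The abstract does not say how Query Doctor combines edit distance with StringNet, so the following Python sketch is only one possible reading, under stated assumptions: the function names `edit_distance` and `suggest`, the `max_distance` threshold, and the small `ATTESTED` list (standing in for expressions retrieved from StringNet) are all hypothetical. It computes a word-level Levenshtein distance between a multiword query and each attested expression and proposes the closest one.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences.

    Insertion, deletion, and substitution each cost 1.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def suggest(query, attested, max_distance=1):
    """Return the closest attested expression, or None if none is near enough."""
    tokens = query.lower().split()
    best = min(
        ((edit_distance(tokens, expr.split()), expr) for expr in attested),
        default=None,
    )
    if best is None or best[0] > max_distance:
        return None
    return best[1]


# Hypothetical stand-in for expressions attested in the knowledgebase.
ATTESTED = [
    "consider himself lucky",
    "consider herself lucky",
    "it's the thought that counts",
]

print(suggest("consider himself as lucky", ATTESTED))
print(suggest("It's the thought that count", ATTESTED))
```

Working at the word level rather than the character level is a design choice of this sketch: it targets errors such as an inserted or wrongly inflected word inside a multiword expression, and a real tool would presumably also match queries against hybrid patterns rather than a flat list.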
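    Relation: Science and Technology Policy Research and Information Center, National Applied Research Laboratories (財團法人國家實驗研究院科技政策研究與資訊中心)
    Appears in collections: [Graduate Institute of Learning and Instruction] Research Projects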

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    466


    All items in NCUIR are protected by their original copyright.
