
    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/63241

    Title: 易延伸的語言模型之設計及其在數位語言學習之應用;A New Breed of Machine Tractable Language Model for Digital Language Learning
    Authors: 衛友賢;陳孟彰
    Contributors: Graduate Institute of Learning and Instruction, National Central University
    Keywords: Language; Science education
    Date: 2013-12-01
    Issue Date: 2014-03-17 14:24:29 (UTC+8)
    Publisher: National Science Council, Executive Yuan
    Abstract: Research period: August 2013 to July 2014 (ROC 10208~10307). The proposed project addresses one of the central limitations currently constraining digital language learning: a lack of sophisticated digital knowledge resources to support language learning. We focus on a domain of language knowledge that we have targeted for the past seven years, multiword expressions (MWEs), and propose to construct a corpus-derived lexico-grammatical knowledgebase of such expressions for English, called StringNet. We target MWEs because of the persistent and widely acknowledged challenges they pose both for computational linguistics (Sag et al. 2002; Baldwin et al. 2007; Zhang et al. 2006; inter alia) and for second language education (Wray 2002; Pawley and Syder 1984; Lewis 2002; Nattinger and DeCarrico 1992; inter alia). An important characteristic of our research and of the design of StringNet is that they address these two aspects of MWEs (computational and educational) within one coherent framework, as two sides of the same coin. Current computational language models are based on n-grams, that is, sequences of n consecutive words (pairs, triples, 4-grams, and so on). N-grams are flat: they represent word combinations along the syntagmatic dimension only. For StringNet we have created the novel notion of the hybrid n-gram, which introduces the paradigmatic dimension to the language model by allowing part-of-speech categories to occur within n-grams alongside words. Thus StringNet contains not only ‘consider himself lucky’ but also the more general ‘consider [pnx] lucky’, in which the reflexive pronoun category [pnx] shows the substitutability of the middle slot. StringNet then cross-indexes all the hybrid n-grams that it extracts from the British National Corpus (BNC), so that ‘consider [pnx] lucky’ is indexed to ‘consider himself lucky’ and to ‘consider herself lucky’, indicating the superordinate/subordinate relation holding between them.
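The hybrid n-gram idea and its cross-indexing can be illustrated with a minimal sketch. This is not StringNet's actual extraction code: the function names `hybrid_ngrams` and `build_index`, the toy two-item corpus, and the simplified tags `verb` and `adj` are assumptions made for illustration (only `[pnx]` comes from the abstract). The sketch enumerates every word-or-category variant of a tagged trigram and links each pattern to the fully lexical n-grams that instantiate it; it does not model intermediate links between two partly abstract patterns, nor any frequency-based filtering.

```python
from collections import defaultdict
from itertools import product


def hybrid_ngrams(tagged):
    """Return every hybrid n-gram for one POS-tagged n-gram.

    `tagged` is a list of (word, tag) pairs. Each slot is realized either
    as the word itself or as its POS category in square brackets.
    """
    options = [(word, f"[{tag}]") for word, tag in tagged]
    return [" ".join(choice) for choice in product(*options)]


def build_index(tagged_ngrams):
    """Map each abstract pattern to the lexical n-grams that instantiate it."""
    index = defaultdict(set)
    for tagged in tagged_ngrams:
        surface = " ".join(word for word, _ in tagged)
        for pattern in hybrid_ngrams(tagged):
            if pattern != surface:  # skip the fully lexical form itself
                index[pattern].add(surface)
    return index


# Toy corpus (hypothetical): two tagged trigrams from the abstract's example.
corpus = [
    [("consider", "verb"), ("himself", "pnx"), ("lucky", "adj")],
    [("consider", "verb"), ("herself", "pnx"), ("lucky", "adj")],
]
index = build_index(corpus)
print(sorted(index["consider [pnx] lucky"]))
# ['consider herself lucky', 'consider himself lucky']
```

A trigram yields eight variants (two choices per slot), so a realistic system would need some way to prune uninformative patterns; how StringNet does this is not stated in the abstract.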
Unlike other language models, then, StringNet is not a list but a cross-indexed web of lexical patterns ranging from specific to abstract (from ‘it’s the thought that counts’ to ‘it’s the [noun] that [verb]’). StringNet Navigator will provide a web interface allowing users not only to submit query words but also to navigate through the relations among the patterns returned as search results. The test-of-concept version of StringNet has been successfully created and its results published, and it has been acknowledged for ‘advancing the field’ in this difficult area. The online beta version of this test-of-concept has received queries from 30 to 40 countries every month in the year since it was made available. The project PI (Wible) has been invited to contribute an article to the prestigious SSCI journal Annual Review of Applied Linguistics on the theme of this project: MWEs and digital language learning. The present three-year project proposes to create a mature version of StringNet based on the successful test-of-concept beta version and to develop and implement a range of applications to support second language education. In the first year, the mature version of StringNet will be extracted from the full 100,000,000-word British National Corpus (BNC), rather than the 6,000,000-word sample used for the test-of-concept version. In subsequent years, corpus resources for StringNet will be expanded beyond the BNC to Google Books, Wikipedia, and other clean corpora.
With respect to applications, the project will (1) develop APIs that make StringNet accessible from any website by means of plug-ins; (2) produce StringNet Builder, a web service that can generate a StringNet knowledgebase for any corpus that a user submits; (3) extract domain-specific lexical patterns from e-textbooks for particular fields and apply the results to digitally supported English for Academic Purposes; (4) create Query Doctor, a tool that uses edit distance techniques and the knowledge structures of StringNet to detect and correct errors in multiword queries submitted to Google or other search engines, thus addressing the common and dangerous practice of using Google as an error checker; (5) develop word similarity measures that distinguish confusable words for learners and teachers; and (6) create exercise wizards that generate discovery exercises and cloze exams from StringNet search results. The breakthroughs represented by StringNet’s novel knowledge structures will provide fertile territory for cutting-edge research for the coming decade and beyond.
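The edit-distance idea behind Query Doctor (application 4 above) might look roughly like the following sketch. The abstract does not say how Query Doctor is implemented, so everything here is an assumption: the function names `token_edit_distance` and `suggest`, the word-level (rather than character-level) Levenshtein distance, the `max_distance` threshold, and the small `ATTESTED` list standing in for expressions retrieved from StringNet.

```python
def token_edit_distance(a, b):
    """Levenshtein distance over word tokens (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def suggest(query, attested, max_distance=1):
    """Return attested expressions within max_distance word edits of the query.

    An empty list means either that the query is itself attested or that
    nothing close enough was found.
    """
    tokens = query.lower().split()
    scored = sorted((token_edit_distance(tokens, cand.split()), cand)
                    for cand in attested)
    if scored and scored[0][0] == 0:
        return []  # query already matches an attested expression
    return [cand for dist, cand in scored if dist <= max_distance]


# Hypothetical stand-in for expressions retrieved from the knowledgebase.
ATTESTED = [
    "consider himself lucky",
    "it's the thought that counts",
    "take into account",
]
print(suggest("consider himself as lucky", ATTESTED))
# ['consider himself lucky']
```

Working at the word level keeps the suggestion focused on a misplaced, missing, or extra word in a multiword query; a fuller version would presumably also match queries against abstract patterns such as ‘consider [pnx] lucky’.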
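    Relation: Science and Technology Policy Research and Information Center, National Applied Research Laboratories
    Appears in Collections: [Graduate Institute of Learning and Instruction] Research Projects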


    All items in NCUIR are protected by copyright, with all rights reserved.
