Modeling Linguistic Constructions: Capturing Contextual Correlates of Constructional Meaning (建構構式語言模型：在語境中呈現構式之形與義)

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/78835

Title:	Modeling Linguistic Constructions: Capturing Contextual Correlates of Constructional Meaning (建構構式語言模型：在語境中呈現構式之形與義)
Author:	衛友賢
Contributor:	Graduate Institute of Learning and Instruction, National Central University
Keywords:	Lexico-grammatical constructions; Construction grammar; English learning resource; Computational language model
Date:	2018-12-19
Upload time:	2018-12-20 13:56:09 (UTC+8)
Publisher:	Ministry of Science and Technology
Abstract:	Early constructionist approaches to language (Fillmore, Kay, and O’Connor 1988, inter alia) devoted copious attention to the inextricable mix of the idiomatic and the productive aspects of language use. As construction grammar has steadily grown in its reach and influence over recent decades, however, it has also proven resistant to computational modeling. Most resistant to such modeling has been the poorly covered lexico-grammatical territory that falls between frozen items found in dictionaries on the one hand and patterns of maximally productive rules of grammar on the other.
The resistance of this territory to computational modeling is especially unfortunate, since it is precisely this terrain that early constructionist research persistently highlighted as centrally important yet neglected by mainstream modular theories. The current proposal is motivated by StringNet, a theoretically grounded model that we have developed over the past eight years and that has produced uniquely promising results in capturing this lexico-grammatical territory. The current version of StringNet and our classroom field-testing with learners have revealed specific sources of potential for further breakthroughs, both in capturing more lexico-grammatical constructions in the model and in making these patterns and their meaning more readily discoverable by independent users. The purpose of the proposed research is to make fundamental refinements that fulfill this potential, extending the depth and breadth of the constructions the model captures and making the patterns and their meaning even more accessible and intelligible to users. At the heart of this new potential is the unique opportunity the model affords for detecting contextual correlates of meaning. These contextual features, however, are latent in the current model, distributed throughout billions of patterns and thus out of reach for discovery by current users. Our proposed approach consists in bootstrapping from the relations indexed in the current model into a more refined model with additional higher-order relations. The added higher-order relations will link words that share similar patterns and extract the structures containing those words as candidate constructions. Importantly, unlike black-box distributed language models such as vector space models (VSMs), we are careful to retain our model’s transparency and navigability, making the higher-order relations of the new model traceable and therefore intelligible to users. This makes our proposed model and our tools for accessing it uniquely suited to supporting learner-centered language exploration and discovery.
We have implemented and published a proof-of-concept test of the core idea behind the proposed refinements (Tsao and Wible 2013), with promising results and rich implications for the design of the proposed model. We propose to use the new StringNet design to construct an academic StringNet from a corpus of about 50 million words of international academic journal articles. The new general English StringNet and the academic StringNet will be cross-indexed to allow detailed comparisons and therefore deeper research into the distinctive aspects of academic English, beyond simple lists of academic words and formulas.
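Relation:	Science and Technology Policy Research and Information Center, National Applied Research Laboratories
Appears in collections:	[Graduate Institute of Learning and Instruction] Research Projects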

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	364

All items in NCUIR are protected by original copyright.
