Abstract: As Large Language Models (LLMs) see wide use across application domains, the biases and unfairness these models may embed have drawn considerable attention. Existing studies focus primarily on assessing bias in English-centric LLMs, while bias evaluation for non-English languages remains relatively scarce. Taking Chinese as an example, Simplified and Traditional Chinese differ in usage environment, linguistic features, and cultural connotations, yet prior work concentrates heavily on Simplified Chinese, overlooking the distinct language and culture of regions where Traditional Chinese predominates. The primary goal of this study is to build a fine-grained Traditional Chinese bias benchmark for evaluating gender and ethnic bias in LLMs within the sociocultural context of Taiwan. Unlike existing work, we examine the types of stereotypes attached to different demographic groups, covering personality, institutional, and cultural dimensions, to provide a more comprehensive perspective on bias evaluation. We redefine the Bias Specification based on CHBias and adopt average perplexity as the statistical difference measure, so that biases in language models can be identified and assessed more precisely. We also improve on prior work, which computed perplexity directly over the collected stereotypical sentences: we instead compute perplexity with prompt templates added, evaluating bias in a form closer to real-world use and reducing the risk of overestimating model bias. In addition, this study annotates whether each sentence contains toxic language and examines how toxicity affects the measured degree of bias. Beyond evaluating bias in LLMs, the proposed method can also be applied to continually pretrained models, using their measured bias to probe the biases present in the training data. The contribution of this research is a social bias (gender and ethnicity) benchmark grounded in the Taiwanese cultural context, available to both academia and industry. Through the methods and findings of this study, we hope to provide a valuable reference for assessing social bias in Traditional Chinese LLMs and to push such models toward greater fairness, so that they can better serve diverse application scenarios.
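To make the prompt-template perplexity scoring concrete, the sketch below shows one way it could be computed with a Hugging Face causal LM. The model name, template text, and function names are illustrative assumptions rather than details taken from this study.

```python
# A minimal sketch of template-based perplexity (PPL) scoring, assuming a
# Hugging Face causal LM. The model name and template are placeholders,
# not the ones used in this study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical stand-in for the LLM under evaluation
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(sentence: str,
               template: str = "Someone said: {}") -> float:
    """PPL of `sentence` embedded in a prompt template."""
    text = template.format(sentence)
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels = input_ids yields the mean token cross-entropy.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```

A fuller version would likely mask the template tokens out of the loss so that only the stereotype sentence itself contributes to the perplexity.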
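Likewise, the group-level measure described above, the difference in mean perplexity between two demographic sentence sets, might be sketched as follows; the aggregation and sign convention are assumptions for illustration only.

```python
# A hedged sketch of the group-level bias measure: the gap between the mean
# PPL of two demographic sentence sets. Grouping and sign convention are
# illustrative assumptions, not the benchmark's definition.
def mean_ppl(sentences: list[str]) -> float:
    return sum(perplexity(s) for s in sentences) / len(sentences)

def bias_score(group_a: list[str], group_b: list[str]) -> float:
    # Positive: the model assigns lower PPL (treats as more "natural")
    # the stereotyped sentences about group_a than those about group_b.
    return mean_ppl(group_b) - mean_ppl(group_a)
```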