Assessing Implicit Gender and Racial Biases Towards Professions in Large Language Models: An Empirical Investigation

NCU Institutional Repository > 資訊電機學院 > 資訊工程研究所 > 博碩士論文 > Item 987654321/98340

請使用永久網址來引用或連結此文件: https://ir.lib.ncu.edu.tw/handle/987654321/98340

題名:	Assessing Implicit Gender and Racial Biases Towards Professions in Large Language Models: An Empirical Investigation
作者:	陳泓嘉;Chen, Evan
貢獻者:	資訊工程學系
關鍵詞:	大型語言模型;性別偏見;人工智慧公平性;性別代表性;人工智慧對齊;LLM;Gender Bias;AI fairness;Gender Representation;Human-AI alignment
日期:	2025-07-28
上傳時間:	2025-10-17 12:39:07 (UTC+8)
出版者:	國立中央大學
摘要:	本研究提出了一套新穎的評估框架，旨在揭示大型語言模型（LLMs）中，針對不同職業所存在的內隱性別與種族偏見。此方法有別於以往採用結構化情境或特定提示詞（prompt）的作法，我們轉而利用「自由形式的故事創作」：我們提示大型語言模型為各種職業生成角色，接著分析文本中的性別線索（如：姓名、代名詞），並從姓名推斷其種族歸屬。我們對十個主流大型語言模型進行系統性分析後，發現普遍存在一個現象：在許多職業中，女性角色的比例被過度呈現，這很可能是受到「人類回饋強化學習」（RLHF）等「對齊」（alignment）工作的影響。與此同時，我們也發現模型產出的內容中，職業普遍與被認定為白人的姓名產生連結。此外，大型語言模型所生成的職業性別分佈，比起真實世界的勞動統計數據，更接近於人類社會的刻板印象。這些研究結果凸顯出，我們在實施平衡的偏見緩解措施時所面臨的挑戰與其重要性，以促進公平，並防止現有或新興的性別與種族社會偏見被再次強化或建立。 ;This study introduces a novel evaluation framework, distinct from prior methods that mostly use prompts in structured scenarios, to uncover implicit gender and racial biases in large language models (LLMs) regarding professions. Our approach leverages free-form storytelling: we prompt LLMs to generate characters for various occupations, then analyze gender cues (e.g., names, pronouns) and infer racial associations from these names. A systematic analysis of ten prominent LLMs reveals a consistent overrepresentation of female characters in most occupations, likely influenced by alignment efforts such as RLHF. Despite this, when ranking the occupations from most female-associated to most male-associated based on the models′ outputs, the resulting order aligns more closely with human stereotypes than with real-world labor statistics. In addition, we find a predominant association of professions with white-identifying names in the LLM outputs. These findings highlight the challenge and importance of implementing balanced mitigation measures to promote fairness and prevent the reinforcement of existing or the establishment of new societal biases related to gender and race.
顯示於類別:	[資訊工程研究所] 博碩士論文

文件中的檔案:

檔案	描述	大小	格式	瀏覽次數
index.html		0Kb	HTML	6	檢視/開啟

在NCUIR中所有的資料項目都受到原著作權保護.

社群 sharing

資料載入中.....