Abstract: Social media platforms have long hosted large numbers of automated accounts whose content-generation and interaction strategies continue to evolve. With the popularization of Generative Artificial Intelligence (GAI), social media bots behave in increasingly human-like ways. As a result, detection approaches that rely solely on traditional feature engineering or a single data source increasingly struggle with generalization and interpretability, and ordinary users face a high barrier to judgment, insufficient information, and a lack of explanatory support when trying to identify suspicious accounts. To address these issues, this study proposes a Twitter (known as X) bot-assistance system that integrates bot account detection with cluster-based interpretation, using the large language model Llama 3 as its core. In building the detection model, this study uses multiple publicly available Twitter datasets to improve the model's applicability across different account types. Through data cleaning, field alignment, embedding, and dataset splitting, Llama 3 is trained and adapted for bot detection via parameter-efficient fine-tuning (PEFT).
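The multi-dataset preparation steps above (cleaning, field alignment, and splitting) could be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the dataset schemas, field names, and split ratios are hypothetical placeholders.

```python
import random

# Hypothetical raw records from two public bot datasets with differing schemas.
DATASET_A = [{"screen_name": "acct1", "followers": 10, "friends": 200, "label": "bot"}]
DATASET_B = [{"user": "acct2", "followers_count": 5000, "following_count": 30, "is_bot": 0}]

def align_record_a(r):
    """Map dataset A's schema onto a shared field layout."""
    return {"username": r["screen_name"],
            "followers": r["followers"],
            "followings": r["friends"],
            "label": 1 if r["label"] == "bot" else 0}

def align_record_b(r):
    """Map dataset B's schema onto the same shared field layout."""
    return {"username": r["user"],
            "followers": r["followers_count"],
            "followings": r["following_count"],
            "label": int(r["is_bot"])}

def clean(records):
    """Drop records with missing usernames or negative counts."""
    return [r for r in records
            if r["username"] and r["followers"] >= 0 and r["followings"] >= 0]

def split(records, train=0.8, val=0.1, seed=42):
    """Shuffle and partition records into train/validation/test sets."""
    rng = random.Random(seed)
    recs = records[:]
    rng.shuffle(recs)
    n_train = int(len(recs) * train)
    n_val = int(len(recs) * val)
    return recs[:n_train], recs[n_train:n_train + n_val], recs[n_train + n_val:]

aligned = clean([align_record_a(r) for r in DATASET_A] +
                [align_record_b(r) for r in DATASET_B])
train_set, val_set, test_set = split(aligned)
```

The aligned records would then be embedded and serialized into prompts for the PEFT stage; that step is model-specific and omitted here.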
Furthermore, to differentiate bot categories beyond binary classification, this study applies K-means clustering to accounts predicted as bots, resulting in ten clusters. Cluster differences are then analyzed using indicators such as the numbers of followers, followings, posts, and likes to characterize how the bot clusters differ in behavior and interaction structure, enhancing the interpretability of the model output and supporting user understanding. Finally, this study implements a prototype system, "X-Bot Detector," providing two main functions: "User Check" and "Bot Knowledge Q&A". Through an interactive interface, users can retrieve a target account's metadata and recent posts via the X API, which are then fed into the fine-tuned model for judgment; meanwhile, the Q&A responses provide contextual explanations that incorporate both cluster assignments and cluster characteristics.
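The cluster-based interpretation step can be sketched as below with scikit-learn's KMeans over the four interaction indicators named above. The synthetic data, the log scaling, and the random seeds are illustrative assumptions; only the number of clusters (ten) and the indicator set come from the study.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic feature matrix for accounts already predicted as bots:
# columns = followers, followings, posts, likes (values are made up).
X = rng.lognormal(mean=3.0, sigma=1.5, size=(500, 4))

# Log-scale the heavy-tailed count features before clustering.
X_log = np.log1p(X)

# Group predicted bots into 10 clusters, as in the study.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_log)

# Per-cluster means of each indicator support the difference analysis,
# e.g. a cluster with many posts but few followers suggests broadcast bots.
features = ["followers", "followings", "posts", "likes"]
for c in range(10):
    means = X[kmeans.labels_ == c].mean(axis=0)
    print(f"cluster {c}: " + ", ".join(f"{f}={m:.1f}" for f, m in zip(features, means)))
```

In the prototype, a checked account's cluster assignment and these per-cluster profiles would feed the Q&A explanations.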