Abstract: Road traffic accidents occur frequently in Taiwan, and the current manual statement-taking process is cumbersome, time-consuming, and prone to omitting key information, imposing high labor and time costs on law enforcement officers and the parties involved. To address this problem, this study proposes Collision Care Guide (CCG), a traffic accident information collection agent based on large language models that systematically gathers accident-related information through guided dialogue and achieves accurate bidirectional conversion between structured accident records and natural-language narratives.
The CCG system consists of three modules: a Question Generation Module that dynamically adjusts its questions according to missing fields and user responses; an Information Extraction Module that extracts colloquial narratives into a structured JSON accident-record format; and an Accident Reconstruction Module that reorganizes the completed structured data into a highly readable accident narrative for user confirmation.
 
To verify the system's usability and robustness, this study establishes a three-tier evaluation framework: AI agent testing, human user testing, and accident reconstruction evaluation. In AI agent testing, the system scores above 4.0 (out of 5) on every dimension of dialogue quality and information extraction, and the score distributions of the LLM-based automated scoring and human scoring show a significant positive correlation (Spearman r = 0.474, p < 0.001), confirming the consistency of the evaluation. In human user testing, information extraction reaches F1 = 0.909, highly consistent with the AI agent results. The accident reconstruction evaluation shows that reconstructed narratives score above 4.7 (out of 5) on information completeness with a semantic similarity of 0.9, confirming the information fidelity and semantic consistency between structured accident records and natural-language accident narratives.
 
In addition, to reduce cost and improve the feasibility of private deployment and data privacy, this study further compares the performance of fine-tuned open-source Llama models against a commercial baseline model (GPT-4o-mini). On the test set, the Information Extraction Module achieves field-level exact accuracy above 0.89 with overall semantic similarity of about 0.995; the Question Generation Module reaches an average semantic similarity of about 0.85, and the jointly trained model also maintains highly stable performance when integrating both tasks. LLM-based automatic evaluation likewise shows that the fine-tuned models score ≥ 4 (out of 5) on both dialogue quality and information extraction, validating the effectiveness of task-specific fine-tuning.
 
In summary, CCG achieves accurate, structured information extraction and demonstrates that fine-tuning open-source models is a practically deployable approach, providing standardized support for traffic accident handling, insurance claims, and pre-litigation evidence collection.

Frequent road traffic accidents in Taiwan impose substantial procedural and cognitive burdens on law enforcement, insurers, and involved parties. Conventional manually driven reporting workflows are time-consuming, error-prone, and susceptible to omission of salient facts, thereby delaying downstream responsibility assessment and claims processing. To address these limitations, we propose Collision Care Guide (CCG), a Large Language Model (LLM)-based conversational agent that systematizes the bi-directional transformation between unstructured natural-language accident narratives and a structured Traffic Accident Record Format (TARF) representation.
 
 CCG comprises three coordinated modules: (1) a Question Generation Module that adaptively formulates targeted inquiries based on missing fields and prior user responses; (2) an Information Extraction Module that converts colloquial, potentially partial or disfluent utterances into a structured JSON record; and (3) an Accident Reconstruction Module that regenerates a coherent, human-readable narrative from the completed structured record for verification and downstream use.
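The missing-field-driven questioning loop described above can be sketched as follows. This is an illustrative simplification, not the thesis's implementation: the record fields and question templates here are hypothetical placeholders for the actual TARF schema, and a real deployment would generate questions with an LLM rather than fixed templates.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json

# Hypothetical, simplified stand-in for the TARF record; the real field
# set is defined by the thesis, not shown here.
@dataclass
class AccidentRecord:
    time: Optional[str] = None
    location: Optional[str] = None
    vehicles: List[str] = field(default_factory=list)
    injuries: Optional[str] = None

def missing_fields(rec: AccidentRecord) -> List[str]:
    """Fields still empty; these drive the Question Generation Module."""
    return [k for k, v in asdict(rec).items() if v in (None, "", [])]

# Hypothetical question templates keyed by missing field name.
TEMPLATES = {
    "time": "When did the accident happen?",
    "location": "Where did it occur?",
    "vehicles": "Which vehicles were involved?",
    "injuries": "Was anyone injured?",
}

def next_question(rec: AccidentRecord) -> Optional[str]:
    """Return the next targeted question, or None when the record is complete."""
    missing = missing_fields(rec)
    return TEMPLATES[missing[0]] if missing else None

rec = AccidentRecord(time="2024-05-01 08:30")
print(next_question(rec))  # -> Where did it occur?
print(json.dumps(asdict(rec), ensure_ascii=False))  # the structured JSON record
```

In CCG, the extraction step fills these fields from each user utterance, so the loop of "extract, check missing fields, ask" terminates once the JSON record is complete and is handed to the reconstruction module.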
 
 We design a three-tier evaluation framework integrating large-scale AI agent simulation, human user dialogues, and reconstruction fidelity assessment. In AI agent experiments, CCG attains dialogue quality and information extraction scores ≥4.5/5 across fluency, relevance, and coherence dimensions; LLM-based automated scores exhibit significant positive correlation with human ratings (Spearman r = 0.474, p < 0.001). Human evaluation yields an information extraction F1 score of 0.909, closely matching AI agent performance (0.908), evidencing robustness across user types. Reconstruction achieves completeness scores ≥4.7/5 with semantic similarity of 0.90, confirming high-fidelity bidirectional conversion.
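For concreteness, a field-level F1 such as the 0.909 reported above can be computed over (field, value) pairs. The snippet below is one plausible scoring scheme under an exact-match assumption; the thesis's precise matching rule is not specified here, and the example records are hypothetical.

```python
from typing import Dict, Optional

def field_f1(gold: Dict[str, Optional[str]], pred: Dict[str, Optional[str]]) -> float:
    """Micro F1 over extracted (field, value) pairs, exact matching only.

    A predicted pair counts as a true positive iff both the field name
    and its value match the gold record exactly (illustrative metric).
    """
    gold_pairs = {(k, v) for k, v in gold.items() if v is not None}
    pred_pairs = {(k, v) for k, v in pred.items() if v is not None}
    tp = len(gold_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 2 of 3 gold fields extracted correctly.
gold = {"time": "08:30", "location": "Zhongxiao E. Rd.", "injuries": "none"}
pred = {"time": "08:30", "location": "Zhongxiao E. Rd.", "injuries": "minor"}
print(round(field_f1(gold, pred), 3))  # -> 0.667
```

The Spearman correlation between LLM and human scores is the standard rank correlation over paired ratings (e.g., `scipy.stats.spearmanr`); it measures monotonic agreement of the two score distributions rather than exact score equality.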
 
To enhance deployability under cost and privacy constraints, we further fine-tune an open-source Llama model. Relative to the GPT-4o-mini baseline, the fine-tuned model achieves field-level exact accuracy >0.94 and overall JSON semantic similarity ≈0.99 in extraction, and a 0.85 average semantic similarity in question generation, while maintaining ≥4/5 LLM-based evaluation scores. Results collectively demonstrate CCG's effectiveness, stability, and extensibility, offering a reusable methodological template for structured information collection in safety-critical, legal-adjacent domains.