摘要 (Abstract):

The semiconductor industry is a core pillar of Taiwan's economy, where even a small error in process parameters or a flawed inference can lead to costly downstream failures. Although large language models (LLMs) are showing growing potential for application in semiconductor engineering, their outputs can still be unreliable or factually incorrect, making direct deployment in high-stakes settings difficult. Improving model reliability on domain-specific tasks is therefore a key research problem.

This thesis proposes a dual-model architecture designed to improve factual reliability. The system comprises two core components: (1) a Generator, based on Qwen2.5-14B, adapted through continued pretraining on a 200M-token semiconductor corpus and strengthened in reasoning via Chat Vector, so as to internalize domain knowledge; and (2) a Verifier, fine-tuned on roughly 8,000 domain QA pairs to intercept outputs that deviate from reference answers, adopting a recall-oriented safety-filtering strategy to reduce the risk of releasing incorrect information.

On a 1,000-question test set, the system outperformed an industry-standard RAG baseline. Through knowledge internalization, the Generator alone surpassed RAG in accuracy (82.0% vs. 75.8%); the full system prioritized safety, reducing the error rate to 9.5%, significantly below RAG's 24.2%. Although the Verifier's conservative filtering lowered coverage, it effectively intercepted unsafe outputs while maintaining low latency, meeting the strict factual-accuracy requirements of semiconductor engineering.

In summary, the contributions of this work are: (1) the first generator–verifier dual-model reliability framework designed specifically for the semiconductor domain; (2) domain-specific continued-pretraining and verification datasets; and (3) an evaluation methodology emphasizing end-to-end safety. Experimental results show that recall-first safety filtering effectively improves model reliability, offering a practical path toward trustworthy LLM deployment in high-stakes engineering domains such as semiconductors.

Abstract:

The semiconductor industry is a cornerstone of Taiwan's economy, where even small mistakes in process parameters or fabrication reasoning can cause costly downstream failures. Although large language models (LLMs) are increasingly capable, their outputs may still contain incorrect or unverifiable statements, limiting safe deployment in semiconductor engineering. Enhancing reliability in such high-stakes, domain-specific settings is therefore essential.

This thesis proposes a dual-model framework to improve factual reliability in semiconductor-related LLM outputs. The system comprises: (1) a Generator, based on Qwen2.5-14B, adapted through continued pretraining on a 200M-token semiconductor corpus and reasoning alignment via Chat Vector; and (2) a lightweight Verifier fine-tuned on ∼8,000 domain QA pairs to filter outputs that deviate from ground-truth references. The Verifier follows a recall-oriented design that prioritizes intercepting potentially incorrect answers.

On a 1,000-QA benchmark, the system outperformed an industry-standard RAG baseline. Specifically, the Generator-only model surpassed RAG in accuracy (82.0% vs. 75.8%) via knowledge internalization, while the full system prioritized safety, reducing the error rate to 9.5%—significantly lower than RAG's 24.2%. Although conservative filtering reduced coverage, this trade-off effectively minimized unsafe outputs while maintaining practical latency.

Overall, this work contributes: (1) a reliability-focused generator–verifier architecture for semiconductor engineering, (2) domain-specific datasets for continued pretraining and verification, and (3) an evaluation framework centered on safety metrics. The findings show that recall-oriented verification offers a viable path toward trustworthy LLM deployment in semiconductor workflows where factual correctness is critical.
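The generator–verifier gating described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the fixed placeholder score, and the 0.9 threshold are all hypothetical stand-ins for the Qwen2.5-14B Generator and the fine-tuned Verifier.

```python
from typing import Optional

def generate_answer(question: str) -> str:
    """Hypothetical stand-in for the Generator (Qwen2.5-14B after
    continued pretraining). Returns a candidate answer."""
    return "Placeholder answer for: " + question

def verifier_score(question: str, answer: str) -> float:
    """Hypothetical stand-in for the Verifier. A real Verifier would
    estimate the probability that the answer is correct; here a fixed
    illustrative value is returned."""
    return 0.42

def answer_with_verification(question: str,
                             threshold: float = 0.9) -> Optional[str]:
    """Recall-oriented gating: release an answer only when the Verifier's
    confidence clears a high threshold; otherwise abstain (return None)
    rather than risk emitting an unsafe output."""
    answer = generate_answer(question)
    if verifier_score(question, answer) >= threshold:
        return answer
    return None  # abstention trades coverage for safety
```

The high default threshold encodes the recall-first design: false rejections (lost coverage) are accepted in exchange for intercepting potentially incorrect answers before they reach the engineer.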