HILA-Merging: Head-Level Loss-Aware for Fine-Grained Pruning and Adaptive Model Merging

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99374

Title:	HILA-Merging: Head-Level Loss-Aware for Fine-Grained Pruning and Adaptive Model Merging (original title: HILA-Merging:具頭層級損失感知的細粒度剪枝與自適應模型合併)
Author:	池秉宸;Chih, Bing-Chen
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Large Language Models;Model Merging;Multilingual Evaluation;Task Vector;LLM;Model Interpretability
Date:	2026-01-13
Upload time:	2026-03-06 18:50:04 (UTC+8)
Publisher:	National Central University
Abstract:	This work introduces HILA-Merging, a training-free method for integrating multilingual and domain-specialist knowledge within a shared large language model (LLM) architecture. The method addresses the challenge of transferring expertise across languages and domains without incurring the computational cost of multi-stage fine-tuning. HILA operates through fine-grained, loss-aware structural analysis. At its core is a layer-wise and head-wise importance measurement procedure that quantifies the functional contribution of each attention head and MLP layer using specialist-generated golden datasets. These importance signals guide both pruning and adaptive weighted merging, so the method selectively preserves critical expert components while mitigating interference between specialists. We evaluate the method across medical, programming, and finance domains under multilingual settings, where it consistently improves over magnitude-based and heuristic merging methods. On medical benchmarks, it exceeds the strongest baseline by up to 1.65 points on MedQA and 1.6 points on MedMCQA. In programming, it improves HumanEval-XL accuracy by more than 6 points, a substantial gain on structured reasoning tasks. In finance, HILA improves II-Finance performance by approximately 0.4 to 0.6 points while maintaining competitive results on FinanceQA and preserving specialist knowledge. These results collectively show that importance-guided, training-free merging provides reliable performance gains across heterogeneous tasks and supports effective cross-lingual transfer. Together, the findings indicate that loss-aware structural guidance offers a scalable and generalizable path toward building multilingual, multi-domain LLMs without additional training. The source code is available at: https://github.com/charlie963258/HILA-Merging
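The abstract names three steps (loss-based head importance, pruning, and importance-weighted merging) without giving formulas. The sketch below is only an illustration of that general idea on toy data, not the thesis's algorithm. Everything in it is an assumption: the helper names `head_importance`, `prune`, `merge` and `sq_loss`, the ablation-style importance score, the `keep_ratio` parameter, and the normalised weighting rule. Each "head" is a plain list of floats standing in for real attention-head weights.

```python
from typing import Callable, List

Heads = List[List[float]]  # toy stand-in: one weight vector per attention head


def head_importance(base: Heads, expert: Heads,
                    loss_fn: Callable[[Heads], float]) -> List[float]:
    """Assumed score: loss increase when one head is reverted to base weights."""
    ref = loss_fn(expert)
    scores = []
    for h in range(len(expert)):
        ablated = [row[:] for row in expert]
        ablated[h] = base[h][:]  # undo this head's task vector
        scores.append(max(loss_fn(ablated) - ref, 0.0))  # clamp negatives to 0
    return scores


def prune(scores: List[float], keep_ratio: float) -> List[float]:
    """Zero all but the top `keep_ratio` fraction of heads (ties are kept)."""
    k = max(1, round(keep_ratio * len(scores)))
    thresh = sorted(scores)[-k]
    return [s if s >= thresh else 0.0 for s in scores]


def merge(base: Heads, experts: List[Heads],
          importances: List[List[float]], keep_ratio: float = 0.5) -> Heads:
    """Add each expert's per-head task vector, weighted by normalised importance."""
    kept = [prune(imp, keep_ratio) for imp in importances]
    merged = [row[:] for row in base]
    for h in range(len(base)):
        total = sum(k[h] for k in kept)
        if total == 0.0:
            continue  # no expert claims this head: keep the base weights
        for k, expert in zip(kept, experts):
            w = k[h] / total  # hypothetical importance-normalised weight
            for i in range(len(base[h])):
                merged[h][i] += w * (expert[h][i] - base[h][i])
    return merged


def sq_loss(target: Heads) -> Callable[[Heads], float]:
    """Toy loss: squared distance to a target, standing in for loss on golden data."""
    return lambda w: sum((a - b) ** 2
                         for rw, rt in zip(w, target) for a, b in zip(rw, rt))


# Toy setup: two "experts" that each changed a different head of a 4-head base.
base = [[0.0] * 3 for _ in range(4)]
med = [row[:] for row in base]
med[0] = [1.0, 1.0, 1.0]
code = [row[:] for row in base]
code[2] = [-2.0, -2.0, -2.0]

imp_med = head_importance(base, med, sq_loss(med))
imp_code = head_importance(base, code, sq_loss(code))
merged = merge(base, [med, code], [imp_med, imp_code], keep_ratio=0.25)
```

In this toy case each expert's only important head is preserved in `merged`, and heads that neither expert relies on stay at their base values. A real implementation would measure the loss by running the model on the specialist-generated data, and would also need to handle MLP layers, which this sketch omits.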
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	60

All items in NCUIR are protected by original copyright.
