<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>DSpace collection: 博碩士論文</title>
    <link>https://ir.lib.ncu.edu.tw/handle/987654321/94367</link>
    <description />
    <textInput>
      <title>The collection's search engine</title>
      <description>Search the Channel</description>
      <name>s</name>
      <link>https://ir.lib.ncu.edu.tw/simple-search</link>
    </textInput>
    <item>
      <title>基於視覺－語言模型之影像監控暴力行為偵測方法;Vision-Language Model–Based Approach for Violence Detection in Video Surveillance</title>
      <link>https://ir.lib.ncu.edu.tw/handle/987654321/99169</link>
      <description>title: 基於視覺－語言模型之影像監控暴力行為偵測方法;Vision-Language Model–Based Approach for violence detection in Video Surveillance abstract: 本研究旨在開發並評估一種基於視覺語言模型（Vision-Language Model, VLM）的暴力行為偵測系統。影片中的暴力行為偵測對於公共安全與監控系統而言相當重要，但真實世界中的影片往往具有畫質較低與場景複雜等問題。因此，本研究著重於提升VLM在真實環境下進行暴力行為偵測的效能。首先，本研究依據既有相關研究的常見設定建立一個基準模型（baseline model）。其次，採用零樣本（zero-shot）的 VLM，在不進行額外訓練的情況下評估其實際應用表現。第三，利用具標註的影片資料對VLM進行微調（fine-tuning），使其能更有效地適應暴力行為偵測任務。在微調過程中，模型學習更合適的視覺表徵，以提升對暴力行為的辨識能力。為了評估模型效能，本研究採用準確率（accuracy）、精確率（precision）、召回率（recall）以及 F1 分數等標準評估指標。所有實驗皆在相同條件下進行，以確保不同方法之間的比較具有公平性。實驗結果顯示，經過微調的VLM在準確率與F1分數方面皆優於基準模型與零樣本方法。此結果表示，微調能幫助模型更有效地擷取與暴力行為相關的視覺特徵。雖然零樣本模型具有高度彈性且不需額外訓練，其在真實場景中的表現仍屬可接受水準，僅略低於微調後的模型。整體而言，本研究所提出的方法具備良好的有效性與穩定性，並展現出應用於公共安全與監控系統中的實務潛力。;This study aims to develop and evaluate a violence detection system based on a Vision Language Model (VLM). Detecting violent actions in videos is important for public safety and surveillance, but real-world videos often have low quality and complex scenes. Therefore, this study focuses on improving VLM performance for real-world violence detection. First, a baseline model is implemented following common settings from previous work. Second, a zero-shot VLM is applied without additional training to evaluate its practical performance. Third, the VLM is fine-tuned using labeled video data to better adapt to the violence detection task. During fine-tuning, the model learns more suitable visual representations for recognizing violent actions. To evaluate performance, standard metrics such as accuracy, precision, recall, and F1-score are used. All experiments are conducted under the same conditions to ensure fair comparison. Results show that the fine-tuned VLM achieves higher accuracy and F1-score than both the baseline and the zero-shot approaches. This indicates that fine-tuning helps the model better capture visual patterns related to violence. 
Although the zero-shot model is flexible and requires no additional training, its performance in real-world scenarios remains acceptable, only slightly below that of the fine-tuned model. Overall, the proposed approach is effective and robust, showing strong potential for practical use in public safety and surveillance systems.
&lt;br&gt;</description>
      <pubDate>Fri, 06 Mar 2026 10:15:19 GMT</pubDate>
    </item>
    <item>
      <title>明代地名連結與歷史地圖對應：地理空間實體解析方法框架;Linking Ming Dynasty Toponyms with Historical Maps: A Geospatial Entity Resolution Framework</title>
      <link>https://ir.lib.ncu.edu.tw/handle/987654321/99168</link>
      <description>title: 明代地名連結與歷史地圖對應：地理空間實體解析方法框架;Linking Ming Dynasty Toponyms with Historical Maps: A Geospatial Entity Resolution Framework abstract: 中國數百年前的歷史段落時常蘊含豐富的基層社會流動與聚落資訊，是研究微觀歷史地理的重要史料。然而，傳統歷史地理資訊系統（HGIS）資料庫（如CCTS）多側重於縣級以上之行政地點，缺乏對村落層級的覆蓋，導致研究者難以定位方志中記載的庶民地名。此外，村落名稱高度重複（如「王家村」）所引發的空間歧義性，亦是自動化定位的主要挑戰。

本研究提出一套整合深度學習與空間計算的地名消歧異框架。首先，利用客製化之命名實體識別模型（NER-MS）從明代段落中擷取地名；其次，鑑於明代地圖精度之不足，本研究採用1930年代河北省地圖作為地名資料來源；最後，建構基於行政邊界凸包（Convex Hull）與 5 公里緩衝區（Buffer）的空間過濾機制，有效降低同名異地之歧義問題。;Passages from historical local records serve as invaluable repositories for understanding social mobility and settlement patterns. However, standard Historical Geographic Information System (HGIS) databases, such as the Chinese Civilization in Time and Space (CCTS), primarily index high-level administrative place names, lacking the granularity required to map village-level settlements. Furthermore, the prevalence of generic place names (e.g., "Wang Village") introduces significant spatial ambiguity, hindering automated localization efforts.

This study proposes an integrated framework combining Named Entity Recognition (NER) with computational geometry to address these challenges. The methodology proceeds in three stages: first, extracting toponyms using a custom-trained NER model (NER-MS); second, utilizing 1930s historical maps of Hebei (河北) as comparative reference material to bridge the gap between pre-modern texts and modern coordinates; and third, implementing a spatial filtering mechanism based on administrative convex hulls augmented with a 5 km buffer to resolve homonymy.
&lt;br&gt;</description>
      <pubDate>Fri, 06 Mar 2026 10:15:17 GMT</pubDate>
    </item>
    <item>
      <title>透過溝通學習與知識結晶化提升低資源語言能力：以臺灣臺語為例;Enhancing Low-Resource Language Capabilities via Communicative Learning and Knowledge Crystallization: A Case Study on Taiwanese Hokkien</title>
      <link>https://ir.lib.ncu.edu.tw/handle/987654321/99167</link>
      <description>title: 透過溝通學習與知識結晶化提升低資源語言能力：以臺灣臺語為例;Enhancing Low-Resource Language Capabilities via Communicative Learning and Knowledge Crystallization A Case Study on Taiwanese Hokkien abstract: 為了確保低資源語言適應過程中的知識可攜性與運作成本效益，本研究旨在解決大型語言模型在長程互動中面臨的指令碎片化、高昂推論成本以及規則遵循度下降等挑戰。本研究提出了一種「知識結晶化」機制以實現知識的可攜性，透過將短暫的對話互動提煉為高純度的語言規則，成功將閉源大型模型的語言專業知識遷移至更具成本效益的模型中。

針對平行語料稀缺的「台灣台語」，本研究提出了一套結合溝通式學習與知識結晶化機制的創新「跨模型師生架構」。我們採用 Gemini 2.5 Pro 擔任「教師模型」，負責生成合成資料，並透過互動式對話指導 GPT-5.1 與 DeepSeek-v3 等「學生模型」。為解決直接堆疊原始對話所導致的指令碎片化與高推論成本，我們引入了「知識結晶化」機制。此機制的設計初衷並非僅為緩解記憶體限制，而是旨在透過「後設認知反思」，將稍縱即逝的互動過程提煉為高純度的語言規則。

實驗結果顯示，在錯誤修正方面，互動式策略的表現顯著優於被動示範。值得注意的是，DeepSeek-v3 展現了卓越的適應性，能有效激活其潛在的多語言能力，在無需更新參數的情況下即逼近當前最先進LLM的水準。效率分析指出互動成效在第5輪達到峰值，實證了結晶化機制對於維持高資訊密度及優化 Token 消耗的必要性。本研究建立了一套具成本效益且免微調的範式，成功將通用模型與低資源語言對齊，為台灣台語的數位保存做出了貢獻。;To ensure instructional stability and economic efficiency in Low-Resource Language (LRL) adaptation, this study addresses the challenges of instructional fragmentation, high inference costs, and reduced rule-adherence during long-form LLM interactions. This study proposes a Knowledge Crystallization mechanism to achieve knowledge portability, distilling ephemeral interactions into high-purity rules to transfer linguistic expertise from closed-source giants to more cost-effective models. Focusing on Taiwanese Hokkien, a language characterized by limited parallel corpora, this study proposes a novel Cross-Model Teacher-Student framework that integrates communicative learning with a knowledge crystallization mechanism. We employ Gemini 2.5 Pro as the Teacher model to generate synthetic data and guide Student models (GPT-5.1 and DeepSeek-v3) through interactive dialogue. To address the instructional fragmentation and high inference costs associated with raw dialogue stacking, we introduce a &amp;quot;Knowledge Crystallization&amp;quot; mechanism. This process is designed not merely to accommodate memory constraints, but to distill ephemeral interactions into high-purity linguistic rules via metacognitive reflection. Experimental results demonstrate that interactive strategies significantly outperform passive demonstration in error correction. Notably, the DeepSeek-v3 model exhibits exceptional adaptability, effectively activating latent multilingual capabilities to approach SOTA levels without parameter updates. An efficiency analysis identifies a peak at 5 interaction turns, empirically validating that crystallization is essential for maintaining high information density and optimizing token consumption. 
This work establishes a cost-effective, fine-tuning-free paradigm for aligning general-purpose models with LRLs, contributing to the digital preservation of Taiwanese Hokkien.
&lt;br&gt;</description>
      <pubDate>Fri, 06 Mar 2026 10:15:15 GMT</pubDate>
    </item>
    <item>
      <title>應用於半導體領域之可靠知識生成與幻覺檢測雙模型語言系統;Dual-Model Language Systems for Reliable Knowledge Generation and Hallucination Detection in Semiconductor Applications</title>
      <link>https://ir.lib.ncu.edu.tw/handle/987654321/99166</link>
      <description>title: 應用於半導體領域之可靠知識生成與幻覺檢測雙模型語言系統;Dual-Model Language Systems for Reliable Knowledge Generation and Hallucination Detection in Semiconductor Applications abstract: 半導體產業是臺灣經濟的重要核心，其中任何微小的製程參數錯誤或推論偏差，都可能導致昂貴的後續失誤。雖然大型語言模型 (LLMs) 逐漸展現出在半導體工程中的應用潛力，但其生成內容仍可能出現不可靠或與事實不符的輸出，使其在高風險情境中難以直接部署。因此，提升模型在領域專屬任務上的「可靠性」成為關鍵研究課題。

本論文提出一套旨在提高事實可靠性的雙模型架構。系統包含兩個核心組件：(1) 生成器：以 Qwen2.5-14B 為基礎，透過二億 token 半導體語料進行持續預訓練 (Continued Pretraining)，並利用 Chat Vector 進行推理能力增強，以強化領域知識內化；(2) 驗證器 (Verifier)：以約 8,000 筆領域問答資料進行微調，用於攔截任何與標準答案不符的輸出，採取召回導向的安全過濾策略，以降低錯誤資訊外洩的風險。

在 1,000 題測試中，本系統優於業界標準 RAG 基準。純生成器模型透過知識內化，在準確率上超越 RAG (82.0% 對比 75.8%)；完整系統則優先考量安全性，將錯誤率壓低至 9.5% (顯著低於 RAG 的 24.2%)。雖然驗證器的保守過濾降低了覆蓋率，但有效攔截了不安全輸出並維持低延遲，符合半導體工程對高事實精準度的嚴格需求。

綜合而言，本研究貢獻包含：(1) 提出首個專為半導體領域設計的生成器—驗證器雙模型可靠性框架；(2) 建構領域專屬的持續預訓練語料與驗證資料集；(3) 建立強調端到端安全性的評估方法。實驗結果顯示，召回優先的安全過濾機制能有效提升模型可靠性，為大型語言模型在半導體等高風險工程場域中的可信部署提供可行路徑。;The semiconductor industry is a cornerstone of Taiwan’s economy, where even small mistakes in process parameters or fabrication reasoning can cause costly downstream failures. Although large language models (LLMs) are increasingly capable, their outputs may still contain incorrect or unverifiable statements, limiting safe deployment in semiconductor engineering. Enhancing reliability in such high-stakes, domain-specific settings is therefore essential.

This thesis proposes a dual-model framework to improve factual reliability in semiconductor-related LLM outputs. The system comprises: (1) a generator model, based on Qwen2.5-14B, adapted through continued pretraining on a 200M-token semiconductor corpus and reasoning alignment via Chat Vector; and (2) a lightweight Verifier fine-tuned on ∼8,000 domain QA pairs to filter outputs that deviate from ground-truth references. The Verifier follows a recall-oriented design that prioritizes intercepting potentially incorrect answers.

On a 1,000-QA benchmark, the system outperformed an industry-standard RAG baseline. Specifically, the Generator-only model surpassed RAG in accuracy (82.0% vs. 75.8%) via knowledge internalization, while the full system prioritized safety, reducing the error rate to 9.5%, significantly lower than RAG’s 24.2%. Although conservative filtering reduced coverage, this trade-off effectively minimized unsafe outputs while maintaining practical latency.

Overall, this work contributes: (1) a reliability-focused generator–verifier architecture for semiconductor engineering, (2) domain-specific datasets for continued pretraining and verification, and (3) an evaluation framework centered on safety metrics. The findings show that recall-oriented verification offers a viable path toward trustworthy LLM deployment in semiconductor workflows where factual correctness is critical.
&lt;br&gt;</description>
      <pubDate>Fri, 06 Mar 2026 10:15:12 GMT</pubDate>
    </item>
  </channel>
</rss>

