DSpace community: 軟體工程研究所 (Institute of Software Engineering)

基於視覺語言模型之中文手寫文稿識別：垂直轉水平佈局重整;Vertical to Horizontal Layout Rectification for VLM-based OCR of Handwritten Chinese Manuscripts

title: 基於視覺語言模型之中文手寫文稿識別：垂直轉水平佈局重整;Vertical to Horizontal Layout Rectification for VLM-based OCR of Handwritten Chinese Manuscripts abstract: Recognizing mobile-captured Chinese handwritten essays on grid paper (稿紙) remains a persistent challenge in Document AI, primarily because the vertical, dense grid layout is severely mismatched with the horizontal inductive biases of contemporary Vision-Language Models (VLMs), which are pretrained on horizontal text. Although VLMs have strong semantic reasoning capabilities, applying them directly to such documents runs into resolution bottlenecks and severe reading-order hallucinations. To bridge this gap, we introduce V2H-Rectify, a training-free preprocessing framework that treats layout rectification as a form of explicit Visual Prompting. A key design principle is the decoupling of layout rectification from semantic recognition, so that the preprocessing module works as a plug-and-play component compatible with any downstream OCR engine or VLM, without retraining or LoRA adaptation. V2H-Rectify incorporates three key innovations: (1) the Ensemble Skew Estimation (ESE) algorithm, a signal-driven perception module that removes geometric distortion; (2) a deep-feature-guided layout analysis method that uses CRAFT region scores to infer the logical topology of the document; and (3) reading order reconstruction, a text linearization mechanism that applies the grid-paper reading rules (top-to-bottom, right-to-left) to determine the correct concatenation order and reorganize vertical visual tokens into a standardized horizontal sequence, effectively "prompting" the VLM with an in-distribution representation. We validate the approach on the Mobile-Captured Grid Paper (MCGP) benchmark (N = 2,826), which is not publicly released for now because of student privacy constraints. With V2H-Rectify, a specialized 3B-parameter model achieves a Character Error Rate (CER) of 21.59% and a structural sequence similarity of 0.891, significantly outperforming the Gemini 3 Pro baseline without preprocessing (54.72% CER, 0.596 similarity). When V2H-Rectify is applied to Gemini 3 Pro, CER drops further to 12.75%, with residual errors attributed primarily to physical blur and extreme cursive handwriting. These findings confirm that explicit text linearization is a more effective lever than parameter scaling alone for non-standard document layouts, and an effective strategy for unlocking foundation model capabilities in out-of-distribution document scenarios.
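The reading-order rule (top-to-bottom, right-to-left) and the CER metric named in this abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the `CharBox` type, the `column_gap` threshold, and the simple x-based column grouping are assumptions made for illustration, and the CER shown is the common definition (Levenshtein distance divided by reference length), which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """Hypothetical detection result: one character and its box centre in pixels."""
    text: str
    x: float
    y: float


def linearize_vertical(boxes, column_gap):
    """Return the text in grid-paper reading order.

    Columns are read right-to-left and characters inside a column
    top-to-bottom. A box whose x centre lies within `column_gap` of the
    first box of the current column is assigned to that column.
    """
    columns = []
    for box in sorted(boxes, key=lambda b: b.x, reverse=True):
        if columns and abs(columns[-1][0].x - box.x) <= column_gap:
            columns[-1].append(box)
        else:
            columns.append([box])
    return "".join(
        b.text for col in columns for b in sorted(col, key=lambda b: b.y)
    )


def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        curr = [i]
        for j, h in enumerate(hypothesis, 1):
            # deletion, insertion, substitution (cost 0 when characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1] / len(reference)
```

In this sketch, `column_gap` would typically be set to a fraction of the grid cell width, so that small jitter in the detected centres does not split one column into two.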

結合未來自我與專家外衣之AI學習教練系統設計與實證研究;Design and Empirical Study of an AI Learning Coach System Integrating Future Self and Mantle of the Expert

title: 結合未來自我與專家外衣之AI學習教練系統設計與實證研究;Design and Empirical Study of an AI Learning Coach System Integrating Future Self and Mantle of the Expert abstract: As artificial intelligence (AI) becomes deeply integrated with education, current learning management systems (LMS) focus primarily on supporting cognitive learning and often overlook students' affective needs, such as time management, learning motivation, and coping with setbacks. To address this gap, this study designs and implements an AI-based digital learning coach system that integrates the theories of Future Self and Mantle of the Expert (MoE). The system combines script-based prompt design, prompt engineering techniques, and a Retrieval-Augmented Generation (RAG) mechanism to provide personalized coaching feedback grounded in the student's context and long-term learning history, thereby strengthening learning engagement and self-regulation. A quasi-experimental design was adopted: 101 undergraduate students enrolled in a Japanese hospitality practice course were recruited and randomly assigned to an experimental or a control group. Both groups used the same LMS platform, and the experimental group additionally used the AI learning coach module developed in this study. Data were collected through pre- and post-tests, questionnaires, and system log analysis. The results indicate that students in the experimental group significantly outperformed those in the control group in learning outcomes, time management, grit, and growth mindset, and showed higher levels of engagement and persistence. This study proposes the Dual Identity Resonance Model (DIRM), which combines the motivation derived from students' "current professional roles" and their "future professional selves." By embedding this dual-identity framework into the LMS through the AI coach design, the system not only enhances students' learning motivation and self-management abilities but also offers a practical, scalable AI-assisted teaching solution. The findings contribute both theoretical innovation and practical value for the integration of AI into education.

運動在提升壓力韌性中的角色：基於虛擬實境足球情境下的生理與認知表現;The Role of Exercise in Enhancing Stress Resilience: Physiological and Cognitive Performance in VR-Simulated Soccer

title: 運動在提升壓力韌性中的角色：基於虛擬實境足球情境下的生理與認知表現;The Role of Exercise in Enhancing Stress Resilience: Physiological and Cognitive Performance in VR-Simulated Soccer abstract: Effective stress regulation is essential for maintaining cognitive and physiological performance in high-pressure sports scenarios. This study examines whether habitual exercise enhances stress resilience by observing physiological, neural, and cognitive responses in a virtual reality (VR) soccer simulation. Sixty participants were classified into Athlete, Recreational (regular exercise), and General groups based on their exercise habits, and performed rest, shooting, and goalkeeping tasks under varying stress levels. Heart rate variability (HRV), electroencephalography (EEG), and eye-tracking signals were recorded simultaneously throughout. Compared with non-exercisers, regular exercisers and athletes showed more stable autonomic regulation, better attentional control, and better task performance under stress. Frontal theta activity and HRV markers were the main indicators differentiating stress responses among groups. Machine learning classifiers also identified group- and task-specific physiological states with moderate accuracy, suggesting the potential for automated stress profiling. These findings highlight the protective effect of regular exercise on stress management and demonstrate the feasibility of using VR for cognitive and physiological assessment in realistic yet controlled environments.

使用大型語言模型構建自動化問答系統;Automated Question-Answering System Using Large Language Models

title: 使用大型語言模型構建自動化問答系統;Automated Question-Answering System Using Large Language Models abstract: Manual annotation of datasets for language models has long been a labor-intensive task. With the rapid evolution of open-source large language models (LLMs) in recent years, more and more people use LLMs to assist in dataset generation. This study therefore proposes a fully open-source architecture with Gemma 2-27B as the core language model. The goal is to automate the generation of training datasets for language models, reducing human effort and improving model performance on quantitative evaluation metrics. The study examines which combination of fine-tuning and retrieval-augmented generation (RAG) yields the highest quantitative scores, and whether adding chain-of-thought (CoT) reasoning during the experiments improves those scores. Evaluation uses cosine similarity and LLM-as-a-judge metrics, and results are compared against other available datasets. Finally, the system is deployed to a LINE Bot via ngrok, providing a human-AI interactive interface and a lightweight, prompt-based MCP tool calling mechanism, with a user interface that supports flexible model switching.
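As a rough illustration of the cosine-similarity metric mentioned in this abstract, the sketch below compares two vectors. It assumes the generated and reference answers have already been converted to fixed-length embedding vectors; the abstract does not say which embedding model is used, and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

A score near 1 means the two embeddings point in nearly the same direction. Because this only reflects embedding-level closeness, the abstract's pairing of it with an LLM-as-a-judge metric is a sensible complement.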