This paper proposes a multimodal suicide risk screening framework for adolescents that fuses model-inferred ordinal text scores with behavioral and physiological signals. A CORN-based ordinal regression model maps each free-text response to a severity score aligned with the corresponding questionnaire item, eliminating the need to complete and score PHQ/GAD questionnaires manually while preserving clinical interpretability. These text scores are fused with features from speech prosody, facial expressions, heart rate variability (HRV), and eye movements through an attention-based aggregation network. To address class imbalance, training uses a weighted loss and a weighted sampler. In experiments on a real-world adolescent dataset collected with a Virtual Mental-Health Interviewer (VMHI), the framework achieves 90% accuracy and a macro F1 of 0.90 in binary classification, and 90% accuracy and a macro F1 of 0.84 in the three-class setting. Ablation studies further show that integrating the ordinal text scores significantly improves classification performance. These results suggest that combining question-aligned language representations with multimodal behavioral signals offers a scalable, interpretable, and low-burden approach to early suicide risk screening, with the potential to reduce clinical workload and support earlier identification of at-risk adolescents across different settings.
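To make the ordinal text score concrete, the following is a minimal sketch of how a CORN-style output head is commonly decoded into a severity level. It is not the authors' implementation. The function name `corn_decode`, the example logits, and the choice of four severity levels (0 to 3, as in PHQ-style items) are illustrative assumptions. In CORN, the k-th logit models the conditional probability that the label exceeds level k given that it exceeds level k-1; chaining these conditionals gives the unconditional exceedance probabilities, and the predicted level is the number of them above 0.5.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function (plain form; adequate for moderate logit values)."""
    return 1.0 / (1.0 + math.exp(-x))


def corn_decode(logits, threshold=0.5):
    """Decode K-1 CORN logits into an ordinal level in 0..K-1.

    Hypothetical helper: logits[k] is assumed to parameterise
    P(y > k | y > k-1). The running product of the conditionals
    yields the unconditional probabilities P(y > k).
    """
    exceed_probs = []
    running = 1.0
    for z in logits:
        running *= sigmoid(z)      # chain rule over the conditional tasks
        exceed_probs.append(running)
    # Predicted level = number of thresholds the response likely exceeds.
    level = sum(1 for p in exceed_probs if p > threshold)
    return level, exceed_probs


# Illustrative logits for one questionnaire-aligned item (assumed values).
level, exceed_probs = corn_decode([2.0, 0.5, -1.5])
print(level, [round(p, 3) for p in exceed_probs])
```

Because each exceedance probability is a product of values in (0, 1), the sequence is non-increasing by construction, which is what keeps the decoded level rank-consistent. The resulting integer per item is the kind of questionnaire-aligned score that could then be passed to a downstream fusion model alongside the other modality features.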