With the advancement of multimodal techniques, multimodal sentiment analysis (MSA) has been proposed and shown to have potential value in several applications. Data augmentation is one viable option for improving the robustness of MSA models. However, most current augmentation methods operate at the data level. The augmented data they produce lacks the hidden complementary information shared across modalities in multimodal scenarios, and the flexibility of these methods is constrained by the modalities themselves. We therefore propose the Multimodal Masking Transformer (MMT), an encoder-decoder network for embedding-level multimodal data augmentation, to augment the existing data for MSA tasks. The MMT captures the hidden complementary information across modalities and overcomes the constraints imposed by individual modalities, which gives the augmentation method greater flexibility. In this study, we integrate the MMT with multiple MSA models and evaluate it against state-of-the-art embedding-level multimodal augmentation methods. In addition, we conduct a sensitivity analysis of the MMT's augmentation impact to demonstrate how effectively the MMT improves performance on MSA tasks.