With the rapid development of deep learning, mobile malware detection has made breakthrough progress. However, time-series deep learning models still suffer from vanishing gradients when processing long sequence features, owing to the memory limitations of recurrent neural networks. Many studies have since proposed feature compression and extraction methods for long sequences, but to our knowledge none compresses the sequence while still covering the complete feature information of the original sequence and its semantic temporal relationships. We therefore propose a multi-model malware detection architecture that preserves global features while retaining partial temporal relationships among the compressed features, and integrates a Multi-head Attention mechanism to mitigate the memory problem of recurrent neural networks. The model runs in two stages. In the pre-processing stage, the Android low-level operation codes (Dalvik Opcode) are segmented and statistically summarized, then fed into a Bi-LSTM for semantic extraction; this stage compresses the original Opcode sequence into a sequence of semantic blocks rich in temporal meaning, which serves as the classification feature for the downstream classifier. In the classification stage, we modify the Transformer model: a Multi-head Attention mechanism attends efficiently to the block sequence features, and a Global Pooling Layer is added to strengthen the model's sensitivity to the block features and to reduce dimensionality, lessening overfitting. Experimental results show a detection accuracy of 99.30% on multi-family classification, and the performance on binary classification and small-sample classification is significantly improved over existing work. In addition, multiple ablation tests confirm the importance of each model in the overall architecture.
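The two-stage architecture summarized above (Bi-LSTM semantic-block extraction followed by a Multi-head Attention classifier with global pooling) can be outlined with a minimal PyTorch sketch. The hyperparameters (block_dim, lstm_hidden, num_heads) and the exact layer arrangement here are illustrative assumptions rather than the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class OpcodeMalwareClassifier(nn.Module):
    """Sketch of the two-stage pipeline: Bi-LSTM semantic-block extraction,
    then multi-head self-attention with global pooling for classification.
    Dimensions and layer choices are assumptions for illustration only."""

    def __init__(self, block_dim=256, lstm_hidden=128, num_heads=4, num_classes=2):
        super().__init__()
        # Stage 1: compress per-segment opcode statistics into semantic blocks
        # while preserving the temporal order among blocks.
        self.bilstm = nn.LSTM(block_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)
        d_model = 2 * lstm_hidden
        # Stage 2: multi-head self-attention over the semantic block sequence.
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Global average pooling over blocks, then a small classification head.
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, block_stats):
        # block_stats: (batch, num_blocks, block_dim) opcode-statistics vectors
        semantic_blocks, _ = self.bilstm(block_stats)        # (B, T, 2*hidden)
        attended, _ = self.attention(semantic_blocks,
                                     semantic_blocks, semantic_blocks)
        attended = self.norm(attended + semantic_blocks)     # residual connection
        pooled = attended.mean(dim=1)                        # global average pooling
        return self.classifier(pooled)


if __name__ == "__main__":
    model = OpcodeMalwareClassifier(block_dim=256, num_classes=5)
    dummy = torch.randn(8, 40, 256)   # 8 samples, 40 opcode blocks each
    print(model(dummy).shape)         # torch.Size([8, 5])
```

In this sketch, mean pooling stands in for the Global Pooling Layer and a single attention layer stands in for the modified Transformer; the actual model may stack multiple attention layers or use a different pooling variant.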