Deep neural networks (DNNs) have achieved impressive results in natural language processing, and machine translation is one of its most important tasks. Machine translation has mainly relied on two neural network architectures, the convolutional neural network (CNN) and the recurrent neural network (RNN), but translation quality depends on the vocabulary and grammatical structure of the languages involved, and the sentences produced by deep models often suffer from ungrammatical output and from misalignment between bilingual sentences or words. In recent years, the Google team proposed the Transformer, an attention model that uses neither CNNs nor RNNs: with only an encoder-decoder structure and an attention mechanism, it achieves significant results in machine translation. The architecture proposed in this thesis is based on the Transformer. The model consists of multi-layer encoders and decoders and uses multi-head attention to match source-language sentences against target-language sentences by similarity, thereby aligning the words of the two languages. The goal of this thesis is to improve translation quality, so the proposed architecture modifies the Transformer by applying residual and dense connections: to avoid losing information as attention is computed through many layers, the outputs of earlier layers are connected to later layers, which improves the trained model. Finally, in the experiments, the proposed method and the original Transformer are applied to an English-Chinese translation system and compared using the machine translation evaluation metrics BLEU and WER; the results show that the proposed attention model translates better than the Transformer baseline.
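To make the multi-head attention mechanism mentioned above concrete, the following is a minimal sketch of scaled dot-product multi-head attention in the style of the original Transformer; all dimensions (d_model, num_heads) are illustrative defaults, not the thesis's configuration.

```python
# A minimal sketch of scaled dot-product multi-head attention,
# following the original Transformer; dimensions are illustrative.
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads      # per-head dimension
        self.num_heads = num_heads
        # Linear projections for queries, keys, values, and the output
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        batch = query.size(0)
        # Project and split into heads: (batch, heads, seq_len, d_k)
        q = self.w_q(query).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        v = self.w_v(value).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product scores align query tokens with key tokens;
        # in the decoder's cross-attention this is what pairs target words
        # with source words.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
        weights = torch.softmax(scores, dim=-1)
        context = torch.matmul(weights, v)
        # Re-merge heads and apply the output projection
        context = context.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(context)
```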
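The abstract does not specify the exact wiring of the proposed residual and dense connections, so the sketch below shows only one plausible interpretation, in the spirit of DenseNet: each encoder layer receives the sum of all earlier layers' outputs, so early-layer information is not lost in a deep stack. The class name DenselyConnectedEncoder and the summation scheme are hypothetical.

```python
# A hypothetical sketch of dense connections across Transformer encoder
# layers: each layer consumes the sum of all earlier outputs, so early
# information survives deep stacks. The thesis's exact scheme may differ.
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    def __init__(self, d_model=512, num_heads=8, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x):
        outputs = [x]                      # keep every layer's output
        for layer in self.layers:
            # Dense connection: feed the sum of all previous outputs,
            # linking front layers directly to back layers.
            dense_input = torch.stack(outputs, dim=0).sum(dim=0)
            outputs.append(layer(dense_input))
        return outputs[-1]
```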
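For the two evaluation metrics named in the abstract, here is a minimal sketch of sentence-level BLEU (via NLTK) and WER computed as word-level edit distance; the example sentences are illustrative only and are not from the thesis's test data.

```python
# A minimal sketch of the evaluation metrics: sentence-level BLEU via
# NLTK, and WER as word-level Levenshtein distance over the reference.
from nltk.translate.bleu_score import sentence_bleu

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

# Illustrative example only
reference = "the cat sat on the mat"
hypothesis = "the cat sat on a mat"
# sentence_bleu takes a list of tokenized references and one hypothesis
bleu = sentence_bleu([reference.split()], hypothesis.split())
print(f"BLEU: {bleu:.4f}, WER: {wer(reference, hypothesis):.4f}")
```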