Thesis 109521001: Detailed Record




Name	陳子捷 (TZU-CHIEH CHEN)	Department	Electrical Engineering
Thesis Title	A Reconfigurable Hardware Architecture for Spatial-Temporal Graph Convolutional Networks in Action Recognition
Related Theses
★ Low-Memory Hardware Design for Real-Time SIFT Feature Extraction
★ Real-Time Face Detection and Face Recognition for an Access Control System
★ An Autonomous Vehicle with Real-Time Auto-Following Capability
★ Lossless Compression Algorithm and Implementation for Multi-Lead ECG Signals
★ Offline Custom Voice and Speaker Wake-Word System with Embedded Implementation
★ Wafer Map Defect Classification and Embedded System Implementation
★ Densely Connected Convolutional Network for Small-Footprint Keyword Spotting
★ G2LGAN: Data Augmentation on Imbalanced Datasets for Wafer Map Defect Classification
★ Algorithm Design Techniques for Compensating Finite Precision in Multiplierless Digital Filters
★ Design and Implementation of a Programmable Viterbi Decoder
★ Low-Cost Vector Rotator IP Design Based on Extended Elementary-Angle CORDIC
★ Analysis and Architecture Design of a JPEG2000 Still-Image Coding System
★ Low-Power Turbo Code Decoder for Communication Systems
★ Platform-Based Design for Multimedia Communication
★ Design and Implementation of a Digital Watermarking System for MPEG Encoders
★ Algorithm Development for Video Error Concealment with Data-Reuse Considerations
Files	View the thesis in the system (available after 2025-8-31)
Abstract (Chinese)	Human action recognition can use various kinds of data as input, chiefly RGB images or skeleton data. Traditional methods take RGB images as input and adopt convolutional neural network (CNN) or recurrent neural network (RNN) models. However, various kinds of noise in the images lead to low accuracy for these methods. To improve recognition accuracy, some studies have explored skeleton data as an alternative input. Nevertheless, CNN or RNN models cannot fully exploit the spatial and temporal relationships present in skeleton data, which limits their effectiveness. In recent years, graph convolutional networks (GCNs) have attracted great attention for their broad applicability in tasks such as social network analysis and recommendation systems. GCNs are particularly suited to non-Euclidean data such as human skeleton joints, which, unlike RGB images, are not affected by environmental factors. However, because of the computational complexity and data sparsity of GCNs, running them on CPU or GPU platforms often results in high latency and low power efficiency. To meet these challenges, dedicated hardware accelerators play a crucial role. The spatial-temporal graph convolutional network (ST-GCN) is a model widely used for human action recognition. In this thesis, we propose a highly parallelized and flexible architecture for ST-GCN. Our architecture contains general-purpose processing elements (PEs) that can be grouped into combination engines and aggregation engines to compute GCN layers; these PEs can also be interconnected when processing TCN layers. With our proposed method, the accelerator also scales well. We implemented the hardware design on both ASIC and FPGA platforms. Compared with other works that implement hardware designs for ST-GCN, our proposed method achieves up to a 39.5% reduction in latency and up to a 2.23x improvement in power efficiency.
Abstract (English)	Human action recognition can leverage various data inputs, including RGB images and skeleton data. Traditional approaches utilize RGB images as input and employ convolutional neural network (CNN) or recurrent neural network (RNN) models. However, these methods suffer from low accuracy due to the presence of various background noises in the images. In an attempt to enhance accuracy, some studies have explored the use of skeleton data as an alternative input. Nevertheless, CNN or RNN models are unable to fully exploit the spatial and temporal relationships inherent in the skeleton data, limiting their effectiveness. In recent years, Graph Convolutional Networks (GCNs) have gained significant attention due to their wide applicability in tasks such as social network analysis and recommendation systems. GCNs are particularly suitable for processing non-Euclidean data, such as human skeleton joints, which remain unaffected by environmental factors unlike RGB images. However, the computational complexity and data sparsity of GCNs often result in high latency and low power efficiency when deployed on CPU or GPU platforms. To address these challenges, dedicated hardware accelerators play a crucial role. In this thesis, we propose a highly parallelized and flexible architecture for the Spatial-Temporal Graph Convolutional Network (ST-GCN), a widely used model in human action recognition. Our architecture incorporates general Processing Elements (PEs) that can be grouped into combination engines and aggregation engines to compute GCN layers. These PEs can also be interconnected while processing TCN layers. The accelerator also has high scalability due to our proposed method. We implemented our design on both ASIC and FPGA platforms. Compared to other works that also implement hardware designs for ST-GCN, our proposed method achieves up to a 39.5% reduction in latency and up to a 2.23x improvement in power efficiency.
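The abstract's split between aggregation engines and combination engines mirrors the two matrix products that make up a GCN layer. As an illustrative sketch (not the thesis's hardware design), a single GCN layer over a skeleton graph can be written as an aggregation step (multiplying features by a normalized adjacency matrix) followed by a combination step (multiplying by a weight matrix); the joint count, channel widths, and random graph below are hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch of the two GCN-layer phases an ST-GCN accelerator
# computes: aggregation (A_norm @ X, gathering features over neighboring
# skeleton joints) and combination (X @ W, a dense feature transform).
# V joints and C_in/C_out channels are hypothetical example sizes.
V, C_in, C_out = 25, 64, 128              # e.g. 25 joints, as in NTU RGB+D skeletons

rng = np.random.default_rng(0)
A = (rng.random((V, V)) < 0.15)           # random sparse adjacency as a stand-in
A = np.logical_or(A, A.T).astype(float)   # make the bone graph undirected
np.fill_diagonal(A, 0.0)                  # no duplicate self-edges
A_hat = A + np.eye(V)                     # add self-loops
deg = A_hat.sum(axis=1)                   # degree of each joint (>= 1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization (Kipf & Welling)

X = rng.standard_normal((V, C_in))        # per-joint input features
W = rng.standard_normal((C_in, C_out))    # layer weights

# Aggregation, then combination, then ReLU.
H = np.maximum(A_norm @ X @ W, 0.0)
print(H.shape)                            # (25, 128)
```

Note that the evaluation order, (A_norm @ X) @ W versus A_norm @ (X @ W), changes the multiply count depending on adjacency sparsity and channel widths, which is one reason accelerator designs treat aggregation and combination as separate engines.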
Keywords (Chinese)	★ graph convolutional network
★ hardware accelerator
★ reconfigurable architecture
★ application-specific integrated circuit (ASIC)
★ field-programmable gate array (FPGA)
Keywords (English)	★ graph convolutional neural network
★ hardware accelerator
★ reconfigurable architecture
★ ASIC
★ FPGA
Table of Contents	Abstract (Chinese)
Abstract (English)
1. Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
2. Literature Review
2.1 Spatial-Temporal Graph Convolutional Network (ST-GCN)
2.2 ST-GCN Hardware Accelerators
3. Hardware Architecture Design
3.1 Overall Hardware Architecture
3.2 Combination Engine Module Design
3.3 Aggregation Engine Module Design
3.4 Cooperation between the Combination and Aggregation Engines
3.5 Processing TCN Layer Operations
4. Hardware Implementation Results
4.1 Layer-by-Layer Analysis
4.2 Comparison with Related ASIC Accelerators
4.3 Comparison with Related FPGA Accelerators
5. Conclusion
References
Advisor	蔡宗漢 (TSUNG-HAN TSAI)	Date of Approval	2023-8-9

For questions about this thesis, please contact the National Central University Library, Extension Services Division, TEL: (03)422-7151 ext. 57407, or by e-mail.