References
[1] D. Helbing, I. Farkas, and T. Vicsek, “Simulating dynamical features of escape panic,” Nature, vol. 407, no. 6803, pp. 487-490, 2000.
[2] J. van den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” Robotics Research, Berlin, Germany: Springer, pp. 3-19, 2011.
[3] Y. Yao, E. Atkins, M. Johnson-Roberson, R. Vasudevan, and X. Du, “BiTraP: bi-directional pedestrian trajectory prediction with multi-modal goal estimation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1463-1470, 2021.
[4] R. Akabane and Y. Kato, “Pedestrian trajectory prediction based on transfer learning for human-following mobile robots,” IEEE International Conference on Big Data, pp. 3453-3458, 2020.
[5] X. Song, K. Chen, X. Li, J. Sun, B. Hou, Y. Cui, B. Zhang, G. Xiong, and Z. Wang, “Pedestrian trajectory prediction based on deep convolutional LSTM network,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3285-3302, 2021.
[6] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social GAN: socially acceptable trajectories with generative adversarial networks,” IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2255-2264, 2018.
[7] S. Kim, S. J. Guy, W. Liu, R. W. Lau, M. C. Lin, and D. Manocha, “Predicting pedestrian trajectories using velocity-space reasoning,” Algorithmic Foundations of Robotics X, pp. 609-623, 2013.
[8] Z. Chen, L. Wang, and N. H. C. Yung, “Adaptive human motion analysis and prediction,” Pattern Recognition, vol. 44, no. 12, pp. 2902-2914, 2011.
[9] C. Barata, J. C. Nascimento, J. M. Lemos, and J. S. Marques, “Sparse motion fields for trajectory prediction,” Pattern Recognition, vol. 110, 107631, 2021.
[10] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social LSTM: human trajectory prediction in crowded spaces,” IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 961-971, 2016.
[11] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, pp. 2672-2680, 2014.
[12] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” International Conference on Learning Representations, pp. 1-14, 2017.
[13] A. Vemula, K. Muelling, and J. Oh, “Social attention: modeling attention in human crowds,” IEEE International Conference on Robotics and Automation, pp. 4601-4607, 2018.
[14] F. Bartoli, G. Lisanti, L. Ballan, and A. Del Bimbo, “Context-aware trajectory prediction,” 24th International Conference on Pattern Recognition, pp. 1941-1946, 2018.
[15] Z. Huang, J. Wang, L. Pi, X. Song, and L. Yang, “LSTM based trajectory prediction model for cyclist utilizing multiple interactions with environment,” Pattern Recognition, vol. 112, 107800, 2021.
[16] M. Pfeiffer, G. Paolo, H. Sommer, J. Nieto, and R. Siegwart, “A data-driven model for interaction-aware pedestrian motion prediction in object cluttered environments,” IEEE International Conference on Robotics and Automation, pp. 1-8, 2018.
[17] K. Mangalam, Y. An, H. Girase, and J. Malik, “From goals, waypoints & paths to long term human trajectory forecasting,” IEEE/CVF International Conference on Computer Vision, pp. 15213-15222, 2021.
[18] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: convolutional networks for biomedical image segmentation,” International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, 2015.
[19] K. Mangalam, H. Girase, S. Agarwal, K. H. Lee, E. Adeli, J. Malik, and A. Gaidon, “It is not the journey but the destination: end-point conditioned trajectory prediction,” European Conference on Computer Vision, pp. 759-776, 2020.
[20] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, “Object recognition with gradient-based learning,” Shape, Contour and Grouping in Computer Vision, pp. 319-345, 1999.
[21] D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” International Conference on Learning Representations, pp. 1-15, 2015.
[22] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, “An intriguing failing of convolutional neural networks and the CoordConv solution,” Advances in Neural Information Processing Systems, pp. 9628-9639, 2018.
[23] P. Baldi, “Autoencoders, unsupervised learning and deep architectures,” Unsupervised and Transfer Learning workshop, vol. 27, pp. 37-50, 2011.
[24] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, pp. 1-17, 2020.
[25] A. C. Bovik, “Bilinear interpolation,” The Essential Guide to Image Processing, pp. 43-68, 2009.
[26] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[27] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” International Conference on Learning Representations, pp. 1-14, 2016.
[28] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You'll never walk alone: modeling social behavior for multi-target tracking,” IEEE 12th International Conference on Computer Vision, pp. 261-268, 2009.
[29] A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,” Computer Graphics Forum, vol. 26, no. 3, pp. 655-664, 2007.
[30] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, “Learning social etiquette: human trajectory understanding in crowded scenes,” European Conference on Computer Vision, pp. 549-565, 2016.
[31] H. Caesar, J. Uijlings, and V. Ferrari, “COCO-Stuff: thing and stuff classes in context,” IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1209-1218, 2018.
[32] R. Liang, Y. Li, X. Li, Y. Tang, J. Zhou, and W. Zou, “Temporal pyramid network for pedestrian trajectory prediction with multi-supervision,” 35th AAAI Conference on Artificial Intelligence, pp. 2029-2037, 2021.
[33] X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, vol. 28, pp. 1-9, 2015.
[34] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” IEEE International Conference on Image Processing, pp. 3645-3649, 2017.
[35] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and realtime tracking,” IEEE International Conference on Image Processing, pp. 3464-3468, 2016.
[36] B. Benfold and I. Reid, “Stable multi-target tracking in real-time surveillance video,” Conference on Computer Vision and Pattern Recognition, pp. 3457-3464, 2011.
[37] J. Liang, L. Jiang, J. C. Niebles, A. G. Hauptmann, and L. Fei-Fei, “Peeking into the future: predicting future person activities and locations in videos,” IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5718-5727, 2019.
[38] Y. Yuan, X. Weng, Y. Ou, and K. Kitani, “Agentformer: agent-aware transformers for socio-temporal multi-agent forecasting,” IEEE/CVF International Conference on Computer Vision, pp. 9793-9803, 2021.
[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, pp. 1-11, 2017.