Master's/Doctoral Thesis 106523017: Detailed Record




Author: Yu-Ting Huang (黃郁婷)    Department: Communication Engineering
Thesis Title: Pedestrian Tracking Based on Two-flow Convolutional Neural Network for Equirectangular Projection of 360-degree Videos
(基於雙流卷積神經網路的三百六十度視訊等距長方投影之行人追蹤)
Related Theses
★ Illumination-Adaptive Video Encoder Design for In-Vehicle Video
★ An Improved Head Tracking System Based on Particle Filtering
★ Fast Mode Decision Algorithms for Spatial and CGS Scalable Video Encoders
★ A Robust Active Appearance Model Search Algorithm for Facial Expression Recognition
★ Multi-view Video Coding with Epipolar-Geometry-Based Inter-view Prediction and Fast Inter-frame Prediction Direction Decision
★ A Stereo Matching Algorithm for Homogeneous Regions Based on Improved Belief Propagation
★ Baseball Trajectory Recognition Based on a Hierarchical Boosting Algorithm
★ Fast Reference Frame Direction Decision for Multi-view Video Coding
★ Fast Mode Decision for CGS Scalable Encoders Based on Online Statistics
★ An Improved Active Shape Model Matching Algorithm for Lip Shape Recognition
★ Object Tracking on Mobile Platforms Based on Motion-Compensated Models
★ Occlusion Detection in Asymmetric Stereo Matching Based on Matching Cost
★ Momentum-Based Fast Mode Decision for Multi-view Video Coding
★ Fast Local L-SVMs Ensemble Classifiers for Place Image Recognition
★ Fast Depth Video Coding Mode Decision Oriented to High-Quality Synthesized Views
★ Multi-Object Tracking with a Moving Camera Based on Motion-Compensated Models
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese) When pedestrian tracking is performed on the equirectangular projection (ERP) of 360-degree videos, the varying degrees of geometric distortion across ERP regions reduce the accuracy of most existing trackers. In addition, the high frame rate and high spatial resolution of 360-degree videos lead to high computational complexity. This thesis therefore adopts a two-flow convolutional neural network as the tracking architecture; because no online retraining or updating of the network parameters is required, 360-degree videos can be tracked at high speed. The search window of the current frame and the target template are each fed into a convolutional neural network (CNN) to extract hierarchical features, so that the convolutional features carry both spatial and multi-layer information. To cope with the non-uniform geometric distortion of the target across different regions of the ERP image, the similarity between the bounding box predicted by the network and the target template serves as the criterion for updating the target template. The similarity computation uses only the robust features of the target template, which improves the reliability of the similarity measurement. Moreover, the loss function used in training applies either the L1 loss or the generalized intersection over union (GIoU) loss according to the state of the predicted coordinates; the GIoU loss reduces the network's sensitivity to target size. Experimental results show that the proposed scheme tracks better than the SiamFC tracker when the target undergoes small scale changes.
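For orientation, here is a minimal sketch of the standard equirectangular mapping as commonly defined in 360-degree video work (e.g., [16]); the thesis's own treatment is in Section 3.1. For a W×H ERP image with W = 2H, pixel coordinates (u, v) correspond to longitude λ and latitude φ on the sphere:

```latex
% Standard ERP pixel-to-sphere mapping (common definition, not quoted from the thesis).
\lambda = 2\pi\left(\frac{u}{W}-\frac{1}{2}\right), \qquad
\varphi  = \pi\left(\frac{1}{2}-\frac{v}{H}\right)
```

Every image row spans the full 2π of longitude, but a row at latitude φ corresponds to a circle of circumference proportional to cos φ on the sphere, so the local horizontal stretch grows roughly as 1/cos φ: negligible at the equator and unbounded toward the poles. This is the non-uniform, region-dependent distortion that the proposed tracker must cope with.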
Abstract (English) Non-uniform geometric distortions of the equirectangular projection (ERP) of 360-degree videos decrease the tracking accuracy of most existing trackers. In addition, the high frame rate and high spatial resolution of 360-degree videos cause high computational complexity. Hence, this thesis proposes a two-flow convolutional neural network that measures the similarity of two inputs for pedestrian tracking on 360-degree videos. High-speed tracking is achieved since the network is neither retrained nor updated online. Hierarchical convolutional features are extracted from both the search window of the current frame and the target template, so the features carry both spatial and multi-layer information and improve tracking accuracy. The tracker updates the target template according to the similarity between the bounding box predicted by the network and the target template. In addition, to improve the reliability of the similarity measurement, the similarity calculation uses only the robust features of the target template. At the training stage, the loss function applies either the L1 loss or the generalized intersection over union (GIoU) loss according to the predicted location of the bounding box of the target. Experimental results show that the proposed scheme tracks better than the SiamFC tracker when the target undergoes small scale changes.
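As a concrete reference for the GIoU term used above, below is a minimal sketch following the definition of Rezatofighi et al. [9]. The function names and the axis-aligned (x1, y1, x2, y2) box format are illustrative assumptions, not code from the thesis, and the thesis's exact rule for switching between the L1 and GIoU losses is specified only in its Section 4.5.3.

```python
# Minimal GIoU sketch following Rezatofighi et al. [9].
# A box is (x1, y1, x2, y2), axis-aligned with x1 < x2 and y1 < y2.
# Names and box format are illustrative only, not taken from the thesis.

def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def giou(a, b):
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C of a and b.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    # GIoU subtracts the fraction of C not covered by the union.
    return iou - (c - union) / c if c > 0 else iou

def giou_loss(pred, target):
    # Loss in [0, 2]; lower is better. The thesis mixes this with an
    # L1 term depending on the predicted coordinates (Section 4.5.3).
    return 1.0 - giou(pred, target)
```

For identical boxes GIoU equals 1 (zero loss); for widely separated boxes it approaches -1, so the loss still provides a useful gradient when the prediction and the ground truth do not overlap. Being ratio-based, it is also largely insensitive to the absolute size of the target, which is the property the abstract invokes.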
Keywords ★ pedestrian tracking
★ 360-degree videos
★ equirectangular projection (ERP)
★ two-flow convolutional neural network
★ loss function
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgments iii
Table of Contents iv
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Overview 1
1.2 Motivation 1
1.3 Research Method 2
1.4 Thesis Organization 3
Chapter 2 Introduction to Visual Tracking Based on Two-flow Convolutional Neural Networks 4
2.1 Visual Tracking Based on Two-flow Network Architecture 4
2.2 Summary 7
Chapter 3 Introduction to Visual Tracking for 360-degree Videos in Equirectangular Projection 8
3.1 Principles of Equirectangular Projection 8
3.2 Visual Tracking Based on Equirectangular Mapping Projection 10
3.3 Summary 10
Chapter 4 The Proposed Pedestrian Tracking Scheme for the Equirectangular Projection of 360-degree Videos 11
4.1 System Architecture 12
4.2 The Proposed Network Architecture 12
4.3 Similarity Measurement 15
4.4 Tracking Stage 18
4.5 Training Stage 19
4.5.1 Training Data 20
4.5.2 Data Augmentation 21
4.5.3 Loss Function 25
4.6 Summary 26
Chapter 5 Experimental Results and Discussion 27
5.1 Experimental Parameters, Test Video Specifications, and an Overview of SiamFC 27
5.2 Experimental Results of the Tracking System 29
5.2.1 Tracking Accuracy Based on Root Mean Square Error 30
5.2.2 Tracking Accuracy Based on Overlap Ratio 43
5.2.3 Time Complexity 46
5.3 Summary 47
Chapter 6 Conclusion and Future Work 48
References 49
References [1] M. Zhou, "AHG8: A study on equi-angular cubemap projection (EAC)," Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting, JVET-G0056, Torino, IT, Jul. 2017.
[2] R. G. Youvalari, A. Aminlou, M. M. Hannuksela, and M. Gabbouj, "Efficient coding of 360-degree pseudo-cylindrical panoramic video for virtual reality applications," in Proceedings of IEEE International Symposium on Multimedia, pp. 525-528, Dec. 2016.
[3] K. C. Liu, Y. T. Shen, and L. G. Chen, "Simple online and realtime tracking with spherical panoramic camera," in Proceedings of IEEE International Conference on Consumer Electronics, pp. 1-6, Jan. 2018.
[4] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, Jul. 2017.
[5] N. Wojke, A. Bewley, and D. Paulus, "Simple online and realtime tracking with a deep association metric," in Proceedings of IEEE International Conference on Image Processing, pp. 3645-3649, Sep. 2017.
[6] D. Held, S. Thrun, and S. Savarese, "Learning to track at 100 FPS with deep regression networks," in Proceedings of European Conference on Computer Vision, pp. 749-765, Springer, Sep. 2016.
[7] R. Tao, E. Gavves, and A. W. M. Smeulders, "Siamese instance search for tracking," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1420-1429, Jun. 2016.
[8] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of IEEE International Conference on Computer Vision, pp. 2961-2969, Jan. 2018.
[9] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," arXiv preprint arXiv:1902.09630, Apr. 2019.
[10] D. G. Lin, TensorFlow+Keras 深度學習人工智慧實務應用 (Practical Deep Learning and AI with TensorFlow+Keras), Taiwan: DrMaster Press, pp. 2-8, May 2017.
[11] N. Wang and D. Y. Yeung, "Learning a deep compact image representation for visual tracking," in Proceedings of Advances in Neural Information Processing Systems, pp. 809-817, Dec. 2013.
[12] C. Ma, J. B. Huang, X. Yang, and M. H. Yang, "Hierarchical convolutional features for visual tracking," in Proceedings of IEEE International Conference on Computer Vision, pp. 3074-3082, Dec. 2015.
[13] K. Chen and W. Tao, "Once for all: A two-flow convolutional neural network for visual tracking," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, No. 12, pp. 3377-3386, Dec. 2018.
[14] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-convolutional Siamese networks for object tracking," in Proceedings of European Conference on Computer Vision, pp. 850-865, Springer, Sep. 2016.
[15] Y. Tang, Y. Li, S. S. Ge, J. Luo, and H. Ren, "Parameterized distortion-invariant feature for robust tracking in omnidirectional vision," IEEE Transactions on Automation Science and Engineering, Vol. 13, No. 2, pp. 743-756, Apr. 2016.
[16] Y. Ye, E. Alshina, and J. Boyce, "Algorithm description of projection format conversion and video quality metrics in 360Lib Version 4," Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting, Torino, IT, 13-21 Jul. 2017.
[17] B. S. Kim and J. S. Park, "Estimating deformation factors of planar patterns in spherical panoramic images," Multimedia Systems, Vol. 23, pp. 607-625, DOI 10.1007/s00530-016-0513-x, Apr. 2016.
[18] C. Huang, "Pedestrian tracking using ORB feature for equirectangular projection of 360-degree video," M.S. thesis, Department of Communication Engineering, National Central University, Jun. 2018.
[19] Shum sir, "360 VR," Shum sir Rubik's Cube, 2017. [Online]. Available:
https://www.youtube.com/playlist?list=PLqWWr7VYAA5NFXVpZBR5jIrJB69psGPDq
[20] X. Corbillon, F. De Simone, and G. Simon, "360-degree video head movement dataset," in Proceedings of ACM Multimedia Systems Conference, pp. 199-204, Jun. 2017.
[21] https://www.mettle.com/360vr-master-series-free-360-downloads-page
[22] F. Duanmu, Y. Mao, S. Liu, S. Srinivasan, and Y. Wang, "A subjective study of viewer navigation behaviors when watching 360-degree videos on computers," in Proceedings of IEEE International Conference on Multimedia and Expo, pp. 1-6, Jul. 2018.
[23] Y. Wu, J. Lim, and M. H. Yang, "Online object tracking: A benchmark," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, Jun. 2013.
Advisor: Chih-Wei Tang (唐之瑋)    Date of Approval: 2019-7-30
