Master's/Doctoral Thesis 106522084: Detailed Record




Name 陳子傑 (Zih-Jie Chen)   Department Computer Science and Information Engineering
Thesis Title CNN-Based Gaze Block Estimation
(基於卷積神經網路之注視區塊估測)
Related Theses
★ A Multi-Scale Deformable Convolution Alignment Network for Video Super-Resolution
Full Text Available to browse in the repository system after 2024-7-24
Abstract (Chinese) Vision is one of the most important senses through which humans receive information from the outside world. It helps us explore the world and take in new knowledge, and it also enables human-computer interaction. As contactless modes of human-computer interaction continue to develop, communicating through gaze behavior has become a highlight of the field, with numerous applications in education, advertising, nursing care, entertainment, and virtual reality. In general, however, most eye-tracking devices require prior calibration or a fixed head position, so there are still many restrictions on their use.
To address these problems, this thesis uses a ResNet model as the classification core to build a Gaze Block Estimation Model (GBE Model) that can estimate the block a user is gazing at without any calibration. The only capture device needed is an ordinary RGB camera without depth sensing, such as a webcam, a laptop's built-in camera, or a smartphone's front camera. Because deep learning is data-driven, a large amount of correctly labeled training data is needed to train a stable model that meets the requirements; existing public gaze-behavior datasets, however, were collected for different application scenarios, so their data cannot be applied to every application domain. This thesis therefore collects and builds its own dataset (LabGaze) of as many as 300,000 gaze images.
The experimental results show that the GBE Model can still estimate the user's gaze block without calibration and while allowing head movement, reaching 85.1% accuracy even in real-time testing. The experiments demonstrate that the proposed method can use gaze blocks for screen control, realizing the intended human-computer interaction scenario.
Abstract (English) Vision is one of the most important senses through which humans receive information from the outside world. It helps us explore the world, acquire new knowledge, and communicate with computers. As contactless human-computer interaction (HCI) continues to develop, technology for communicating through gaze behavior has become a highlight of this field, with many applications in education, advertising, nursing, entertainment, and virtual reality. In general, however, most eye-tracking devices require calibration in advance or a fixed head position, which places many restrictions on their use.
To solve these problems, this study uses a ResNet model as the core of classification to construct a Gaze Block Estimation Model (GBE Model), which can estimate the user's gaze block without any calibration process. Moreover, only an RGB camera without depth information is needed to capture images, such as a webcam, a laptop's built-in camera, or the front-facing camera of a smartphone. Because deep learning is data-driven, a large amount of correctly labeled training data is required to train a stable model that meets the application's needs. However, existing public gaze datasets were collected under different application scenarios, so their images do not transfer to all application domains. This study therefore collects and builds a dataset (LabGaze) of up to 300,000 eye images.
According to the experimental results, the GBE Model can estimate the user's gaze block without calibration while allowing head movement. Even in real-time testing it reaches 85.1% accuracy. The experiments show that the proposed method lets users control the screen through gaze blocks, achieving the goal of the HCI application scenario.
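Since the full text is under embargo in this record, the following is only a minimal sketch, in PyTorch, of the kind of classifier the abstract describes: a ResNet backbone fine-tuned to map a normalized RGB eye-region crop to one of N gaze blocks. The class name GazeBlockNet, the choice of ResNet-18, the 224x224 input size, and the nine-block output are all illustrative assumptions; the actual GBE Model architecture and block count are defined in Chapter 3 of the thesis.

```python
# Hypothetical sketch of a ResNet-based gaze-block classifier in the spirit
# of the GBE Model. All names and sizes here are assumptions, not the
# thesis's actual design.
import torch
import torch.nn as nn
from torchvision import models

class GazeBlockNet(nn.Module):
    """ResNet backbone whose final layer is replaced by a gaze-block classifier."""

    def __init__(self, num_blocks: int = 9):  # assumed block count
        super().__init__()
        self.backbone = models.resnet18(weights=None)  # example depth only
        # Swap the 1000-way ImageNet head for an N-way gaze-block head.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_blocks)

    def forward(self, eye_crop: torch.Tensor) -> torch.Tensor:
        # eye_crop: a batch of RGB eye-region crops, shape (B, 3, 224, 224)
        return self.backbone(eye_crop)  # logits over the gaze blocks

if __name__ == "__main__":
    model = GazeBlockNet(num_blocks=9).eval()
    dummy = torch.randn(1, 3, 224, 224)  # stand-in for a normalized webcam crop
    with torch.no_grad():
        block = model(dummy).argmax(dim=1).item()
    print(f"predicted gaze block: {block}")
```

At inference time, the predicted block index would be mapped to a region of the screen, which is how the gaze-based screen control reported in the abstract (85.1% accuracy in real-time testing) could be driven.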
Keywords ★ Convolutional Neural Network
★ Residual Network
★ Eye Tracking
★ Gaze Block
★ Human-Computer Interaction
Table of Contents Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vii
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 3
1.3 Thesis Organization 3
Chapter 2 Related Work 4
2.1 Visual Behavior 4
2.2 Eye Tracking 7
2.3 Convolutional Neural Networks 11
2.3.1 Convolutional Layers 12
2.3.2 Pooling Layers 14
2.3.3 Fully Connected Layers 15
2.3.4 Types of Convolutional Neural Networks 17
2.4 Human-Computer Interaction 22
Chapter 3 Methodology 24
3.1 System Architecture 24
3.2 Data Collection 25
3.2.1 MPIIGaze 25
3.2.2 LabGaze 27
3.3 Feature Extraction 29
3.3.1 Face Detection 30
3.3.2 Data Cleaning 34
3.3.3 Eye Region Extraction 35
3.4 Model Training 36
3.4.1 Normalization 37
3.4.2 Neural Network Architecture 37
3.5 Testing Procedure 39
Chapter 4 Experimental Design and Results 40
4.1 Experimental Environment 40
4.2 Experimental Results 41
4.2.1 Single-Eye and Two-Eye Models 41
4.2.2 Multi-Class Models 47
4.2.3 Mixed-Dataset Model 54
4.3 Gaze-Based Screen Control 55
Chapter 5 Conclusions and Future Work 57
References 58
Advisors Kuo-Chin Fan, Chiao-Wen Kao (范國清、高巧汶)   Approval Date 2019-7-24