基於卷積神經網路之注視區塊估測;CNN-based Gaze Block Estimation

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/81141

Title:	基於卷積神經網路之注視區塊估測;CNN-based Gaze Block Estimation
Authors:	陳子傑;Chen, Zih-Jie
Contributors:	Department of Computer Science and Information Engineering
Keywords:	卷積神經網路;殘差網路;眼動追蹤;注視區塊;人機互動;Convolutional Neural Network;Residual Network;Eye Tracking;Gaze Block;Human-Computer Interaction
Date:	2019-07-24
Issue Date:	2019-09-03 15:36:54 (UTC+8)
Publisher:	National Central University
Abstract:	Vision is one of the most important senses through which humans receive information from the outside world. It helps us explore the world and acquire new knowledge, and it can also serve as a channel for human-computer interaction (HCI). As contactless HCI continues to develop, gaze-based communication has become a highlight of the field, with applications in education, advertising, nursing care, entertainment, and virtual reality. However, most eye-tracking devices must be calibrated in advance or require the user's head to be fixed, so their usage remains restricted. To address these problems, this thesis uses a ResNet model as the core classifier to build a Gaze Block Estimation Model (GBE Model) that estimates the block the user is gazing at without any calibration. The only capture device required is an ordinary RGB camera without depth information, such as a webcam, a laptop's built-in camera, or the front-facing camera of a smartphone. Because deep learning is data-driven, training a stable model that meets the requirements needs a large amount of correctly labeled data, yet existing public datasets of visual behavior were collected for different application scenarios and do not suit every application domain. This thesis therefore collects and builds its own dataset, LabGaze, containing up to 300,000 images. The experimental results show that the GBE Model can estimate the user's gaze block without calibration while allowing head movement, reaching 85.1% accuracy even in real-time testing. The experiments demonstrate that the proposed method can use gaze blocks to control the screen, realizing an HCI application scenario.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

