Master's/Doctoral Thesis 108522055: Detailed Record




Name: Zheng-Yan Guo (郭政言)    Department: Computer Science and Information Engineering
Thesis Title: Lip-image-sequence-based Biometric Authentication (基於嘴唇影像序列之生物認證)
Related Theses:
★ Traffic Light Recognition Combining a Parallel Residual Bi-Fusion Feature Pyramid Network and a Self-Attention Mechanism
Files: Full text may be browsed in the thesis system after 2026-07-27.
Abstract (Chinese): In recent years, biometric authentication systems have been widely used in daily life, and improving their security has remained an important topic. Authentication methods based on static biometric features have become increasingly easy to defeat as forgery techniques continue to evolve. Methods based on sequences of biometric features, by contrast, must be verified against sequential data; they are therefore comparatively difficult to forge, which improves security.
This thesis implements a neural-network identity-authentication model based on sequences of lip images and key points, trained and tested on the dataset introduced in this work. In the general authentication experiment, the trained model achieved an HTER of 8.86%, demonstrating its effectiveness on lip-image and key-point sequence data. To test whether sequential input actually improves security, static sequences were fed to the model as forged data, yielding an FAR of 84.09%; this shows that directly inputting sequence data does not by itself improve security. To resist static-sequence attacks, the frame differences of the image sequence were computed and used as input instead, yielding an HTER of 6.53% in the general authentication experiment and an FAR of 9.09% in the static-sequence attack experiment, demonstrating the effectiveness and security of lip-image-sequence frame differences for the authentication problem.
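The abstract above reports results in terms of HTER and FAR. For reference, in biometric verification HTER (half total error rate) is conventionally defined as the mean of the false acceptance rate (FAR) and the false rejection rate (FRR). The sketch below shows this computation at a fixed decision threshold; the function name, score arrays, and threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def far_frr_hter(genuine_scores, impostor_scores, threshold):
    """Compute FAR, FRR, and HTER at a fixed decision threshold.

    genuine_scores:  similarity scores for genuine (matching) attempts.
    impostor_scores: similarity scores for impostor (forged) attempts.
    Scores >= threshold are accepted. All values here are illustrative.
    """
    genuine = np.asarray(genuine_scores, dtype=np.float64)
    impostor = np.asarray(impostor_scores, dtype=np.float64)

    far = np.mean(impostor >= threshold)   # forged attempts wrongly accepted
    frr = np.mean(genuine < threshold)     # genuine attempts wrongly rejected
    hter = (far + frr) / 2.0               # half total error rate
    return far, frr, hter

# Illustrative usage with made-up scores (not data from the thesis):
far, frr, hter = far_frr_hter([0.9, 0.8, 0.4], [0.2, 0.7, 0.1], threshold=0.5)
```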
Abstract (English): In recent years, biometric authentication systems have been widely used in daily life, and improving their security has always been an important topic. Authentication methods based on static biometrics have become increasingly easy to crack as forgery methods evolve. Methods based on sequential biometrics, however, must be verified with sequential data, so they are relatively difficult to forge, thereby improving security.
In this paper, an identity authentication model based on lip-image and key-point sequences is implemented with a neural network, and the dataset introduced in this paper is used for training and testing. In the general authentication experiment, the trained model obtained an HTER of 8.86%, which demonstrates the model's effectiveness on lip-image and key-point sequence data. To test whether sequential data improve security, we input static sequences to the model as fake data and obtained an FAR of 84.09%, which shows that directly inputting sequence data does not help improve security. To resist the static-sequence attack, we compute the frame differences of the image sequence and use them as input. This yields an HTER of 6.53% in the general authentication experiment and an FAR of 9.09% in the static-sequence attack experiment, which demonstrates the validity and security of lip-image-sequence frame differences for the authentication problem.
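The frame-difference input described in both abstracts lends itself to a short illustration. Below is a minimal sketch, assuming the lip clip is a NumPy array of shape (T, H, W, C) with values in [0, 1]; the shape convention, value range, and random stand-in data are assumptions for illustration, not the thesis's actual preprocessing. It also shows why this input resists a static-sequence attack: a replayed single frame differences to all zeros.

```python
import numpy as np

def frame_difference(sequence):
    """Consecutive frame differences of an image sequence.

    sequence: float array of shape (T, H, W, C), pixel values in [0, 1].
    Returns an array of shape (T - 1, H, W, C); a static sequence
    (every frame identical) maps to all zeros.
    """
    seq = np.asarray(sequence, dtype=np.float32)
    return seq[1:] - seq[:-1]

# Illustrative usage: 25 frames of 64x64 RGB lip crops (random stand-in data).
clip = np.random.rand(25, 64, 64, 3).astype(np.float32)
diffs = frame_difference(clip)            # shape (24, 64, 64, 3)

# A "static sequence" attack replays one frame; its differences vanish.
static = np.repeat(clip[:1], 25, axis=0)
assert np.allclose(frame_difference(static), 0.0)
```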
Keywords (Chinese):
★ Lip (嘴唇)
★ Image (影像)
★ Sequence (序列)
★ Biometric Authentication (生物認證)
Keywords (English):
★ Lip
★ Image
★ Sequence
★ Biometric Authentication
Table of Contents
Chinese Abstract  i
English Abstract  ii
Acknowledgments  iii
Table of Contents  iv
List of Figures  vi
List of Tables  vii
1. Introduction  1
  1-1 Research Motivation  1
  1-2 Research Objectives  1
  1-3 Research Methods  2
  1-4 Research Contributions  2
2. Related Work  3
  2-1 Literature Review  3
3. Research Content and Methods  4
  3-1 System Architecture  4
  3-2 Neural Network Architecture  5
    3-2-1 LF-Net  8
    3-2-2 Time Distributed 2D Convolution Residual Unit  9
    3-2-3 3D Convolution Residual Unit  9
    3-2-4 1D Convolution Residual Unit  10
    3-2-5 3D Convolution Layer  10
    3-2-6 Residual Module  11
    3-2-7 Dilated Convolution  12
    3-2-8 Frame Difference  14
    3-2-9 GRU  14
    3-2-10 Binary Cross-Entropy  17
  3-3 Dataset  17
    3-3-1 Data Collection  17
    3-3-2 Preprocessing  18
    3-3-3 Data Augmentation  20
    3-3-4 Frame Difference  20
  3-4 Experiments  21
    3-4-1 Experimental Procedure  21
    3-4-2 Static Sequence Attack Experiment  22
    3-4-3 Evaluation Metrics  23
    3-4-4 Experimental Environment  24
    3-4-5 Network Training Parameters  24
4. Experimental Results and Discussion  26
  4-1 Lip Images and Key Points  26
  4-2 Frame Difference Against Static Sequence Attacks  27
  4-3 Combining Lip Images and Key Points  27
  4-4 GRU  28
5. Conclusion  29
References  30
Appendix 1  33
Advisors: Kuo-Chin Fan, Chi-Hung Chuan (范國清, 莊啟宏)    Approval Date: 2021-8-2