A Method for Detecting, Localizing, and Extracting Moving Captions in Digital Video


Name

Yung-Chien Chen (陳永健)

Department

Department of Electrical Engineering

Thesis title

A Method for Detecting, Localizing, and Extracting Moving Captions in Digital Video
(A Comprehensive Motion Videotext Detection, Localization and Extraction Method)



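The author has agreed to make this electronic thesis openly available immediately.
The released full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.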

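Abstract (translated from Chinese)

As demand grows for content-based indexing and summarization of multimedia, extracting semantically meaningful features has become an important problem. In digital video, on-screen text is a particularly useful feature: it conveys the content of the video clearly and is not difficult to extract. Moreover, whereas speech recognition and visual image analysis remain immature, text recognition systems are approaching maturity. For this reason, most research on video indexing systems has started from text recognition.
In this thesis, we propose an algorithm for detecting and extracting moving text. Compared with static captions, moving text has received little research attention. We first use the Sobel detector to find pixels that may lie on text edges, then use vertical and horizontal projection profiles to localize the text region, and finally use Otsu's method to determine a threshold that separates text from background. Unfortunately, a small number of non-text pixels are still classified as text after this step. We therefore apply our proposed modified seed-fill algorithm to remove the misclassified non-text blocks and improve the recognition rate. According to the experimental results, the proposed algorithm gives good results on different types of video.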
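The detection and localization steps named in the abstract (Sobel edge detection followed by horizontal and vertical projection profiles) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the function names, the edge threshold `thresh`, the profile cut-offs `min_row` and `min_col`, and the use of |gx| + |gy| as the gradient magnitude are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def sobel_edges(img: Image, thresh: int = 200) -> Image:
    """Binary edge map (1 = edge) from 3x3 Sobel kernels.

    The magnitude is approximated as |gx| + |gy|; `thresh` is an
    illustrative value, not one taken from the thesis.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) >= thresh:
                edges[y][x] = 1
    return edges


def runs_above(profile: List[int], min_count: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index runs where profile >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def localize_text(edges: Image, min_row: int = 2,
                  min_col: int = 1) -> List[Tuple[int, int, int, int]]:
    """Candidate boxes (top, bottom, left, right) from projection profiles.

    Rows are grouped by their horizontal edge counts first; inside each
    row band, columns are grouped by their vertical edge counts.
    """
    width = len(edges[0])
    boxes = []
    row_profile = [sum(row) for row in edges]
    for top, bottom in runs_above(row_profile, min_row):
        col_profile = [sum(edges[y][x] for y in range(top, bottom + 1))
                       for x in range(width)]
        for left, right in runs_above(col_profile, min_col):
            boxes.append((top, bottom, left, right))
    return boxes
```

Calling `localize_text(sobel_edges(frame))` on a grayscale frame returns candidate text boxes. Because the Sobel response straddles each intensity boundary, a box typically extends about one pixel beyond the bright region it encloses.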

Abstract (English)

Text in video is a compact and accurate clue for video indexing and summarization. Most videotext detection and extraction methods deal with static videotext on video frames; few handle motion videotext well, because it is hard to extract. In this thesis, we propose a low-computation-load method to detect and localize scrolling videotext, which carries much information, together with a method to extract it. Detection is carried out by edge detection, and the projection profile method is used to localize the text region. Extraction consists of adaptive thresholding followed by our proposed modified seed-fill algorithm. Experimental results on a large number of video images are reported in detail.
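The extraction stage (adaptive thresholding, then a seed-fill pass that discards non-text regions) can be sketched as follows. The abstract does not describe how the thesis modifies seed-fill, so this sketch substitutes a plain 4-connected seed fill that erases small connected components. The names `otsu_threshold`, `remove_small_blobs`, and the `min_size` parameter are hypothetical.

```python
from collections import deque
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Otsu's method: the level t that maximizes between-class variance.

    Pixels with value > t are treated as foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def remove_small_blobs(mask: List[List[int]], min_size: int) -> List[List[int]]:
    """Seed-fill every 4-connected foreground blob; erase those below min_size.

    Assumption: a plain size filter stands in for the thesis's unspecified
    modified seed-fill criterion.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in mask]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            queue, blob = deque([(sy, sx)]), []
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out
```

A typical use is to flatten a localized text region into a pixel list, call `otsu_threshold`, binarize with `value > t`, and pass the mask to `remove_small_blobs`. A size filter is only one possible rejection rule; stroke width or position within the text box would be other candidates.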

Keywords (translated from Chinese)

★ content-based video retrieval system
★ caption extraction

Keywords (English)

★ videotext extraction
★ content-based video retrieval

Table of Contents

Chapter 1 Introduction
1.1 Motivation
1.2 MPEG-7 Standard
1.2.1 Structural Elements of Videotext DS
1.2.2 Semantic Elements of Videotext DS
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Background
2.2 Text Detection Method
2.2.1 Texture Based Method
2.2.2 Color Based Method
2.2.3 Edge Based Method
2.2.3.1 Under Compressed Domain
2.2.3.2 Under Pixel Domain
2.3 Text Localization Method
2.3.1 First Approach
2.3.2 Second Approach
2.3.2.1 SSD-Based Module Image Match
2.3.2.2 Contour-Based Text Stabilization
2.3.3 Third Approach
2.4 Text Extraction Method
2.4.1 Multiple Frame Integration
2.4.2 Interpolation
Chapter 3 Proposed Videotext Detection, Localization and Extraction Algorithm
3.1 Overview of Proposed Algorithm
3.1.1 Design Strategy
3.1.2 Flowchart of the Proposed Algorithm
3.2 Videotext Detection and Localization Method
3.3 Videotext Extraction Method
Chapter 4 Experimental Result
4.1 Experimental Environment
4.2 Experimental Result
Chapter 5 Conclusions
Reference


Advisor

Tsung-Han Tsai (蔡宗漢)

Approval date

2005-07-20
