As digital technology matures, vast amounts of audiovisual information are being widely distributed and permanently preserved through digitization and ever-improving compression techniques. Today's users can obtain large quantities of multimedia data through many channels, but manually searching or annotating such massive collections for classification is extremely time-consuming. Techniques and tools that help users search and extract multimedia information efficiently have therefore become an important research topic.

This study proposes tools that assist content extraction and classification for news videos. Text is one of the most important features of news video content: a few characters can annotate a news story precisely, so effective recognition of on-screen text aids the understanding of the news content. However, in Taiwan's news channels the on-screen text includes news captions, weather forecasts, stock quotes, and scrolling tickers; the content is complex, and the typefaces, fonts, and sizes vary widely. Current text-recognition software can only handle a small number of trained fonts and fails on the text of most news channels in Taiwan, so extracting regions suitable for analysis from such complex news frames remains an open problem. In addition, the commercials interleaved with news broadcasts interfere with content analysis and must be removed effectively before analysis. This study detects, extracts, and processes the representative text regions and proposes solutions to the problems above.

With the proliferation of multimedia data, demand for effective and efficient video retrieval is growing. Among the various kinds of digital video, TV news plays an important role in broadcasting and serves as a major source of daily information for many people. In Taiwan, several TV news stations broadcast duplicated news stories again and again, and watching all of them wastes time. Since digital recording facilities are now widely available, we propose a classification scheme that clusters recorded TV news video segments so that viewers can choose to watch related archived news and even retrieve useful information from it. The proposed scheme uses the text in TV news to cluster videos. Text analysis of Taiwan's TV news requires further processing because the text areas may carry various information, including captions, weather reports, and stock market indices, and locating the area of genuine interest is challenging. Furthermore, video OCR is not yet mature and does not work well on Taiwan's TV news broadcasts because each channel uses special and differing text fonts. We apply low-level feature extraction and an SVM to locate the possible regions of interest, which also helps differentiate news segments from commercials.
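The region-locating step above can be sketched as a binary SVM over low-level block features. This is a minimal illustration only: the thesis does not specify the exact features or kernel, so the gradient-energy and intensity-variance features below, the 16x16 block size, and the toy training data are all assumptions.

```python
import numpy as np
from sklearn import svm

def block_features(block: np.ndarray) -> np.ndarray:
    """Low-level features of a grayscale block (assumed features):
    mean horizontal gradient, mean vertical gradient, intensity std."""
    b = block.astype(float)
    gx = np.abs(np.diff(b, axis=1)).mean()   # horizontal edge energy
    gy = np.abs(np.diff(b, axis=0)).mean()   # vertical edge energy
    return np.array([gx, gy, b.std()])

# Toy training data: text-like blocks have strong gradients and high
# contrast; smooth background blocks do not.
rng = np.random.default_rng(0)
text_blocks = [rng.integers(0, 256, (16, 16)) for _ in range(20)]
flat_blocks = [np.full((16, 16), c) + rng.integers(0, 3, (16, 16))
               for c in rng.integers(0, 256, 20)]

X = np.array([block_features(b) for b in text_blocks + flat_blocks])
y = np.array([1] * 20 + [0] * 20)            # 1 = candidate text region

clf = svm.SVC(kernel="rbf").fit(X, y)

# Classify a new high-contrast block; label 1 marks it as a region
# of interest worth passing to the later caption-analysis stages.
pred = clf.predict([block_features(rng.integers(0, 256, (16, 16)))])
```

In practice each video frame would be tiled into blocks, every block classified this way, and positively labeled blocks merged into candidate text regions.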
The anchorperson scene is then located to divide each piece of news into two parts: one in which the anchorperson introduces the news, and the other containing the news content itself. Next, we extract the captions from the second part, where the text is more stable and representative. After refining the extracted text areas, a cross-correlation process finds similar patterns among the captions of different video segments and links related segments together. Experimental results demonstrate the feasibility of this potential solution.
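The caption-matching step can be sketched with normalized cross-correlation on binarized caption patches: a score near 1 suggests the two segments carry the same caption pattern and should be linked. The binarization, patch size, and the 0.8 linking threshold below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized patches,
    in [-1, 1]; 1 means identical up to brightness/contrast shift."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

rng = np.random.default_rng(1)
caption = rng.integers(0, 2, (20, 120))            # binarized caption patch
noisy = caption ^ (rng.random(caption.shape) < 0.02)  # same caption, 2% noise
other = rng.integers(0, 2, (20, 120))              # unrelated caption

score_same = ncc(caption, noisy)    # high: segments would be linked
score_diff = ncc(caption, other)    # low: segments stay separate
```

Segments whose best caption score exceeds a threshold (e.g. 0.8, an assumed value) would be clustered as related news; in a full system the patches would also need alignment, since captions are not guaranteed to appear at identical positions.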