In everyday image search, mainstream image search engines such as Google Images and Flickr rely primarily on Text-Based Image Retrieval (TBIR): the user enters keywords, and retrieval depends on the descriptive text attached to images in the database. In practice, however, image providers rarely annotate their uploads with detailed tags or content descriptions, so images carry too little textual information and recall suffers, making it harder for TBIR to identify the correct images. Research on Automatic Image Annotation has developed to address this problem by improving on manual annotation.
As massive image collections accumulate in the digital era, effective image retrieval and management has become a popular research topic in the IT field, and enriching images with semantic information is a current focus of image-related research. We propose an automatic image annotation approach that integrates visual words with semantic word information. Using the popular Bag-of-Visual-Words model to extract image features, combined with TF-IDF weighting of each image's visual-word frequencies, we identify the visual words most representative of an image. Furthermore, we apply a Word2Vec model to capture the semantic concepts of words and map visual words to those concepts to generate image tags with appropriate semantic meaning. In this study, we train and experiment on outdoor street-scene images from the multi-label LabelMe dataset, and we discuss the practicability and efficiency of the approach in terms of Precision, Recall, and F-measure.
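The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the visual-word IDs, the counts, and the toy 2-D vectors standing in for Word2Vec embeddings are all hypothetical, whereas the study itself builds visual words from LabelMe image features and trains Word2Vec on real text.

```python
import math

# Hypothetical visual-word histograms for a tiny image collection.
# Each image is a mapping {visual_word: raw count}.
histograms = {
    "img1": {"vw_sky": 12, "vw_road": 3, "vw_tree": 5},
    "img2": {"vw_sky": 8, "vw_car": 6},
    "img3": {"vw_road": 9, "vw_car": 4, "vw_tree": 2},
}

def tfidf_weights(histograms):
    """Weight each image's visual-word frequencies by TF-IDF."""
    n_images = len(histograms)
    # Document frequency: in how many images each visual word occurs.
    df = {}
    for hist in histograms.values():
        for vw in hist:
            df[vw] = df.get(vw, 0) + 1
    weighted = {}
    for img, hist in histograms.items():
        total = sum(hist.values())
        weighted[img] = {
            vw: (count / total) * math.log(n_images / df[vw])
            for vw, count in hist.items()
        }
    return weighted

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 2-D embeddings standing in for learned Word2Vec vectors.
embeddings = {
    "vw_sky": (0.9, 0.1), "vw_tree": (0.2, 0.8), "vw_road": (0.5, 0.5),
    "sky": (0.95, 0.05), "cloud": (0.8, 0.3), "forest": (0.1, 0.9),
}

def annotate(image, weights, tag_vocab, k=2):
    """Pick the k tag words whose embeddings are most similar to the
    image's highest-weighted visual word."""
    top_vw = max(weights[image], key=weights[image].get)
    ranked = sorted(
        tag_vocab,
        key=lambda t: cosine(embeddings[top_vw], embeddings[t]),
        reverse=True,
    )
    return ranked[:k]

weights = tfidf_weights(histograms)
tags = annotate("img1", weights, ["sky", "cloud", "forest"])
```

Here "img1" is dominated by `vw_sky` after TF-IDF weighting, so the tags nearest to it in the toy embedding space ("sky", then "cloud") are selected; the study's evaluation then compares such generated tags against the LabelMe ground-truth labels via Precision, Recall, and F-measure.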