Keypoint Selection for Bag-of-Words Based Image Classification

Name

Kuan-Yu Chen (陳冠宇)

Department

Department of Information Management

Thesis title

Keypoint Selection for Bag-of-Words Based Image Classification
(基於關鍵點篩選於袋字模型之影像分類)

Full-text access: never open to the public

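Abstract (Chinese)

As the volume of image data people can access grows rapidly, image retrieval has become an important topic. Retrieval accuracy is still limited, however, by the semantic gap between low-level features and high-level concepts. Automatic image annotation emerged in response, and its accuracy has been improved through better feature extraction and classification methods. In recent years the BOW (Bag of Words) model, originally developed for text mining, has been widely adopted for image annotation, and many studies have proposed improvements to it; for example, SPM (Spatial Pyramid Matching Bag Of Words) adds spatial information to BOW. These methods rely on keypoints, the image features detected in the BOW pipeline, yet few studies address how keypoints should be processed. The number of extracted keypoints is usually very large, which not only consumes CPU time but can also degrade the trained model and, in turn, the annotation results. This study therefore uses a new unsupervised algorithm, IKS (Iterative Keypoint Selection), to select keypoints.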
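　　The main idea of IKS is to select keypoints by distance. IKS assumes that within a given distance range a single keypoint is enough to represent a region or object, so it picks these representative keypoints out of the original set. Depending on how the representatives are chosen, there are two variants: IKS1 picks them at random, while IKS2 uses clustering.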
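The abstract does not give the details of IKS1, so the following is only a minimal sketch of the stated idea (one randomly chosen representative per distance range), not the thesis's actual procedure. It assumes that distance means Euclidean distance between descriptor vectors, that a fixed threshold defines the range, and that every keypoint within that threshold of a chosen representative is discarded. The names `iks1_sketch`, `radius`, and `seed` are hypothetical.

```python
import math
import random


def iks1_sketch(keypoints, radius, seed=0):
    """Greedy, distance-based keypoint selection (illustrative only).

    keypoints: list of equal-length numeric tuples (assumed descriptors).
    radius:    assumed distance threshold; not a parameter from the thesis.
    """
    rng = random.Random(seed)
    remaining = list(keypoints)
    selected = []
    while remaining:
        # Randomly pick a representative among the keypoints not yet covered.
        rep = remaining.pop(rng.randrange(len(remaining)))
        selected.append(rep)
        # Drop every keypoint within `radius` of it: the representative
        # stands in for that whole neighbourhood.
        remaining = [kp for kp in remaining if math.dist(rep, kp) > radius]
    return selected


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    kept = iks1_sketch(pts, radius=1.0)
    print(len(kept), "of", len(pts), "keypoints kept")
```

With the toy data above, the two well-separated groups each contribute exactly one keypoint, whichever point the random draw picks first.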
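　　The experiments use two datasets, Caltech101 and Caltech256, with IKS as the keypoint selection method. The image annotation results of BOW and SPM are compared with and without IKS selection, using SVM (Support Vector Machines) as the classifier.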
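　　The results show that IKS retains the representative keypoints. For both Caltech101, which has fewer classes, and Caltech256, which has more, IKS benefits BOW as well as the improved SPM, raising the classification accuracy achieved by the SVM.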

Abstract (English)

Image retrieval is the main technique for finding images in large image databases that are similar to a user's query. To let users issue keyword-based queries, automatic annotation of images with keywords has been studied extensively. In particular, BOW (Bag Of Words) and SPM (Spatial Pyramid Matching Bag Of Words) are two well-known methods for representing image content as feature descriptors. To extract BOW or SPM features, keypoints must first be detected in each image. However, the number of detected keypoints is usually very large, and some of them do little to describe the image content, such as background keypoints and similar keypoints that occur in different classes. In addition, the computational cost of the vector quantization step depends heavily on the number of detected keypoints.
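To make the vector quantization step concrete, here is a minimal sketch of turning one image's keypoint descriptors into a BOW histogram. It assumes a codebook of visual words is already available (commonly learned by clustering descriptors, for example with k-means); the function name `bow_histogram` and the toy two-dimensional data are illustrative and not taken from the thesis.

```python
import math


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall nearest to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # Vector quantization: assign the descriptor to its nearest visual word.
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(d, codebook[i]))
        hist[nearest] += 1
    total = sum(hist)
    # L1-normalise so images with different keypoint counts are comparable.
    return [h / total for h in hist] if total else hist


codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]  # toy visual words
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
print(bow_histogram(descriptors, codebook))  # prints [0.25, 0.5, 0.25]
```

Each descriptor is compared against every visual word, so the work grows linearly with the number of keypoints, which is why reducing the keypoint count lowers the cost of this step.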
　　Therefore, this thesis introduces a new algorithm called IKS (Iterative Keypoint Selection), which aims to select representative keypoints for generating the BOW and SPM features. IKS works by identifying representative keypoints and using distance to select useful keypoints. It has two variants, IKS1 and IKS2, which differ in how the representative keypoints are identified: IKS1 randomly selects a keypoint from an image as the representative keypoint, whereas IKS2 uses k-means to generate cluster centroids and takes the keypoints closest to those centroids as the representatives.
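The IKS2 idea of finding representatives through k-means centroids can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: it uses a plain Lloyd's k-means with randomly sampled initial centroids, Euclidean distance, and a fixed iteration count, and it covers only the identification of representatives, not the subsequent distance-based selection. The names `kmeans`, `representatives`, `k`, `iters`, and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if the cluster ended up empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids


def representatives(points, k, seed=0):
    """Return, for each centroid, the actual keypoint closest to it."""
    reps = []
    for c in kmeans(points, k, seed=seed):
        rep = min(points, key=lambda p: math.dist(p, c))
        if rep not in reps:  # two centroids may share one nearest keypoint
            reps.append(rep)
    return reps


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
           (5.0, 5.0), (5.2, 5.0), (5.1, 5.1)]
    print(sorted(representatives(pts, k=2)))
```

Centroids are averages and generally do not coincide with any detected keypoint, which is why the sketch maps each centroid back to its nearest real keypoint.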
　　Experimental results on the Caltech101 and Caltech256 datasets show that keypoint selection with IKS1 and IKS2 allows the SVM classifier to achieve better classification accuracy than the baseline BOW and SPM without keypoint selection. More specifically, IKS2 is better suited to image annotation than IKS1, since it outperforms IKS1 on the larger dataset, Caltech256.

Keywords

★ keypoint selection
★ bag of words
★ image annotation
★ image classification

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1　Introduction
　1-1 Research Background
　1-2 Research Motivation
　1-3 Research Objectives
　1-4 Research Contributions
　1-5 Thesis Organization
Chapter 2　Literature Review
　2-1 Image Annotation
　2-2 Bag of Words
　2-3 SPM (Spatial Pyramid Matching Bag Of Words)
　2-4 Sample Selection
　2-5 Discussion of the Literature
Chapter 3　Keypoint Selection
　3-1 Definition of Keypoint Selection
　3-2 Iterative Keypoint Selection (IKS)
　　3-2-1 IKS1
　　3-2-2 IKS2
Chapter 4　Experimental Results
　4-1 Experimental Design
　　4-1-1 Datasets
　　4-1-2 Baseline
　　　4-1-2-1 BOW Features
　　　4-1-2-2 Keypoint Selection Method
　　4-1-3 Classification Method
　4-2 Caltech101
　　4-2-1 Parameter Settings
　　　4-2-1-1 IKS1 Parameters
　　　4-2-1-2 IKS2 Parameters
　　4-2-2 IB3 Parameters
　　4-2-3 Number of Visual Words
　　4-2-4 Comparison Results
　4-3 Caltech256
　　4-3-1 Parameter Settings
　　　4-3-1-1 IKS1 Parameters
　　　4-3-1-2 IKS2 Parameters
　　4-3-2 IB3 Parameters
　　4-3-3 Number of Visual Words
　　4-3-4 Comparison Results
　4-4 Analysis and Discussion
Chapter 5　Conclusions and Future Research Directions
　5-1 Summary
　5-2 Future Research Directions
References
Appendix 1
Appendix 2
Appendix 3
Appendix 4

Advisor

Chih-Fong Tsai (蔡志豐)

Approval date

2012-7-2
