Thesis/Dissertation 965202056 Detailed Record

Author Che-wei Sung (宋哲偉)   Department Computer Science and Information Engineering
Thesis Title Affective Classification of Movie Scenes Based on Artificial Neural Network
(以類神經網路為基礎的電影場景情緒分類)
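Access Rights
  1. This electronic thesis is authorized for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.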

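Abstract (Chinese) With advances in technology, film production keeps evolving and the number of films grows every year. How to help users quickly find the video content they want to browse within this massive volume of data has become an issue worth exploring. Past analysis of video content has mainly covered object classification, genre classification, and event classification, but with the rise of affective computing, emotion classification has also gradually gained attention. In filmmaking technique in particular, elements ranging from visual color and lighting to auditory elements such as musical melody often carry the emotions and scene atmosphere the director wants to express, which makes them suitable as input features for emotion classification.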
Abstract (English) With the development of technology, digital video collections have grown rapidly in recent years. More and more movies are released around the world, and they play an important role in our lives. How to analyze this huge body of content so that viewers can search effectively for a specific type of video has become a major issue. In general, earlier content-based video analysis includes object-based, genre-based, and event-based classification. With the growth of affective computing, emotion-based classification has also gained emphasis, because the audiovisual cues in movies help convey affective content.
The purpose of this study is to construct an affective classification of movie scenes through content-based video analysis. First, a dataset of 119 different scenes from eleven movies was labeled manually; each scene can be described by multiple emotional labels, instead of the single label used in earlier studies. Fifty audiovisual features were extracted from every scene as input to our classifier, a self-organizing feature map. A hierarchical agglomerative algorithm was then employed to merge similar clusters into groups. We used the classification result to construct a retrieval system so that users can view movie scenes with similar emotional content.
The experiments showed that the average recall and average precision reached 70%, which suggests that the proposed approach is effective.
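The pipeline described in the abstract (a self-organizing feature map followed by hierarchical agglomerative merging of similar clusters) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `train_som` and `merge_units`, the random weight initialization, the Euclidean winner rule, the Gaussian neighborhood, the centroid-linkage merge criterion, the 3 x 3 grid, and the two-dimensional toy data are all hypothetical choices made for this example. The thesis itself uses fifty audiovisual features per scene.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) -> weights."""
    rng = random.Random(seed)
    # Assumption: each unit starts from a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood
        for x in rng.sample(data, len(data)):     # shuffled pass over the data
            # Winner: the unit whose weights are closest to the sample.
            winner = min(weights, key=lambda u: dist2(weights[u], x))
            for unit, w in weights.items():
                grid_d2 = ((unit[0] - winner[0]) ** 2
                           + (unit[1] - winner[1]) ** 2)
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def merge_units(weights, n_groups):
    """Agglomeratively merge SOM units until n_groups remain.

    Assumption: groups are compared by the distance between their centroids.
    """
    groups = [[u] for u in sorted(weights)]
    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            ci = centroid([weights[u] for u in groups[i]])
            for j in range(i + 1, len(groups)):
                cj = centroid([weights[u] for u in groups[j]])
                d = dist2(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] += groups.pop(j)  # j > i, so index i stays valid
    return groups


# Toy data: two well-separated blobs standing in for scene feature vectors.
rng = random.Random(1)
scenes = ([[rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)] for _ in range(20)]
          + [[rng.gauss(1.0, 0.05), rng.gauss(1.0, 0.05)] for _ in range(20)])

som = train_som(scenes, rows=3, cols=3)
groups = merge_units(som, n_groups=2)
unit_to_group = {u: g for g, units in enumerate(groups) for u in units}
# Each scene takes the group of its best-matching unit.
labels = [unit_to_group[min(som, key=lambda u: dist2(som[u], x))]
          for x in scenes]
print(labels)
```

In a retrieval setting like the one the abstract describes, scenes whose best-matching units fall in the same merged group would be returned together as having similar emotional content.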
Keywords (Chinese) ★ self-organizing feature map network
★ video content-based analysis
★ affective computing
Keywords (English) ★ video content-based analysis
★ affective computing
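Table of Contents Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Video Content-Based Analysis
2.2 Affective Computing
2.3 Self-Organizing Feature Map Network
Chapter 3 System Implementation
3.1 System Architecture
3.2 Feature Extraction
3.2.1 Visual Features
3.2.2 Audio Features
3.2.3 Summary of Feature Extraction
3.3 SOM Network Clustering
3.3.1 Lattice-Shaped Initial Network Weights
3.3.2 SOM Network Learning
3.3.3 Merging Neighboring Clusters
3.4 Similarity Computation
Chapter 4 Experimental Results and Discussion
4.1 Test Data and Parameter Settings
4.2 Comparison of Winner Formulas
4.3 Merging Results of the Hierarchical Agglomerative Algorithm
4.4 Comparison with Other Algorithms
Chapter 5 Summary
5.1 Conclusions
5.2 Research Contributions
5.3 Future Work
References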
References Bolle, R. M., Yeo, B. L., & Yeung, M. M. (1997). Content-based digital video retrieval. International Broadcasting Convention, 160-165.
Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757-1771.
Brezeale, D., & Cook, D. J. (2006). Using closed captions and visual features to classify movies by genre. Poster Session of the Seventh International Workshop on Multimedia Data Mining (MDM/KDD2006).
Calder, A. J., Burton, A. M., Miller, P., Young, A. W., & Akamatsu, S. (2001). A principal component analysis of facial expressions. Vision Research, 41(9), 1179-1208.
Carpenter, G. A., & Grossberg, S. (1987). ART 2: Self-organization of stable category recognition codes for analog input patterns. Applied Optics, 26(23), 4919-4930.
Dellaert, F., Polzin, T., & Waibel, A. (1996). Recognizing emotion in speech. Fourth International Conference on Spoken Language Processing, 1970-1973.
Dietz, R., & Lang, A. (1999). Affective agents: Effects of agent affect on arousal, attention, liking and learning. Proceedings of Cognitive Technology Conference, San Francisco, CA.
Ekman, P. (1992). Are there basic emotions? Psychological Review, 99(3), 550-553.
Ekman, P. (1999). Basic emotions. In Dalgleish, T., & Power, M. J. (Eds.). Handbook of Cognition and Emotion (pp. 45-60). England: Wiley.
Fischer, S., Lienhart, R., & Effelsberg, W. (1995). Automatic recognition of film genres. Proc. ACM Multimedia, 295-304.
Giannetti, L. D. (2005). Understanding movies (10th ed.). Pearson Education Canada Inc.
Gobl, C., & Ní Chasaide, A. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Communication, 40(1-2), 189-212.
Godbole, S., & Sarawagi, S. (2004). Discriminative methods for multi-labeled classification. Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 22-30.
Hanjalic, A. (2004). Content-based analysis of digital video. Norwell, MA: Kluwer Academic Publishers.
Hanjalic, A. (2006). Extracting moods from pictures and sounds: Towards truly personalized TV. IEEE Signal Processing Magazine, 23(2), 90-100.
Hanjalic, A., & Xu, L. Q. (2005). Affective video content representation and modeling. IEEE Transactions on Multimedia, 7(1), 143-154.
Huang, H. Y., Shih, W. S., & Hsu, W. H. (2007). Movie classification using visual effect features. 2007 IEEE Workshop on Signal Processing Systems, 295-300.
Kang, H. B. (2003). Affective content detection using HMMs. Proceedings of the Eleventh ACM International Conference on Multimedia, 259-262.
Kobayashi, H., & Hara, F. (1992). Recognition of six basic facial expression and their strength by neural network. IEEE International Workshop on Robot and Human Communication, 1992. Proceedings. 381-386.
Kohonen, T. (1989). Self-organization and associative memory (3rd ed.). Berlin New York: Springer-Verlag.
Kohonen, T. (1995). Self-organizing maps. Berlin New York: Springer-Verlag.
Li, D., Sethi, I. K., Dimitrova, N., & McGee, T. (2001). Classification of general audio data for content-based retrieval. Pattern Recognition Letters, 22(5), 533-544.
Li, Y., Narayanan, S., & Kuo, C. C. J. (2004). Content-based movie analysis and indexing based on audiovisual cues. IEEE Transactions on Circuits and Systems for Video Technology, 14(8), 1073-1085.
Liu, Z., Huang, J., & Wang, Y. (1998). Classification TV programs based on audio information using hidden Markov model. 1998 IEEE Second Workshop on Multimedia Signal Processing, 27-32.
Liu, Z., Wang, Y., & Chen, T. (1998). Audio feature extraction and analysis for scene segmentation and classification. The Journal of VLSI Signal Processing, 20(1), 61-79.
Ortony, A. A., & Collins, A. A. (1988). The cognitive structure of emotions. Cambridge, MA: Cambridge University Press.
Pavlovic, V. I., Sharma, R., & Huang, T. S. (1997). Visual interpretation of hand gestures for human-computer interaction: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 677-695.
Pfeiffer, S., Fischer, S., & Effelsberg, W. (1997). Automatic audio content analysis. Proceedings of the Fourth ACM International Conference on Multimedia, 21-30.
Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.
Rasheed, Z., & Shah, M. (2002). Movie genre classification by exploiting audio-visual features of previews. Proceedings of IEEE International Conference on Pattern Recognition, 2, 1086-1089.
Rasheed, Z., & Shah, M. (2003). Scene detection in Hollywood movies and TV shows. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2, 343-348.
Rasheed, Z., Sheikh, Y., & Shah, M. (2005). On the use of computable features for film classification. IEEE Transactions on Circuits and Systems for Video Technology, 15(1), 52-64.
Reilly, W. S. N. (1996). Believable social and emotional agents. Department of Computer Science, Carnegie Mellon University.
Roach, M. J., Mason, J. D., & Pawlewski, M. (2001). Video genre classification using dynamics. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 3, 1557-1560.
Rui, Y., Huang, T. S., & Mehrotra, S. (1998). Exploring video structure beyond the shots. IEEE International Conference on Multimedia Computing and Systems, 1998. Proceedings. 237-240.
Satoh, S., Nakamura, Y., & Kanade, T. (1999). Name-it: Naming and detecting faces in news videos. IEEE Multimedia, 6(1), 22-35.
Saunders, J. (1996). Real-time discrimination of broadcast speech/music. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 993-996.
Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 1331-1334.
Shearer, K., Dorai, C., & Venkatesh, S. (2000). Incorporating domain knowledge with video and voice data analysis in news broadcasts. ACM International Conference on Knowledge Discovery and Data Mining, 46-53.
Snoek, C. G. M., & Worring, M. (2005). Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 25(1), 5-35.
Soleymani, M., Chanel, G., Kierkels, J., & Pun, T. (2008). Affective characterization of movie scenes based on multimedia content analysis and user's physiological emotional responses. Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia, 228-235.
Su, M. C., Liu, T. K., & Chang, H. T. (2002). Improving the self-organizing feature map algorithm using an efficient initialization scheme. Tamkang Journal of Science and Engineering, 5(1), 35-48.
Sudhir, G., Lee, J. C. M., & Jain, A. K. (1998). Automatic classification of tennis video for high-level content-based retrieval. Proc. IEEE International Workshop on Content-Based Access of Image and Video Database, 81-90.
Sugano, M., Furuya, M., Nakajima, Y., & Yanagihara, H. (2004). Shot classification and scene segmentation based on MPEG compressed movie analysis. IEEE Pacific Rim Conf. on Multimedia (PCM) 2004, 271-279.
Sugano, M., Isaksson, R., Nakajima, Y., & Yanagihara, H. (2003). Shot genre classification using compressed audio-visual features. Proceedings of IEEE International Conference Image Processing, 2, 17-20.
Tao, J., & Tan, T. (2005). Affective computing: A review. Proceedings of the First International Conference on Affective Computing & Intelligent Interaction (ACII’05). LNCS 3784. Springer, 981-995.
Vasconcelos, N., & Lippman, A. (2000). Statistical models of video structure for content analysis and characterization. IEEE Transactions on Image Processing, 9(1), 3-19.
Wactlar, H. D. (2001). The challenges of continuous capture, contemporaneous analysis, and customized summarization of video content. Defining a Motion Imagery Research and Development Program Workshop.
Wang, H. L., & Cheong, L. F. (2006). Affective understanding in film. IEEE Transactions on Circuits and Systems for Video Technology, 16(6), 689-704.
Wang, Y., Liu, Z., & Huang, J. C. (2000). Multimedia content analysis-using both audio and visual clues. IEEE Signal Processing Magazine, 17(6), 12-36.
Wei, C. Y., Dimitrova, N., & Chang, S. F. (2004). Color-mood analysis of films based on syntactic and psychological models. IEEE International Conference on Multimedia and Expo, 2, 831-834.
Xiong, Z., Zhou, X. S., Tian, Q., Rui, Y., & Huang, T. S. (2006). Semantic retrieval of video. IEEE Signal Processing Magazine, 23(2), 18-27.
Yeung, M., Yeo, B. L., & Liu, B. (1998). Segmentation of video by clustering and graph analysis. Computer Vision and Image Understanding, 71(1), 94-109.
Yoo, H. W. (2008). Retrieval of movie scenes by semantic matrix and automatic feature weight update. Expert Systems with Applications, 34(4), 2382-2395.
Zhang, H. J., Wu, J., Zhong, D., & Smoliar, S. W. (1997). An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30(4), 643-658.
Zhang, S., Tian, Q., Jiang, S., Huang, Q., & Gao, W. (2008). Affective MTV analysis based on arousal and valence features. IEEE International Conference on Multimedia and Expo, 1369-1372.
Zhang, T., & Kuo, C. C. J. (1999). Hierarchical classification of audio data for archiving and retrieving. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 6, 3001-3004.
Zhao, L., Qi, W., Wang, Y. J., Yang, S. Q., & Zhang, H. J. (2001). Video shot grouping using best-first model merging. Proc. 13th SPIE Symposium on Electronic Imaging--Storage and Retrieval for Image and Video Databases, 262-269.
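廖家慧 (2007). Emotion mining of movie scenes based on filmmaking techniques [基於電影拍攝手法之電影場景情緒探勘]. Master's thesis, Department of Computer Science, National Chengchi University.
蔡其澂 (2008). Developing a scene-oriented video corpus retrieval system to support comprehension of spoken English [開發場景導向之影片式語料庫檢索系統輔助英語口語體理解]. Master's thesis, Graduate Institute of Network Learning Technology, National Central University.
Su, M. C. (蘇木春), & Chang, H. T. (張孝德) (1999). Machine learning: Neural networks, fuzzy systems, and genetic algorithms [機器學習:類神經網路、模糊系統以及基因演算法則]. Taipei: 全華科技圖書股份有限公司.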
Advisor Jie-chi Yang (楊接期)   Approval Date 2009-7-17

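For questions about this thesis, please contact the Extension Services Division of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.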