Affective Classification of Movie Scenes Based on Artificial Neural Network

Name

宋哲偉 (Che-wei Sung)

Department

Department of Computer Science and Information Engineering

Thesis Title

Affective Classification of Movie Scenes Based on Artificial Neural Network
(以類神經網路為基礎的電影場景情緒分類)

Related Theses

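★ Evaluating the Effects of a Digital Game-Based Learning Environment on Energy Conservation
★ Using Digital Board Game to Enhance Student Engagement in Learning
★ Exploring Digital Game-Based Learning of Energy Knowledge from the Perspectives of Human Factors and Interaction Behavior Patterns
★ Effects of Cognitive Style on Gaming Behavior and Learning Performance in a Digital Game-Based English Learning Environment
★ How Game-Based English Learning Affects Learners' Gaming Behavior and Game Performance: A Spatial Ability Perspective
★ Effects of Prior Knowledge and Learning Style on English Vocabulary Acquisition and Behavior Patterns in a Role-Playing Game
★ A Comprehensive Investigation of the Effects of Prior Knowledge on Receiving and Giving Ratings in Peer Assessment
★ Effects of Peer-Assessment Grouping on Game Creation and Assessment: A Cognitive Style Perspective
★ Effects of Creation Medium, Individual Differences, Example-Based Instruction, and Creation Mode on Ninth-Grade Students' Learning Motivation and Performance in Music Composition
★ Effects of Individual Differences and Feedback Type on Learning Motivation, Learning Performance, and Game Performance in a Digital Game-Based Learning System: A Case Study of Ninth-Grade Chinese Language Learning
★ Effects of Performance-Approach and Performance-Avoidance Goals on Learning Performance and Goal Adoption: A Case Study of Digital Game-Based English Vocabulary Learning
★ Effects of English Anxiety and Prior Knowledge on English Pronunciation Learning Performance, Badge Performance, Game Performance, Learning Motivation, and Game Flow: A Case Study of a Massively Multiplayer Online Role-Playing Game
★ Effects of Cognitive Style and Game Flow on English Vocabulary Learning Performance, Game Performance, and Self-Efficacy: A Case Study of a Multiplayer Online Role-Playing Game
★ Effects of Peer Assessment on the Creation and Assessment of Game-Based Learning Systems: A Cognitive Style Perspective
★ Computer-Assisted Teacher Feedback in Foreign Language Writing: Outcomes and Perceptions
★ Effects of an English Reading Game on Indonesian High School English Learners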


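Access to this electronic thesis: open immediately.
Authorized users may search, read, and print the openly available full text only for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the thesis without permission.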

Abstract (Chinese)

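With advances in technology, movie production keeps evolving and the number of films grows every year. Helping users quickly find the video content they want in such a large volume of data has become an issue worth investigating. Past work on video content analysis has mainly covered object classification, genre classification, and event classification; with the rise of affective computing, emotion classification has also received growing attention. In particular, filmmaking techniques, from visual cues such as color and lighting to auditory cues such as musical tone, often carry the emotion and scene atmosphere the director intends to convey, which makes them suitable input features for emotion classification.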
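This study aims to build an automatic emotion classification method for movie scenes through content-based video analysis. First, 119 movie scenes were labeled manually; unlike the traditional single-emotion labeling, each scene was annotated with multiple emotions, with the goal of multi-emotion classification. Next, predefined visual and auditory features were extracted from every scene, forming a 50-dimensional feature vector. A self-organizing feature map (SOM) network was used to cluster the scenes, and a hierarchical agglomerative algorithm then merged similar clusters to reduce the excessive number of clusters. Finally, the classification was implemented in a video retrieval system: while a user browses a video, the system also returns scenes with similar emotional content, so that the user can quickly watch videos of the same type.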
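The SOM clustering step can be illustrated with a minimal self-organizing map in plain Python. This is a sketch under stated assumptions, not the thesis's implementation: the record does not give the map size, learning-rate schedule, or neighborhood function, so the 3 x 3 grid, the linearly decaying learning rate, and the Gaussian neighborhood are assumptions, as are the function names `train_som`, `best_matching_unit`, and `sq_dist`. The thesis's lattice-based weight initialization and its compared winner formulas are not reproduced; this sketch uses random initialization and the usual Euclidean best-matching unit, and toy 4-dimensional vectors stand in for the 50-dimensional audiovisual features.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(r, c): weight_vector}.

    Assumptions (not from the thesis): random initialization, a learning
    rate decaying linearly from lr0, and a Gaussian neighborhood whose
    radius shrinks linearly from half the grid size, floored at 0.5.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present scenes in a random order each epoch
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Grid distance from the winner controls how much this node moves
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data: two well-separated groups of 4-dimensional "scene" vectors
rng = random.Random(1)
scenes = ([[0.1 * rng.random() for _ in range(4)] for _ in range(10)]
          + [[0.9 + 0.1 * rng.random() for _ in range(4)] for _ in range(10)])
som = train_som(scenes, rows=3, cols=3)
labels = [best_matching_unit(som, s) for s in scenes]  # winner node per scene
```

Each scene's winning node acts as its initial cluster; with a map larger than the number of emotion categories, many small clusters appear, which is why the thesis follows the SOM with a merging step.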
Experiments show that the proposed classification method achieves an average recall and an average precision above 70%, which is a good result.
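The record reports average recall and precision but does not say how they are computed for multi-label scenes. One common choice, assumed here, is example-based averaging: for each scene, precision is the fraction of predicted emotion labels that are correct, recall is the fraction of true labels that were predicted, and both are averaged over scenes. The function name `average_precision_recall` and the emotion labels below are hypothetical.

```python
def average_precision_recall(true_labels, predicted_labels):
    """Example-based precision and recall averaged over scenes.

    Each argument is a list of sets of emotion labels, one set per scene.
    An empty predicted (or true) set contributes 0 to precision (or recall).
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty label lists of equal length")
    precisions, recalls = [], []
    for truth, pred in zip(true_labels, predicted_labels):
        hits = len(truth & pred)  # labels that are both predicted and true
        precisions.append(hits / len(pred) if pred else 0.0)
        recalls.append(hits / len(truth) if truth else 0.0)
    n = len(true_labels)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical multi-label annotations for three scenes
truth = [{"joy"}, {"fear", "tension", "sadness"}, {"sadness"}]
pred = [{"joy"}, {"fear"}, {"joy", "sadness"}]
p, r = average_precision_recall(truth, pred)
print(round(p, 3), round(r, 3))  # 0.833 0.778
```

Averaging per scene, rather than pooling all labels, keeps scenes with many emotion labels from dominating the score; whether the thesis averaged this way is not stated in the record.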

Abstract (English)

With the development of technology, digital video collections have grown rapidly in recent years. More and more movies are released around the world, and they play an important role in our lives. How to analyze this huge amount of content so that viewers can search for a specific type of video effectively has become a major issue. In general, earlier content-based video analysis includes object-based, genre-based, and event-based classification. With the growth of affective computing, emotion-based classification has also received attention, because the audiovisual cues in movies carry affective content.
The purpose of this study is to construct an affective classification of movie scenes through content-based video analysis. First, a dataset of 119 scenes from eleven movies was labeled manually, and each scene can be described by multiple emotion labels rather than the single label used in earlier studies. Fifty audiovisual features were extracted from all scenes as input to our classifier, a self-organizing feature map (SOM). A hierarchical agglomerative algorithm was then employed to merge similar clusters into groups. We applied the classification result to build a retrieval system that lets users view movie scenes with similar emotional content.
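Merging SOM clusters with a hierarchical agglomerative algorithm and then returning scenes with similar emotional content can be sketched as follows. The record does not specify the linkage criterion, stopping rule, or similarity measure, so average linkage, a fixed distance threshold, nearest-prototype assignment, and Euclidean distance are all assumptions here, as are the hypothetical helper names `agglomerative_merge`, `assign_groups`, and `similar_scenes`. The toy 2-dimensional prototypes stand in for trained SOM weight vectors.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_merge(prototypes, threshold):
    """Average-linkage merging; returns groups as sorted lists of indices.

    Starts with one group per prototype and repeatedly merges the closest
    pair until that pair is farther apart than `threshold`.
    """
    groups = [[i] for i in range(len(prototypes))]

    def linkage(g1, g2):
        # Average pairwise distance between members of the two groups
        total = sum(euclidean(prototypes[i], prototypes[j])
                    for i in g1 for j in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = linkage(groups[a], groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        merged = sorted(groups[a] + groups[b])
        del groups[b]  # b > a, so index a is unaffected
        groups[a] = merged
    return groups


def assign_groups(scene_vectors, prototypes, groups):
    """Map each scene to the merged group of its nearest prototype."""
    proto_to_group = {p: g for g, members in enumerate(groups) for p in members}
    return [proto_to_group[min(range(len(prototypes)),
                               key=lambda p: euclidean(prototypes[p], v))]
            for v in scene_vectors]


def similar_scenes(query, scene_vectors, scene_group, top_k=3):
    """Other scenes in the query's group, nearest first."""
    candidates = [i for i in range(len(scene_vectors))
                  if i != query and scene_group[i] == scene_group[query]]
    candidates.sort(key=lambda i: euclidean(scene_vectors[i], scene_vectors[query]))
    return candidates[:top_k]


prototypes = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
groups = agglomerative_merge(prototypes, threshold=0.5)   # [[0, 1], [2, 3]]
scenes = [[0.04, 0.02], [0.02, 0.01], [0.95, 0.97], [0.9, 1.0], [0.08, 0.0]]
scene_group = assign_groups(scenes, prototypes, groups)   # [0, 0, 1, 1, 0]
print(similar_scenes(0, scenes, scene_group))             # [1, 4]
```

Restricting candidates to the query's merged group before ranking by distance mirrors the idea of returning scenes from the same emotion cluster; a real system would use the 50-dimensional features and the trained SOM prototypes.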
The experiments showed that the average recall and average precision reach 70%, indicating that the proposed approach is effective.

Keywords (Chinese)

★ Self-organizing feature map network
★ Content-based video analysis
★ Affective computing

Keywords (English)

★ video content-based analysis
★ affective computing

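Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Content-Based Video Analysis
2.2 Affective Computing
2.3 Self-Organizing Feature Map Network
Chapter 3 System Implementation
3.1 System Architecture
3.2 Feature Extraction
3.2.1 Visual Features
3.2.2 Auditory Features
3.2.3 Summary of Feature Extraction
3.3 SOM Network Clustering
3.3.1 Lattice-Based Initialization of Network Weights
3.3.2 SOM Network Learning
3.3.3 Merging Neighboring Clusters
3.4 Similarity Computation
Chapter 4 Experimental Results and Discussion
4.1 Test Data and Parameter Settings
4.2 Comparison of Winner-Selection Formulas
4.3 Merging Results of the Hierarchical Agglomerative Algorithm
4.4 Comparison with Other Algorithms
Chapter 5 Conclusion
5.1 Conclusions
5.2 Research Contributions
5.3 Future Work
References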

Advisor

楊接期(Jie-chi Yang)

Approval Date

2009-7-17