Deep-Learning-Based Multimedia Processing and Its Applications to Surveillance

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/74746

Title:	應用於安全監控之深度學習多媒體處理技術; Deep-Learning-Based Multimedia Processing and Its Applications to Surveillance
Author:	王建堯; Wang, Chien-Yao
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Deep Learning; Intelligent Surveillance
Date:	2017-08-18
Upload time:	2017-10-27 14:38:09 (UTC+8)
Publisher:	National Central University
Abstract:	Surveillance systems are becoming increasingly important. In Taiwan, the share of criminal cases solved with the help of video surveillance rose from 1% in 2007 to 19.83% in the first quarter of 2016. However, traditional surveillance systems rely on passive manual monitoring, so they are mostly used for after-the-fact investigation and cannot effectively stop an accident or crime when an emergency occurs. In addition, surveillance cameras worldwide are expected to produce 30 billion frames per second by 2020, a volume that human operators cannot handle. Developing an effective, active intelligent surveillance system is therefore essential. In recent years deep learning has achieved great success in analyzing massive multimedia data, and it is expected to turn large volumes of data into useful information effectively and quickly. This dissertation designs techniques for intelligent surveillance systems based on deep learning multimedia signal processing. The main sensors suitable for active surveillance are cameras and microphones, so the dissertation develops intelligent audio and video analysis techniques for sound-based and video-based surveillance respectively. Video-based surveillance has the advantage that events can be observed clearly, but it often has blind spots and is susceptible to environmental changes. Sound-based surveillance has the advantage that it can pick up sound from all directions and then analyze and recognize it. The dissertation develops deep learning techniques for sound event recognition and detection on the audio side, and for image segmentation, action recognition, and group proposal on the video side.
For sound event recognition and detection, an auditory-receptive-field binary pattern (ARFBP) acoustic descriptor is designed based on a model of human auditory perception, together with a new deep neural network architecture, the hierarchical-diving deep belief network (HDDBN), which hierarchically extracts abstract, discriminative features for classification and detection. The system learns several forms of abstract knowledge from the ARFBP descriptor that support knowledge transfer from previously learned concepts to useful representations. For semantic image segmentation, the proposed hierarchical joint-guided network (HJGN) takes the object boundary information produced by the proposed object boundary prediction hierarchical joint learning convolutional network (OBP-HJLCN) and uses a proposed joint guidance and masking network to refine the segmentation results. For action recognition, the proposed motion attention model, called the dynamic tracking attention model (DTAM), not only considers motion information but also dynamically tracks objects in videos. For group proposal, an unsupervised group proposal network (GPN) is developed that uses unsupervised transfer learning to combine the proposed objectness map generation network with the proposed object tracklet network, extracting dynamic groups from video.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	316

All items in NCUIR are protected by their original copyright.
