Dynamic texture and scene recognition using deep CNN features from key frames and key segments

Recognizing dynamic textures and scenes, that is, categorizing moving scenes such as a forest fire, landslide, or avalanche, is a fundamental problem in natural scene understanding. Over the past decade, considerable effort has been devoted to this problem. While existing methods focus on reliably capturing the spatial and temporal information of moving patterns, few works have explored frame selection strategies. However, a sequence is likely to include irrelevant frames that appear suddenly or rarely in a particular texture or scene category. In this dissertation, we propose a codebook-based dynamic texture descriptor that aggregates salient features on three orthogonal planes. Given a sequence, only the frame features that are highly correlated with each visual word are selected and aggregated, from the perspective of non-Euclidean geometry. The proposed descriptor discards features from outlier frames that appear suddenly or rarely in a particular context, thereby emphasizing the salient features.

Extending this study, we also propose a dynamic scene recognition framework based on deep convolutional neural networks. Instead of using whole frames, random frames, or partially consecutive frames as in conventional approaches, we use 'key frames' and 'key segments'. Key frames, a small set of frames that reflects the feature distribution of the whole sequence, capture salient static appearances. Key segments, extracted from the region around each key frame, provide additional discriminative power through dynamic patterns. The fully connected layer of a deep convolutional neural network is used to select the key frames and key segments, while the convolutional layers are used to describe them. Features from the key frames and key segments are then aggregated separately and combined into an efficient video-level descriptor. Evaluation results on public dynamic texture and scene datasets demonstrate the state-of-the-art performance of the proposed methods.
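As a concrete illustration of the key-frame and key-segment pipeline sketched in the abstract, the following Python snippet shows one possible realization. It assumes a pretrained ResNet-50 backbone from torchvision, k-means clustering on fully-connected-level (globally pooled) features for key-frame selection, and simple average pooling of convolutional features for aggregation; the function names (select_key_frames, video_descriptor), the backbone choice, and the clustering and pooling schemes are illustrative assumptions, not the exact procedure of the dissertation.

    # Minimal sketch; backbone, clustering, and pooling choices are assumptions.
    import torch
    import torchvision.models as models
    from sklearn.cluster import KMeans

    # Pretrained 2D CNN backbone (assumed here to be ResNet-50).
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    pooled_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])  # fc-level (globally pooled) features
    conv_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])    # last convolutional feature maps

    def select_key_frames(frames, n_key=8):
        """Pick frames whose pooled features lie closest to k-means cluster centers,
        so that a small set of frames reflects the feature distribution of the sequence."""
        with torch.no_grad():
            feats = pooled_extractor(frames).flatten(1)  # (T, 2048)
        centers = KMeans(n_clusters=n_key, n_init=10).fit(feats.numpy()).cluster_centers_
        key_idx = set()
        for c in centers:
            dist = ((feats - torch.tensor(c, dtype=feats.dtype)) ** 2).sum(dim=1)
            key_idx.add(int(dist.argmin()))
        return sorted(key_idx)

    def video_descriptor(frames, n_key=8, seg_len=5):
        """Aggregate conv features of key frames (appearance) and of short segments
        around them (dynamics) into a single video-level descriptor."""
        key_idx = select_key_frames(frames, n_key)
        with torch.no_grad():
            static = conv_extractor(frames[key_idx]).mean(dim=(0, 2, 3))      # appearance part
            segments = torch.cat([frames[max(0, i - seg_len // 2): i + seg_len // 2 + 1]
                                  for i in key_idx])
            dynamic = conv_extractor(segments).mean(dim=(0, 2, 3))            # dynamics part
        return torch.cat([static, dynamic])

    # Usage: frames is a (T, 3, 224, 224) tensor of preprocessed video frames.
    # descriptor = video_descriptor(frames)

The separate "static" and "dynamic" parts of the sketch mirror the abstract's description of aggregating key-frame and key-segment features independently before combining them into a single video-level descriptor.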
Advisors
Yang, Hyun Seung (양현승)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description
Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Computing, 2018.2, [iv, 48 p.]

URI
http://hdl.handle.net/10203/265335
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=734426&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.