With the explosive growth of video data archives, the need to accurately index, search, and localize desired video information for efficient manipulation has increased. Traditional video database access techniques based on textual annotation have many drawbacks, such as the cost and subjectivity of manual labeling. To address these problems, visual content-based video retrieval methods have been introduced and developed by many researchers in recent years. The first step commonly taken for content-based digital video representation and retrieval is to detect shot boundaries. After the shot boundaries are identified, key frames are selected within each shot. Using these key frames, most existing works compute the visual similarity between two video segments in order to retrieve desired segments, or to represent and browse a video in a scene-transitive way with tools such as the video poster or the scene-transition graph. Therefore, the two key issues in the manipulation of video data for information retrieval are video data structuring and video database searching and retrieval, which are the main components of this thesis.
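To make the first step of this pipeline concrete, the following is a minimal sketch of shot boundary detection. The thesis does not specify a particular detector at this point, so the choice of per-channel color histograms, the L1 distance, and the fixed threshold below are illustrative assumptions, not the method developed later.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Concatenated per-channel intensity histogram, normalized to sum to 1."""
    hist = np.concatenate(
        [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
         for c in range(frame.shape[-1])]
    ).astype(float)
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.4):
    """Flag a shot boundary wherever the L1 distance between the
    histograms of consecutive frames exceeds the threshold."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]
```

For example, a sequence of five dark frames followed by five bright frames yields a single boundary at the first bright frame; key frames would then be selected within each of the two detected shots.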
First, for video database searching and retrieval, we present a new distance measure and a frame-level searching framework. Although most users are typically interested in shot-level search and retrieval, exact frame-level localization is necessary in many situations. However, little work has been done on frame-level video search: the few existing methods concentrate on the definition of visual similarity, while the searching framework itself has attracted almost no attention. In this thesis, we focus on the exact localization of the video segments of interest at the frame level. The uniqueness of our approach lies in visual similarity measures adequate for frame-level video search and a candidate video segment selection method based on visual content variation.
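As a point of reference for the frame-level search problem, here is a brute-force baseline that slides a query clip over a target video and returns the best-matching start frame. The distance (mean frame-wise L1 histogram distance) and the exhaustive scan are assumptions for illustration only; the thesis's contribution is precisely a better-suited similarity measure and a candidate selection step that avoids evaluating every window.

```python
import numpy as np

def frame_level_search(query_hists, target_hists):
    """Exhaustively slide the query clip over the target video and
    return (start, distance) for the window minimizing the mean
    frame-wise L1 histogram distance (lower = more similar)."""
    q = len(query_hists)
    best_start, best_dist = -1, float("inf")
    for start in range(len(target_hists) - q + 1):
        dist = np.mean([np.abs(target_hists[start + i] - query_hists[i]).sum()
                        for i in range(q)])
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist
```

The cost of this scan is linear in the number of target frames times the query length, which motivates pruning candidate windows, e.g. by first comparing coarse measures of visual content variation, before computing exact frame-wise distances.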
Secondly, a...