Intelligent and interactive sports video analysis systems have shown significant progress in recent years. However, most of the improvements have been made in detection and tracking algorithms. This thesis addresses the problem of automatic player identification in broadcast sports videos filmed with a single side-view, medium-distance camera. Player identification in this setting is challenging because visual cues such as faces and jersey numbers are not clearly visible. The task therefore requires more sophisticated approaches that capture distinctive features of the players in order to distinguish them.
A reliable identification system needs features with high-level semantic meaning, because the players' appearances are very similar and easily confused. To this end, we use powerful Convolutional Neural Network (CNN) features, extracted at multiple scales for richer information, and encode them with the Fisher Vector method, which is able to capture and magnify small differences. We also investigate which parts of the players are most distinguishing and present a Deformable Part Model (DPM) based pooling approach that exploits these distinctive feature points.
The resulting image representation is able to identify players even in difficult scenes. It achieves state-of-the-art results of up to 96% on NBA basketball clips.
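
To make the encoding step mentioned above more concrete, the following is a minimal sketch of Fisher Vector encoding over a set of local descriptors (for example, CNN activations taken at several scales). It is an illustration only and not necessarily the implementation used in this thesis: it assumes a diagonal-covariance Gaussian mixture model whose parameters are already fitted, and it applies the commonly used power and L2 normalisation. The function name `fisher_vector` and its arguments are hypothetical.

```python
import numpy as np


def fisher_vector(descriptors, weights, means, variances):
    """Encode T local descriptors (T x D) as one Fisher Vector of length 2*K*D.

    Assumption: `weights` (K,), `means` (K, D) and `variances` (K, D) describe
    a diagonal-covariance GMM that was fitted beforehand on training descriptors.
    """
    x = np.asarray(descriptors, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)
    variances = np.asarray(variances, dtype=np.float64)
    num_desc = x.shape[0]

    # Whitened differences between every descriptor and every component: (T, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / np.sqrt(variances)[None, :, :]

    # Soft assignments (posteriors), computed in the log domain for stability.
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights)[None, :] - 0.5 * (np.sum(diff ** 2, axis=2) + log_norm[None, :])
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients of the log-likelihood with respect to the means and the
    # standard deviations, averaged over the descriptors.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= num_desc * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= num_desc * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalisation followed by L2 normalisation.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

Because the vector stores first- and second-order deviations of the descriptors from each mixture component, small systematic differences between visually similar players shift many coordinates at once, which is the property the abstract refers to as capturing and magnifying small differences.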