DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Seong Hyeon | ko |
dc.contributor.author | Tack, Jihoon | ko |
dc.contributor.author | Heo, Byeongho | ko |
dc.contributor.author | Ha, Jung-Woo | ko |
dc.contributor.author | Shin, Jinwoo | ko |
dc.date.accessioned | 2023-03-28T06:00:19Z | - |
dc.date.available | 2023-03-28T06:00:19Z | - |
dc.date.created | 2023-03-08 | - |
dc.date.issued | 2022-10 | - |
dc.identifier.citation | 17th European Conference on Computer Vision (ECCV), pp. 160-176 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.uri | http://hdl.handle.net/10203/305865 | - |
dc.description.abstract | For decades, it has been common practice to choose a subset of video frames to reduce the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal for recent transformer-based models. Specifically, inspired by the fact that transformers are built upon patches of video frames, we propose to sample patches rather than frames using the greedy K-center search, i.e., iteratively sampling the patch farthest from those chosen so far. We then show that a transformer trained with the selected video patches can outperform its baseline trained with video frames sampled in the traditional way. Furthermore, by adding a certain spatiotemporal structuredness condition, the proposed K-centered patch sampling can even be applied to recent sophisticated video transformers, boosting their performance further. We demonstrate the superiority of our method on the Something-Something and Kinetics datasets. | - |
dc.language | English | - |
dc.publisher | SPRINGER INTERNATIONAL PUBLISHING AG | - |
dc.title | K-centered Patch Sampling for Efficient Video Recognition | - |
dc.type | Conference | - |
dc.identifier.wosid | 000903538700010 | - |
dc.identifier.scopusid | 2-s2.0-85144529452 | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 160 | - |
dc.citation.endingpage | 176 | - |
dc.citation.publicationname | 17th European Conference on Computer Vision (ECCV) | - |
dc.identifier.conferencecountry | IS | - |
dc.identifier.conferencelocation | Tel Aviv | - |
dc.identifier.doi | 10.1007/978-3-031-19833-5_10 | - |
dc.contributor.localauthor | Shin, Jinwoo | - |
dc.contributor.nonIdAuthor | Heo, Byeongho | - |
dc.contributor.nonIdAuthor | Ha, Jung-Woo | - |
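
The abstract describes greedy K-center search: at each step, the patch farthest from everything chosen so far is added. The following is a minimal sketch of that generic technique (farthest-point sampling), not the authors' implementation. It assumes each patch is already represented as a numeric feature vector, uses Euclidean distance, and starts from index 0. The function name `k_center_greedy` is hypothetical, and the paper's spatiotemporal structuredness condition is not modeled.

```python
import math
from typing import List, Sequence


def k_center_greedy(points: Sequence[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Return indices of k points chosen by greedy K-center (farthest-point) search.

    Hypothetical helper: `points` stands in for per-patch feature vectors;
    Euclidean distance and the starting index `first` are assumptions.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(points)")
    chosen = [first]
    chosen_set = {first}
    # min_dist[i]: distance from point i to its nearest already-chosen center
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Pick the not-yet-chosen point farthest from all chosen centers
        candidates = [i for i in range(n) if i not in chosen_set]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # Shrink nearest-center distances using the newly added center
        for i in range(n):
            d = math.dist(points[i], points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


if __name__ == "__main__":
    # Toy 2-D "patch features" on a line: start at 0, then the farthest (10, 0),
    # then (5, 0), which is farthest from both chosen centers.
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    print(k_center_greedy(pts, 3))
```

Each iteration costs O(n) distance evaluations, so selecting k of n patches costs O(nk). This incremental update of `min_dist` is the usual reason greedy K-center is preferred over recomputing all pairwise distances.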