Segmenting 2K-Videos at 36.5 FPS with 24.3 GFLOPs: Accurate and Lightweight Realtime Semantic Segmentation Network

We propose NfS-SegNet, a fast and lightweight end-to-end convolutional network architecture for real-time semantic segmentation of high-resolution videos that can segment 2K videos at 36.5 FPS with 24.3 GFLOPs. This speed and computational efficiency stem from three design choices: 1) The encoder network, NfS-Net, is optimized for speed using simple building blocks without memory-heavy operations such as depthwise convolutions; it outperforms state-of-the-art lightweight CNN architectures such as SqueezeNet [2], MobileNet v1 [3] and v2 [4], and ShuffleNet v1 [5] and v2 [6] on image classification while running significantly faster. 2) NfS-SegNet has an asymmetric architecture with a deep encoder and a shallow decoder, a design based on our empirical finding that the decoder is the main computational bottleneck while contributing relatively little to the final performance. 3) Our novel uncertainty-aware knowledge distillation method guides the teacher model to focus its knowledge transfer on the most difficult image regions. We validate NfS-SegNet on the Cityscapes [1] benchmark, where it achieves state-of-the-art performance among lightweight segmentation models in terms of both accuracy and speed.
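The abstract does not state how uncertainty is measured or how it enters the distillation loss. As a minimal sketch, assuming that uncertainty is the teacher's per-pixel predictive entropy and that it reweights a standard temperature-scaled KL distillation term, "focusing knowledge transfer on the most difficult image regions" could look like the code below. The function names, the entropy-based weighting, and the default temperature are hypothetical illustrations, not the NfS-SegNet method itself; each pixel is represented as a plain list of class logits to keep the example dependency-free.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); q is clamped to avoid division by an underflowed zero."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0.0)


def uncertainty_weights(teacher_logits: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED uncertainty measure: teacher entropy at T=1, normalized to [0, 1].

    Requires at least two classes (log(1) = 0 would divide by zero).
    """
    max_entropy = math.log(len(teacher_logits[0]))
    return [entropy(softmax(t)) / max_entropy for t in teacher_logits]


def uncertainty_weighted_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,  # hypothetical default
) -> float:
    """Per-pixel KL distillation loss, averaged with uncertainty weights (assumed scheme)."""
    weights = uncertainty_weights(teacher_logits)
    weighted_sum = 0.0
    for w, t_logits, s_logits in zip(weights, teacher_logits, student_logits):
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        weighted_sum += w * kl_divergence(p_t, p_s)
    weight_total = sum(weights)
    if weight_total == 0.0:
        return 0.0
    # Scaling by T^2 follows the usual distillation convention so gradient
    # magnitudes stay comparable across temperatures.
    return (temperature ** 2) * weighted_sum / weight_total


if __name__ == "__main__":
    # Two pixels: one the teacher is confident about, one it finds ambiguous.
    teacher = [[8.0, 0.0, 0.0], [0.4, 0.3, 0.3]]
    student = [[2.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    print(uncertainty_weights(teacher))  # the ambiguous pixel gets the larger weight
    print(uncertainty_weighted_kd_loss(teacher, student))
```

With this weighting, confidently classified pixels contribute almost nothing to the loss, so the student's capacity is spent matching the teacher where the teacher itself is least certain, which is one plausible reading of "most difficult image regions". Other uncertainty measures (for example student entropy, or teacher-student disagreement) would slot into `uncertainty_weights` without changing the rest of the sketch.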
Publisher: Institute of Electrical and Electronics Engineers Inc.
IEEE International Conference on Robotics and Automation, ICRA 2020, pp.3153 - 3160

Appears in Collection: RIMS Conference Papers