About ESPNets

ESPNets are state-of-the-art CNN-based architectures designed for the restrictive constraints of edge devices: limited memory, limited computational power, and limited energy. ESPNets are built on a new convolutional unit, ESP (Efficient Spatial Pyramid), that decomposes a standard convolution into two steps: (1) point-wise convolutions and (2) a spatial pyramid of dilated convolutions. The point-wise convolutions reduce the computation, while the spatial pyramid of dilated convolutions re-samples the feature maps to learn representations from a large effective receptive field. We show that ESPNets are more efficient than state-of-the-art methods, including MobileNets and ShuffleNets, and achieve better performance across different computer vision tasks, including object classification, semantic segmentation, and object detection.

ESP Unit
Fig. 1 A kernel-level comparison between the standard convolution and the ESP unit.
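
To make the two-step decomposition concrete, the sketch below counts weights for a standard convolution and for an ESP-style unit, and computes the effective kernel size of each dilated branch. It rests on assumptions this page does not state: the point-wise step reduces M input channels to d = N/K channels, the pyramid has K parallel n x n dilated branches that each map d channels to d channels, the dilation rates are 1, 2, 4, ..., 2^(K-1), and biases are ignored. The function names are illustrative and are not taken from the released code.

```python
def standard_conv_params(n, m_in, n_out):
    """Weights in a standard n x n convolution from m_in to n_out channels."""
    return n * n * m_in * n_out


def esp_params(n, m_in, n_out, k):
    """Weights in an ESP-style unit (assumed layout, see the text above).

    Step 1: a point-wise (1 x 1) convolution reduces m_in channels to d = n_out / k.
    Step 2: k parallel n x n dilated convolutions, each mapping d -> d channels.
    """
    if n_out % k != 0:
        raise ValueError("n_out must be divisible by k")
    d = n_out // k
    pointwise = m_in * d          # 1 x 1 reduction
    pyramid = k * n * n * d * d   # k dilated branches
    return pointwise + pyramid


def pyramid_dilations(k):
    """Assumed dilation rates for the k branches: 1, 2, 4, ..., 2^(k-1)."""
    return [2 ** i for i in range(k)]


def effective_kernel_size(n, dilation):
    """Side length of the region covered by an n x n kernel at a given dilation."""
    return (n - 1) * dilation + 1


if __name__ == "__main__":
    n, m_in, n_out, k = 3, 128, 128, 4
    print(standard_conv_params(n, m_in, n_out))  # 147456
    print(esp_params(n, m_in, n_out, k))         # 40960
    print([effective_kernel_size(n, r) for r in pyramid_dilations(k)])  # [3, 5, 9, 17]
```

Under these assumptions, a 3 x 3 unit with 128 input and output channels and K = 4 uses about 3.6 times fewer weights than the standard convolution, while its widest branch covers a 17 x 17 region.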

ESPNet Papers

ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network

Sachin Mehta, Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi, CVPR 2019

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi, ECCV 2018

Qualitative Results

Code and Models

Our PyTorch implementation and pre-trained models on different datasets, including the ImageNet dataset, are publicly available.