About ESPNets

ESPNets are state-of-the-art CNN-based architectures designed to operate under the restrictive constraints of edge devices, such as limited memory, limited computational power, and limited energy. ESPNets are built on a new convolutional unit, ESP (Efficient Spatial Pyramid), that decomposes a standard convolution into two steps: (1) point-wise convolutions and (2) a spatial pyramid of dilated convolutions. The point-wise convolutions reduce the computation, while the spatial pyramid of dilated convolutions re-samples the feature maps to learn representations from a large effective receptive field. We show that ESPNets are more efficient than state-of-the-art methods, including MobileNets and ShuffleNets, and achieve better performance across different computer vision tasks, including object classification, semantic segmentation, and object detection.
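To make the two-step decomposition concrete, the sketch below counts weights for a standard convolution and for a reduce-then-pyramid unit, and lists the effective kernel size of each dilated branch. It is an illustration under stated assumptions, not the released implementation: the function names are hypothetical, biases are ignored, the point-wise step is assumed to reduce the input to d = N / K channels for K parallel branches, and the dilation rates are assumed to double from branch to branch (1, 2, 4, ...).

```python
def standard_conv_params(m, n_out, k=3):
    """Weights in a standard k x k convolution, m -> n_out channels (no bias)."""
    return k * k * m * n_out


def esp_unit_params(m, n_out, branches, k=3):
    """Weights in the two-step decomposition (no bias).

    Step 1: a point-wise (1 x 1) convolution reduces m channels to
            d = n_out // branches channels (assumed split).
    Step 2: `branches` parallel k x k dilated convolutions, each d -> d.
    """
    if n_out % branches != 0:
        raise ValueError("n_out must be divisible by branches")
    d = n_out // branches
    pointwise = m * d
    pyramid = branches * (k * k * d * d)
    return pointwise + pyramid


def effective_kernel_size(k, dilation):
    """Side length covered by a k x k kernel with the given dilation rate."""
    return (k - 1) * dilation + 1


def pyramid_dilations(branches):
    """Assumed dilation schedule: 1, 2, 4, ... (one rate per branch)."""
    return [2 ** i for i in range(branches)]


if __name__ == "__main__":
    m = n_out = 128
    branches = 4
    print(standard_conv_params(m, n_out))       # 147456
    print(esp_unit_params(m, n_out, branches))  # 40960
    print([effective_kernel_size(3, r) for r in pyramid_dilations(branches)])
    # [3, 5, 9, 17]
```

With these example sizes, the decomposed unit uses 3.6 times fewer weights than the standard 3 x 3 convolution, while its widest branch covers a 17 x 17 region of the input feature map.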

ESP Unit
Fig. 1 A kernel-level comparison between the standard convolution and the ESP unit.

ESPNet Papers

ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network

Sachin Mehta, Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi. CVPR 2019.

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi. ECCV 2018.

Qualitative Results


Our PyTorch implementation and pre-trained models on different datasets, including the ImageNet dataset, are publicly available.