Perceptual Reasoning and Interaction Research

About ESPNets

ESPNets are state-of-the-art CNN-based architectures designed for the restrictive constraints of edge devices, such as limited memory, computational power, and energy. ESPNets are built on a new convolutional unit, ESP (efficient spatial pyramid), that decomposes a standard convolution into two steps: (1) point-wise convolutions and (2) a spatial pyramid of dilated convolutions. The point-wise convolutions reduce the computation, while the spatial pyramid of dilated convolutions re-samples the feature maps to learn representations from a large effective receptive field. We show that ESPNets are more efficient than state-of-the-art methods, including MobileNets and ShuffleNets, and achieve better performance across different computer vision tasks, including object classification, semantic segmentation, and object detection.
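To make the two-step decomposition concrete, the following minimal sketch counts weights for a standard convolution and for an ESP-style unit, and lists the effective kernel size of each pyramid branch. It is an illustration under stated assumptions, not the released implementation: the channel split d = N / K, the power-of-two dilation rates, and all function names are assumptions introduced here and do not appear on this page.

```python
def effective_kernel_size(n: int, dilation: int) -> int:
    """Side length of the region covered by an n x n kernel at a given dilation rate."""
    return (n - 1) * dilation + 1


def standard_conv_params(m: int, n_out: int, n: int) -> int:
    """Weights in a standard n x n convolution from m to n_out channels (bias ignored)."""
    return n * n * m * n_out


def esp_unit_params(m: int, n_out: int, n: int, k: int) -> int:
    """Weights in an ESP-style unit (bias ignored), under these assumptions:

    Step 1: a point-wise (1 x 1) convolution reduces m channels to d = n_out // k.
    Step 2: k parallel n x n dilated convolutions, each mapping d to d channels.
    Dilation changes the sampling pattern of a kernel, not its number of weights.
    """
    if n_out % k != 0:
        raise ValueError("n_out must be divisible by k")
    d = n_out // k
    pointwise = m * d            # step 1: 1 x 1 projection
    pyramid = k * n * n * d * d  # step 2: k dilated branches
    return pointwise + pyramid


if __name__ == "__main__":
    m, n_out, n, k = 128, 128, 3, 4  # hypothetical layer sizes
    std = standard_conv_params(m, n_out, n)
    esp = esp_unit_params(m, n_out, n, k)
    print(f"standard: {std} weights, ESP-style: {esp} weights, ratio: {std / esp:.1f}x")
    # Assumed dilation rates 1, 2, 4, 8 for the k branches.
    print([effective_kernel_size(n, 2 ** i) for i in range(k)])
```

With these hypothetical sizes, the point-wise reduction accounts for most of the savings, while the largest dilation rate lets a 3 x 3 kernel cover a 17 x 17 region without adding weights.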

Fig. 1 (ESP unit): A kernel-level comparison between the standard convolution and the ESP unit.

ESPNet Papers

ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network

Sachin Mehta, Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi • CVPR • 2019

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi • ECCV • 2018

Qualitative Results

Code and Models

Our PyTorch implementation and pre-trained models for several datasets, including ImageNet, are publicly available.

ESPNets for Computer Vision

Efficient CNNs for Edge Devices