Non-local Neural Networks

Peng Liu, June 23, 2018

Abstract

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.

Paper authors: Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Local & non-local

Local computing

Small receptive field

For example, 3×3 or 5×5 kernels

Focuses on convolution over a local region

Learns more local features but less global information

Non-local computing

A fully connected layer is an example

Other ways to increase the receptive field

Stack more convolutional layers

Use larger filters
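To make the two options above concrete, the following minimal sketch computes the receptive field of a stack of convolutional layers. It assumes plain convolutions without dilation; the function name `receptive_field` and the `(kernel_size, stride)` layer description are hypothetical and chosen only for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    Assumes ordinary convolutions with no dilation.
    """
    rf = 1      # a single input pixel sees itself
    jump = 1    # distance in input pixels between adjacent units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Stacking more layers: three 3x3 convolutions with stride 1.
three_stacked_3x3 = receptive_field([(3, 1)] * 3)

# Using a larger filter: a single 7x7 convolution.
one_7x7 = receptive_field([(7, 1)])

print(three_stacked_3x3, one_7x7)  # both cover 7 input pixels
```

Under these assumptions, both approaches grow the receptive field only gradually, by a fixed amount per layer or per unit of kernel size. This is the limitation that non-local computing is meant to address.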

Motivation & Main Idea

A non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps.
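The idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature map is flattened to one row per position, the weights come from a softmax over dot products of linearly projected features (one possible choice of weighting), and the names `non_local_op`, `w_theta`, `w_phi`, and `w_g` are hypothetical.

```python
import numpy as np


def non_local_op(x, w_theta, w_phi, w_g):
    """Weighted sum over all positions (illustrative sketch).

    x: (N, C) array, one row of C features per position.
    w_theta, w_phi, w_g: (C, D) projection matrices (assumed, for illustration).
    Returns (y, weights): y is (N, D), weights is (N, N).
    """
    theta = x @ w_theta              # query-like projection, (N, D)
    phi = x @ w_phi                  # key-like projection, (N, D)
    g = x @ w_g                      # value-like projection, (N, D)

    scores = theta @ phi.T           # pairwise affinities, (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1

    y = weights @ g                  # each output row mixes ALL positions
    return y, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))      # 6 positions, 4 channels
w_theta, w_phi, w_g = (rng.standard_normal((4, 3)) for _ in range(3))
y, weights = non_local_op(x, w_theta, w_phi, w_g)
print(y.shape, weights.shape)
```

Every row of `weights` is strictly positive, so each output position depends on every input position regardless of distance, in contrast to a 3×3 convolution.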

Related Work

Non-local means [4] is a classical filtering algorithm that computes a weighted mean of all pixels in an image.

[4] A Non-local Algorithm for Image Denoising; Antoni Buades, Bartomeu Coll, Jean-Michel Morel; CVPR 2005
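A naive version of this filter can be sketched as follows. It is a simplified illustration, not the algorithm exactly as published: it compares unweighted square patches, uses every pixel in the image as a candidate (so cost grows quadratically with the pixel count), and the names `non_local_means`, `patch`, and `h` are assumptions made for this example.

```python
import numpy as np


def non_local_means(img, patch=3, h=0.1):
    """Naive non-local means on a small 2-D grayscale image.

    Each output pixel is a weighted mean of ALL pixels; a pixel gets a
    large weight when the patch around it resembles the patch around
    the target pixel. patch must be odd; h controls weight decay.
    """
    pad = patch // 2
    padded = np.pad(img, pad, mode="reflect")
    height, width = img.shape

    # One flattened patch per pixel: shape (height*width, patch*patch).
    patches = np.stack(
        [padded[i:i + height, j:j + width].ravel()
         for i in range(patch) for j in range(patch)],
        axis=1,
    )

    # Mean squared patch difference between every pair of pixels.
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).mean(axis=2)

    w = np.exp(-d2 / h ** 2)
    w = w / w.sum(axis=1, keepdims=True)   # normalize so weights sum to 1

    return (w @ img.ravel()).reshape(height, width)


rng = np.random.default_rng(0)
clean = np.zeros((8, 8))
clean[:, 4:] = 1.0                          # vertical step edge
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = non_local_means(noisy, patch=3, h=0.3)
print(denoised.shape)
```

Because the weights depend on patch similarity rather than spatial distance, pixels on the same side of the edge can contribute to each other even when they are far apart.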

The non-local behavior in Eq. (1) of [1] arises because all positions (∀j) are considered in the operation.
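For reference, the generic non-local operation given as Eq. (1) in [1] can be written as below. Here i is the index of the output position, j enumerates all possible positions, f computes a scalar affinity between positions i and j, g computes a representation of the input at position j, and C(x) is a normalization factor.

```latex
y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)
```

By contrast, a convolution sums only over a local neighborhood of i (for example, i − 1 ≤ j ≤ i + 1 for a 1-D kernel of size 3).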

More details: Non-local Neural Networks__Club_v1

Code: https://github.com/titu1994/keras-non-local-nets

References

[1] Non-Local Neural Networks

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794-7803

[2] Videos as Space-Time Region Graphs

Xiaolong Wang, Abhinav Gupta; submitted to ECCV 2018 (the follow-up work)

[3] zhuanlan.zhihu.com/p/33345791