Extremely Compact Non-local Representation Learning
Ansheng You, Xiangzeng Zhou, Yingya Zhang, Pan Pan, Yinghui Xu
In contrast to regular convolutions with local receptive fields, non-local operations have proven widely effective for modeling long-range dependencies. Although many such methods have been proposed, prohibitive computation and GPU memory consumption remain major concerns. Rather than carrying out non-local operations pixel-wise or channel-wise in a computation-intensive way, we argue that an effective non-local operation can be achieved with a more compact high-order statistic, which can be computed more efficiently and may convey high-level information. In this paper, we propose an extremely compact non-local learning module (CoNL) that performs high-order reasoning with a graph convolution at its core. In CoNL, global Hadamard pooling (GHP) serves as the non-local operation and extracts a compact second-order feature vector from the input tensor. A light-weight graph convolution network (GCN) then refines this compact high-order vector with high-level reasoning. After the GCN refinement, the vector intuitively indicates global semantic characteristics, and it is finally used to enhance the input tensor through a channel scaling operation. The CoNL module is designed to be easily plugged into existing networks. Extensive experiments on a wide range of tasks demonstrate the effectiveness and efficiency of our work. The proposed CoNL achieves comparable or superior performance relative to previous state-of-the-art baselines on video recognition, semantic segmentation, object detection and instance segmentation. For a 96 × 96 × 2048 input, our block requires 13.6× less computation and 7.6× less GPU memory than a non-local block.
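The data flow described above (GHP, then GCN refinement, then channel scaling) can be illustrated with a minimal NumPy sketch. The abstract does not specify the layer definitions, so everything concrete here is an assumption made for illustration: GHP is taken to be the spatial average of the element-wise (Hadamard) product of two linear projections, the pooled vector is reshaped into a small set of graph nodes, the GCN is a single layer with a residual connection, and the scaling gate is a sigmoid. The names `w_a`, `w_b`, `adj`, `w_g`, `w_out` and `conl_sketch` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def global_hadamard_pooling(x, w_a, w_b):
    """Assumed form of GHP: average, over all positions, of the
    element-wise product of two linear projections of the input."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)        # (C, N) with N = H * W positions
    a = w_a @ flat                    # (D, N)
    b = w_b @ flat                    # (D, N)
    return (a * b).mean(axis=1)       # (D,) second-order statistic


def conl_sketch(x, w_a, w_b, adj, w_g, w_out):
    """Hypothetical pipeline: GHP -> one GCN layer -> channel scaling."""
    k = adj.shape[0]
    v = global_hadamard_pooling(x, w_a, w_b)               # (D,)
    nodes = v.reshape(k, -1)                               # (K, F), D = K * F
    # One graph-convolution layer with ReLU and a residual connection
    # (an assumed design, not taken from the paper).
    refined = np.maximum(adj @ nodes @ w_g, 0.0) + nodes   # (K, F)
    # Map the refined vector to one gate per input channel.
    gate = sigmoid(w_out @ refined.reshape(-1))            # (C,)
    return x * gate[:, None, None]                         # same shape as x


# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
K, F = 4, 4
D = K * F

x = rng.standard_normal((C, H, W))
w_a = 0.1 * rng.standard_normal((D, C))
w_b = 0.1 * rng.standard_normal((D, C))
adj = rng.random((K, K)) + np.eye(K)      # dense graph with self-loops
adj /= adj.sum(axis=1, keepdims=True)     # row-normalize
w_g = 0.1 * rng.standard_normal((F, F))
w_out = 0.1 * rng.standard_normal((C, D))

y = conl_sketch(x, w_a, w_b, adj, w_g, w_out)
print(y.shape)  # (8, 6, 6)
```

In this sketch the only global quantity is a vector of length D, whose size does not depend on H or W, so no N × N affinity matrix between positions is ever formed. That property is the kind of compactness the abstract contrasts with pixel-wise non-local operations.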


