RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

Abstract

The visual relationship recognition (VRR) task aims to understand the pairwise visual relationships between interacting objects in an image. This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. The proposed method, RelTransformer, represents each image as a fully connected scene graph and restructures the whole scene into relation-triplet and global-scene contexts.
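
To make the idea of attention-based message passing over a fully connected scene graph concrete, here is a minimal pure-Python sketch. It is not the RelTransformer implementation: the toy feature vectors, the helper names (`fully_connected_triplets`, `attend`, `relation_feature`), the use of ordered subject/object pairs, the averaged initial relation feature, and the summation of the two contexts are all assumptions made for illustration only.

```python
import math
from itertools import permutations


def fully_connected_triplets(num_objects):
    """Return every ordered (subject, object) index pair with subject != object."""
    return list(permutations(range(num_objects), 2))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Toy object features (hypothetical; a real model would use detector features).
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def relation_feature(subj, obj):
    """Assumed initial relation feature: mean of subject and object features."""
    return [(a + b) / 2 for a, b in zip(objects[subj], objects[obj])]


triplets = fully_connected_triplets(len(objects))
updated = {}
for subj, obj in triplets:
    rel = relation_feature(subj, obj)
    # Relation-triplet context: the relation attends to its own subject and object.
    pair = [objects[subj], objects[obj]]
    triplet_ctx = attend(rel, pair, pair)
    # Global-scene context: the relation attends to every object in the image.
    global_ctx = attend(rel, objects, objects)
    # Assumed fusion: element-wise sum of the two contexts.
    updated[(subj, obj)] = [t + g for t, g in zip(triplet_ctx, global_ctx)]
```

With three objects, the sketch produces six ordered subject/object pairs, each holding a relation feature updated from both its local triplet and the whole scene.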

Publication
The Conference on Computer Vision and Pattern Recognition 2022