RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

March 2022

Abstract

The visual relationship recognition (VRR) task aims at understanding the pairwise visual relationships between interacting objects in an image. This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. The method, called RelTransformer, represents each image as a fully-connected scene graph and restructures the whole scene into the relation-triplet and global-scene contexts.
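
As a rough illustration of what message passing through attention over a fully-connected graph can look like, the sketch below runs one round of scaled dot-product attention in which every node attends to every other node. It is a minimal toy under stated assumptions, not the RelTransformer architecture: the function names, the two-dimensional features, and the subject/relation/object toy nodes are hypothetical, and the learned query/key/value projections and multi-head structure of a real transformer are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message_passing(nodes):
    """One round of scaled dot-product attention over a fully-connected graph.

    nodes: list of equal-length feature vectors, one per graph node.
    Returns a new list in which each node is the attention-weighted
    average of all node features (including its own).
    Assumption: no learned projections; raw features act as queries,
    keys, and values.
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    updated = []
    for query in nodes:
        # Similarity of this node to every node in the graph.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in nodes]
        weights = softmax(scores)
        # Aggregate messages from all nodes, weighted by attention.
        message = [
            sum(w * node[d] for w, node in zip(weights, nodes))
            for d in range(dim)
        ]
        updated.append(message)
    return updated


# Hypothetical toy features standing in for subject, relation, and object nodes.
triplet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = attention_message_passing(triplet)
```

Because each updated node is a convex combination of the input features, its values stay within the range of the inputs; a learned model would add projection matrices and stack several such layers.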

Type

Conference paper

Publication

The Conference on Computer Vision and Pattern Recognition 2022