1

OneFormer: One Transformer to Rule Universal Image Segmentation

Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. We propose OneFormer, a universal image …

Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand

In this paper, we revisited the core design ideas of stateof-the-art deep inpainting networks. We propose an intuitive and effective inpainting architecture that augments the powerful comodulated StyleGAN2 generator with the high receptiveness …

Language Guided Meta-Control for Embodied Instruction Following

Embodied Instruction Following (EIF) is a challenging problem requiring an agent to infer a sequence of actions to achieve a goal environment state from complex language and visual inputs. We propose a generalised Language Guided Meta-Controller …

Leveraging Dependency Grammar for Fine-Grained Offensive Language Detection using Graph Convolutional Networks

In this paper, we address the problem of offensive language detection on Twitter, while also detecting the type and the target of the offence. We propose a novel approach called SyLSTM, which integrates syntactic features in the form of the …

On The Cross-Modal Transfer from Natural Language to Code through Adapter Modules

Pre-trained neural Language Models (PTLM), such as CodeBERT, are recently used in software engineering as models pre-trained on large source code corpora. Although adapters are known to facilitate adapting to many downstream tasks compared to …

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

The visual relationship recognition (VRR) task aims at understanding the pairwise visual relationships between interacting objects in an image. This paper shows that modeling an effective message-passing flow through an attention mechanism can be …

SeMask: Semantically Masked Transformers for Semantic Segmentation

Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the …

Exploring Long tail Visual Relationship Recognition with Large Vocabulary

Several approaches have been proposed in recent literature to alleviate the long-tail problem, mainly in object classification tasks. In this paper, we make the first large-scale study concerning the task of Long-Tail Visual Relationship Recognition …

Multimodal Multi-Task Financial Risk Forecasting

Stock price movement and volatility prediction aim to predict stocks' future trends to help investors make sound investment decisions and model financial risk. Companies' earnings calls are a rich, underexplored source of multimodal information for …

DEAP Cache: Deep Eviction Admission and Prefetching for Cache

Recent approaches for learning policies to improve caching, target just one out of the prefetching, admission and eviction processes. In contrast, we propose an end to end pipeline to learn all three policies using machine learning. We also take …