The Vision and Language Group, part of the ACM IIT Roorkee Chapter, is a student-run group that aims to foster a research-centric deep learning community at IIT Roorkee. We regularly hold open discussions on deep learning, computer vision, and NLP papers from recent conferences and journals, as well as on general topics in the field. These discussions are open for anyone to join.
Apart from this, group members are also involved in various research-based projects, sometimes in collaboration with professors, with the ultimate goal of making a positive impact in a sub-field we are interested in and publishing at tier-1 conferences.
We are constantly looking for new collaborations, so do contact us if you find our work interesting. You can also follow us on Facebook and Twitter to receive updates about our activities.
Universal image segmentation is not a new concept. Attempts to unify image segmentation over the past decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. We propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design.
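For reference, below is a minimal inference sketch using the Hugging Face `transformers` port of OneFormer; the checkpoint name and image URL are illustrative, and the task token passed to the processor is what switches between semantic, instance, and panoptic segmentation with a single set of weights.

```python
# Minimal sketch of running OneFormer through the Hugging Face `transformers` port,
# assuming the `shi-labs/oneformer_ade20k_swin_tiny` checkpoint is available.
from PIL import Image
import requests
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative test image
image = Image.open(requests.get(url, stream=True).raw)

# The same trained weights handle all tasks; only the task token changes per call.
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
outputs = model(**inputs)
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]  # (height, width) tensor of predicted class ids
```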
In this paper, we revisit the core design ideas of state-of-the-art deep inpainting networks. We propose an intuitive and effective inpainting architecture that augments the powerful co-modulated StyleGAN2 generator with the large receptive field of Fast Fourier Convolutions (FFC) to achieve equally good performance on both textures and structures.
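To illustrate why FFC layers help with global structure, below is a minimal PyTorch sketch of a spectral (Fourier-domain) convolution block in the spirit of FFC; it is not the paper's exact architecture, just a demonstration that a pointwise convolution applied in the frequency domain gives every output location an image-wide receptive field in a single layer.

```python
import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
    # Sketch of the global (Fourier) branch used in Fast Fourier Convolution:
    # a 1x1 convolution applied to the spectrum mixes information from the
    # whole image at once, unlike a local spatial convolution.
    def __init__(self, channels):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis (2 * channels).
        self.freq_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.BatchNorm2d(2 * channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")                 # (b, c, h, w//2 + 1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)         # (b, 2c, h, w//2 + 1)
        freq = self.freq_conv(freq)
        real, imag = freq.chunk(2, dim=1)
        freq = torch.complex(real.contiguous(), imag.contiguous())
        # Back to the spatial domain at the original resolution.
        return torch.fft.irfft2(freq, s=(h, w), norm="ortho")
```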
Embodied Instruction Following (EIF) is a challenging problem that requires an agent to infer a sequence of actions to reach a goal environment state from complex language and visual inputs. We propose a generalised Language Guided Meta-Controller (LMC) for better language grounding in the large action space of the embodied agent. We additionally propose an auxiliary reasoning loss to improve the 'conceptual grounding' of the agent. Our empirical validation shows that our approach outperforms strong baselines on the Execution from Dialogue History (EDH) task of the TEACh benchmark.
In this paper, we address the problem of offensive language detection on Twitter, while also detecting the type and the target of the offence. We propose a novel approach called SyLSTM, which integrates syntactic features, in the form of the dependency parse tree of a sentence, and semantic features, in the form of word embeddings, into a deep learning architecture using a Graph Convolutional Network. Results show that the proposed approach significantly outperforms the state-of-the-art BERT model with orders of magnitude fewer parameters.
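A rough PyTorch sketch of the idea of combining a semantic stream (word embeddings fed to a BiLSTM) with a syntactic stream (a GCN operating on the dependency parse) is given below; the layer sizes, pooling step, and fusion order are illustrative assumptions, not the exact SyLSTM configuration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # One graph-convolution step over a batched dense adjacency matrix
    # built from dependency-parse edges (with self-loops added).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, seq_len, in_dim), adj: (batch, seq_len, seq_len)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear(adj @ x / deg))

class SyntaxSemanticClassifier(nn.Module):
    # Illustrative combination of a semantic stream (embeddings -> BiLSTM)
    # with a syntactic stream (GCN over the dependency parse).
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.gcn = GCNLayer(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, token_ids, dep_adj):
        h, _ = self.bilstm(self.embed(token_ids))   # (batch, seq, 2*hidden)
        g = self.gcn(h, dep_adj)                    # message passing along dependency edges
        return self.classifier(g.mean(dim=1))       # mean-pooled sentence representation
```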
Pre-trained neural language models (PTLMs), such as CodeBERT, have recently been adopted in software engineering as models pre-trained on large source code corpora. Adapters are known to make adapting to many downstream tasks cheaper than full fine-tuning, which requires retraining all of a model's parameters, owing to their plug-and-play nature and parameter efficiency; however, their use in software engineering remains largely unexplored.
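As a point of reference, below is a minimal sketch of a standard bottleneck adapter in the Houlsby style; the bottleneck width and placement are illustrative assumptions rather than the configuration studied in the paper, but the rough parameter count makes the efficiency argument concrete.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Bottleneck adapter: down-project, non-linearity, up-project, residual.
    # Only these small layers are trained; the pre-trained transformer stays frozen.
    def __init__(self, hidden_dim, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Rough illustration of the savings for a CodeBERT-sized encoder
# (hidden size 768, 12 layers, two adapters per layer):
hidden, layers = 768, 12
adapter_params = 2 * layers * sum(p.numel() for p in Adapter(hidden).parameters())
print(f"trainable adapter parameters: ~{adapter_params / 1e6:.1f}M "
      f"vs roughly 125M for full fine-tuning")
```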
The PyTorch codebase for DEAP Cache: Deep Eviction Admission and Prefetching for Cache.
Resources for DL
Repository containing summaries of papers we have read.
GenZoo is a repository that provides implementations of generative models in various frameworks
Paper implementation of an end-to-end model for jointly learning the scene and facial features of an image for group-level emotion recognition.
This PyTorch repository provides a reliable implementation of a Neural Turing Machine (NTM) for training, evaluating, and visualizing results across Copy, Repeat Copy, Associative Recall, and Priority Sort tasks, with results matching those reported in the paper.
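For context, data for the Copy task can be generated along the lines of the sketch below; the sequence width, length range, and delimiter convention are illustrative and may differ from the repository's defaults.

```python
import torch

def make_copy_batch(batch_size=32, seq_width=8, min_len=1, max_len=20):
    # Synthetic data for the Copy task: the model reads a random binary sequence
    # followed by an end-of-sequence delimiter, then must reproduce the sequence
    # from memory. The delimiter occupies an extra input channel (seq_width + 1).
    seq_len = torch.randint(min_len, max_len + 1, (1,)).item()
    seq = torch.bernoulli(torch.full((batch_size, seq_len, seq_width), 0.5))

    inputs = torch.zeros(batch_size, seq_len + 1, seq_width + 1)
    inputs[:, :seq_len, :seq_width] = seq
    inputs[:, seq_len, seq_width] = 1.0   # delimiter flag
    targets = seq                         # what the NTM should write back
    return inputs, targets

x, y = make_copy_batch()
print(x.shape, y.shape)  # e.g. torch.Size([32, L+1, 9]) torch.Size([32, L, 8])
```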
PyTorch implementation of the paper Dynamic Memory Networks for Visual and Textual Question Answering.
An experiment testing a method for training neural networks that is inspired by the Forward-Forward algorithm.
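Below is a minimal sketch of the layer-local training idea behind the Forward-Forward algorithm: push a "goodness" score (sum of squared activations) above a threshold for positive data and below it for negative data, with no backpropagation across layers. The loss form, threshold, and optimiser choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    # One layer trained with a local Forward-Forward-style objective.
    def __init__(self, in_dim, out_dim, threshold=2.0, lr=1e-3):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalise the input so only the direction, not the previous layer's
        # goodness, is passed forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Softplus loss pulling positive goodness above and negative goodness
        # below the threshold.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach outputs so the next layer trains on fixed inputs (purely local updates).
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```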
In the NeurIPS 2022 SENSORIUM competition, we aimed to enhance the baseline model in the Sensorium+ track for predicting mouse primary visual cortex neuron activity based on natural images and behavioral data.