The Vision and Language Group, part of the ACM IIT Roorkee Chapter, is a student-run group that aims to foster a research-centric Deep Learning community at IIT Roorkee. We regularly hold open discussions on DL, CV, and NLP papers presented at recent conferences and in journals, as well as on general topics in the Deep Learning field. These discussions are open to anyone.
Apart from this, group members are also involved in various research-based projects, sometimes in collaboration with professors, with the ultimate goal of making a positive impact in a sub-field of interest and publishing at tier-1 conferences.
We are constantly looking for new collaborations, so please contact us if you find our work interesting. You can also follow us on Facebook and Twitter to receive updates about our activities.
In this paper, we extend the study of concept ablation within pre-trained models as introduced in ‘Ablating Concepts in Text-to-Image Diffusion Models’. Our work focuses on reproducing the results achieved by the different variants of concept ablation proposed there, using predefined metrics. We also introduce a novel variant of concept ablation, trademark ablation, which combines the principles of memorization and instance ablation to tackle the nuanced influence of proprietary or branded elements in model outputs. Further, our research contributions include an observational analysis of the model’s limitations. Moreover, we investigate the model’s behavior in response to ablation leakage-inducing prompts, which aim to elicit ablated concepts indirectly, revealing insights into the model’s resilience and adaptability. We also observe the model’s performance degradation on images generated from concepts far removed from its target ablation concept.
The Common Objects in Context (COCO) dataset has been instrumental in benchmarking object detectors over the past decade. Like every dataset, COCO contains subtle errors and imperfections stemming from its annotation procedure. With the advent of high-performing models, we ask whether these errors are hindering its utility in reliably benchmarking further progress. In search of an answer, we inspect thousands of masks from COCO (2017 version) and uncover different types of errors, such as imprecise mask boundaries, non-exhaustively annotated instances, and mislabeled masks. Due to the prevalence of COCO, we choose to correct these errors to maintain continuity with prior research. We develop COCO-ReM (Refined Masks), a cleaner set of annotations with visibly better mask quality than COCO-2017. We evaluate fifty object detectors and find that models that predict visually sharper masks score higher on COCO-ReM, affirming that they were being incorrectly penalized due to errors in COCO-2017. Moreover, our models trained using COCO-ReM converge faster and score higher than their larger variants trained using COCO-2017, highlighting the importance of data quality in improving object detectors. With these findings, we advocate using COCO-ReM for future object detection research. Our dataset is available at https://cocorem.xyz
In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point’s membership in a model’s training set. Instead of correlating loss with membership, as is traditionally done, we leverage the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being ‘fit’ to the training data and may face particular difficulties generalizing to unseen data. This asymmetry leads to the model achieving higher confidence on the training data, as it exploits the specific patterns and noise present there. Our proposed approach leverages the confidence values generated by the machine learning model: these values provide a probabilistic measure of the model’s certainty in its predictions and can be used to infer the membership of a given data point. Additionally, we introduce another variant of our method that carries out this attack without knowing the ground truth (true class) of a given data point, offering an edge over existing label-dependent attack methods.
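The confidence-thresholding idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual code: the function name, the threshold value, and the NumPy interface are all assumptions made for clarity.

```python
import numpy as np

def infer_membership(confidences, true_labels=None, threshold=0.9):
    """Confidence-based membership inference (illustrative sketch).

    confidences: (N, C) array of the target model's softmax outputs.
    true_labels: optional length-N array of ground-truth classes.
        If given, use the confidence assigned to the true class
        (label-dependent variant); otherwise fall back to the maximum
        confidence over classes (label-free variant).
    Returns a boolean array: True = predicted member of the training set.
    """
    confidences = np.asarray(confidences)
    if true_labels is not None:
        # Confidence the model assigns to each point's true class.
        scores = confidences[np.arange(len(confidences)), true_labels]
    else:
        # Without labels, use the model's peak confidence as a proxy.
        scores = confidences.max(axis=1)
    return scores >= threshold
```

In practice the threshold would be calibrated, e.g. on shadow models or held-out data, rather than fixed at 0.9.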
Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. We propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design.
In this paper, we revisit the core design ideas of state-of-the-art deep inpainting networks. We propose an intuitive and effective inpainting architecture that augments the powerful co-modulated StyleGAN2 generator with the large receptive field of FFC to achieve equally good performance on both textures and structures.
We aimed to research removing specific classes of data from a pre-trained LLM using adapter-based approaches and model pruning.
A study on enhancing LLM performance in solving math problems through hints, while examining the impact of adversarial prompts.
LoRA-Unlearn introduces a new Machine Unlearning paradigm, using LoRA to fine-tune sparse models for class unlearning.
Analysis of the importance of different attention mechanisms in image steganography within an autoencoder framework.
An experiment testing a method for training neural networks inspired by the Forward-Forward algorithm.
In the NeurIPS 2022 SENSORIUM competition, we aimed to enhance the baseline model in the Sensorium+ track for predicting mouse primary visual cortex neuron activity based on natural images and behavioral data.
The PyTorch codebase for DEAP Cache: Deep Eviction Admission and Prefetching for Cache.
Resources for DL
GenZoo is a repository that provides implementations of generative models in various frameworks.
Paper implementation of an end-to-end model for jointly learning the scene and facial features of an image for group-level emotion recognition.
This PyTorch repository provides a reliable implementation of a Neural Turing Machine (NTM) for training, evaluating, and visualizing results across Copy, Repeat Copy, Associative Recall, and Priority Sort tasks, with results matching those reported in the paper.
PyTorch implementation of the paper Dynamic Memory Networks for Visual and Textual Question Answering.
Repo containing summaries of papers we read.