The Vision and Language Group, part of the ACM IIT Roorkee Chapter, is a student-run group that aims to foster a research-centric Deep Learning community at IIT Roorkee. We regularly hold open discussions on DL, CV, and NLP papers presented at recent conferences and in journals, as well as on general topics in the Deep Learning field. These discussions are open for anyone to join.
Apart from this, group members are also involved in various research-based projects, sometimes in collaboration with professors, with the ultimate goal of making a positive impact in the sub-fields we are interested in and publishing at tier-1 conferences.
We are always looking for new collaborations, so do contact us if you find our work interesting. You can also follow us on Facebook and Twitter to receive updates about our activities.
Large language models have been shown to memorize significant portions of their training data, which they can reproduce when appropriately prompted. This work investigates the impact of simple pruning techniques on this behavior. Our findings reveal that pruning effectively reduces the extent of memorization in LLMs, demonstrating its potential as a foundational approach for mitigating membership inference attacks.
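As a concrete illustration of the kind of intervention studied here, below is a minimal sketch of one simple pruning technique, global magnitude pruning, using PyTorch's built-in pruning utilities. The model choice and 30% sparsity level are illustrative assumptions, not the work's exact setup.

```python
# Minimal sketch: global magnitude pruning of a causal LM's linear layers.
# Model and sparsity level are illustrative assumptions only.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Collect (module, parameter_name) pairs for every linear projection.
to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, torch.nn.Linear)
]

# Zero out the 30% smallest-magnitude weights across all layers at once.
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.3)

# Make the sparsity permanent by removing the pruning reparameterization.
for module, name in to_prune:
    prune.remove(module, name)
```

After pruning, one would re-run a memorization probe (for example, prompting the model with training-data prefixes) and measure how much verbatim content the sparser model still reproduces.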
Diffusion and GAN models have demonstrated remarkable success in synthesizing high-quality images, propelling them into various real-life applications across domains. However, these models exhibit spectral biases that impair their ability to generate certain frequencies, making it relatively straightforward to distinguish real images from generated ones. In this blog, we analyze these models and attempt to explain the reason behind these biases.
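A standard way to make such spectral biases visible is to compare the azimuthally averaged power spectra of real and generated images. The sketch below is a minimal NumPy version of this analysis; the function name and binning scheme are our own illustrative choices, not code from the blog.

```python
# Minimal sketch: azimuthally averaged power spectrum of a grayscale image.
# Comparing this curve for real vs. generated images typically exposes the
# missing high-frequency power of GAN/diffusion outputs.
import numpy as np

def radial_power_spectrum(img: np.ndarray) -> np.ndarray:
    """Mean spectral power in rings of equal spatial frequency."""
    f = np.fft.fftshift(np.fft.fft2(img))  # centered 2D spectrum
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.sqrt((y - h // 2) ** 2 + (x - w // 2) ** 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)  # index 0 = DC, last = highest frequency

spectrum = radial_power_spectrum(np.random.rand(64, 64))
```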
Positional encoding has become an essential element in transformer models, addressing their fundamental property of permutation invariance and allowing them to understand sequential relationships within data. This blog post examines positional encoding techniques, emphasizing their vital importance in traditional transformers and their use with 2D data in Vision Transformers (ViT). We explore two contemporary methods—ALiBi (Attention with Linear Biases) and RoPE (Rotary Position Embedding)—analyzing their unique approaches to tackling the challenge of sequence length extrapolation during inference, a significant issue for transformers. Additionally, we compare these methods’ fundamental similarities and differences, assessing their impact on transformer performance across various fields. We also look into how interpolation strategies have been utilized to enhance the extrapolation capabilities of these methods; we conclude this blog with an empirical comparison of ALiBi and RoPE in Vision Transformers. To the best of our knowledge, this represents the first direct comparison of these positional encoding methods with those used in standard Vision Transformers.
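To make the two methods concrete, here is a minimal PyTorch sketch of both: the ALiBi bias added to attention logits, and the RoPE rotation applied to queries and keys. The head-slope schedule and channel-pairing convention follow common implementations (the ALiBi paper's slopes for power-of-two head counts; half-split pairing as in GPT-NeoX) and are assumptions rather than the blog's exact code.

```python
# Minimal sketches of ALiBi and RoPE; conventions are assumptions (see above).
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Distance-proportional penalty added to attention logits, one slope per head."""
    start = 2 ** (-8.0 / num_heads)  # paper's schedule for power-of-two head counts
    slopes = torch.tensor([start ** (i + 1) for i in range(num_heads)])
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs()   # (seq, seq) relative distances
    return -slopes[:, None, None] * dist[None]   # (heads, seq, seq)

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of queries/keys by position-dependent angles."""
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)     # theta_i
    angles = torch.arange(seq, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Note how ALiBi depends only on relative distance through a fixed linear penalty, which underlies its extrapolation behavior, while RoPE encodes position through rotation angles, which is what motivates the interpolation strategies discussed in the post.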
The rapid advancements in large language models (LLMs) have revolutionized natural language processing, creating an increased need for efficient, task-specific fine-tuning methods. Traditional fine-tuning of LLMs involves updating a large number of parameters, which is computationally expensive and memory-intensive. Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters. However, the resulting LoRA modules still pose significant storage challenges. We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting each low-rank matrix into four parts, with only the two inner matrices being trainable. This approach achieves up to a 20x reduction in trainable parameters compared to standard LoRA while preserving comparable performance, addressing both computational and storage efficiency in LLM fine-tuning.
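Reading only the description above, one minimal way to realize the four-way split looks like the sketch below: the outer factors are frozen, while only the small inner factors train. Initialization, scaling, and rank choices here are illustrative assumptions, not the paper's prescription.

```python
# Minimal sketch of a LoRA-Mini-style layer: four factors, only the two
# inner ones trainable. Ranks and initialization are illustrative assumptions.
import torch
import torch.nn as nn

class LoRAMiniLinear(nn.Module):
    def __init__(self, base: nn.Linear, r_outer: int = 64, r_inner: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze pretrained layer
        d_in, d_out = base.in_features, base.out_features
        # Frozen outer factors (randomly initialized here for illustration).
        self.A_out = nn.Parameter(torch.randn(d_in, r_outer) / d_in ** 0.5,
                                  requires_grad=False)
        self.B_out = nn.Parameter(torch.randn(r_outer, d_out) / r_outer ** 0.5,
                                  requires_grad=False)
        # Trainable inner factors; A_in starts at zero so the update starts at 0.
        self.A_in = nn.Parameter(torch.zeros(r_outer, r_inner))
        self.B_in = nn.Parameter(torch.randn(r_inner, r_outer) / r_inner ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = x @ self.A_out @ self.A_in @ self.B_in @ self.B_out
        return self.base(x) + delta
```

The design point is that the trainable parameter count scales with r_outer x r_inner rather than with the model width d, so it stays small regardless of layer size; the exact savings ratio depends on the rank choices.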
Urban planning faces a critical challenge in balancing city-wide infrastructure needs with localized demographic preferences, particularly in rapidly developing regions. Existing approaches typically focus on either top-down optimization or bottom-up community planning, and few frameworks successfully integrate both perspectives. Our methodology employs a two-tier approach: first, a deterministic solver optimizes basic infrastructure requirements across the city region; second, four specialized planning agents, each representing a distinct sub-region, propose demographic-specific modifications to a master planner. The master planner then evaluates and integrates these suggestions to ensure cohesive urban development. We validate our framework on a newly created dataset comprising detailed region and sub-region maps from three developing cities in India, focusing on areas undergoing rapid urbanization. The results demonstrate that this hybrid approach enables more nuanced urban development while maintaining overall city functionality.
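The control flow of that two-tier design can be summarized in a few lines. Everything in the sketch below (function names, classes, the toy data) is a hypothetical stand-in for the framework's components, included only to show how the pieces connect.

```python
# Hypothetical sketch of the two-tier loop; none of these names are the
# framework's real API.

def solve_infrastructure(region_map: dict) -> dict:
    """Tier 1: deterministic solver for basic city-wide infrastructure."""
    return {"roads": region_map.get("arterials", []), "utilities": "baseline"}

class PlanningAgent:
    """Tier 2: proposes demographic-specific changes for one sub-region."""
    def __init__(self, sub_region: str, demographics: str):
        self.sub_region, self.demographics = sub_region, demographics

    def propose(self, base_plan: dict) -> dict:
        # In the real system this step would be an LLM call conditioned on
        # the sub-region map and its demographic profile.
        return {"sub_region": self.sub_region,
                "modification": f"amenities for {self.demographics}"}

class MasterPlanner:
    """Evaluates agent proposals and merges them into one cohesive plan."""
    def integrate(self, base_plan: dict, proposals: list) -> dict:
        return {**base_plan, "local_modifications": proposals}

base = solve_infrastructure({"arterials": ["main corridor"]})
agents = [PlanningAgent(s, d) for s, d in
          [("north", "students"), ("south", "families"),
           ("east", "industrial workers"), ("west", "senior residents")]]
plan = MasterPlanner().integrate(base, [a.propose(base) for a in agents])
```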
We researched removing specific classes of data from a pre-trained LLM using adapter-based approaches and model pruning.
A study on enhancing LLM performance in solving math problems through hints, while examining the impact of adversarial prompts.
LoRA-Unlearn introduces a new Machine Unlearning paradigm, using LoRA to fine-tune sparse models for class unlearning.
Analysis of the importance of different attention mechanisms in image steganography within an autoencoder framework.
An experiment testing a training method for neural networks inspired by the Forward-Forward Algorithm.
In the NeurIPS 2022 SENSORIUM competition, we aimed to improve the Sensorium+ track baseline model, which predicts the activity of neurons in the mouse primary visual cortex from natural images and behavioral data.
The PyTorch codebase for DEAP Cache: Deep Eviction Admission and Prefetching for Cache.
Resources for DL
GenZoo is a repository providing implementations of generative models in various frameworks.
Paper implementation of an end-to-end model for jointly learning the scene and facial features of an image for group-level emotion recognition.
This PyTorch repository provides a reliable implementation of a Neural Turing Machine (NTM) for training, evaluating, and visualizing results across Copy, Repeat Copy, Associative Recall, and Priority Sort tasks, with results matching those reported in the paper.
PyTorch implementation of the paper Dynamic Memory Networks for Visual and Textual Question Answering.
Repo containing summaries of papers we read.