
LoRA: Low-Rank Adaptation Explained

7 min read • Visual explainer included

Parameter-Efficient Fine-Tuning (PEFT) 📉

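Full fine-tuning updates every parameter in a pre-trained neural network. For a 7B-parameter model trained with mixed-precision Adam, the weights, gradients, and optimizer states together need roughly 16 bytes per parameter, on the order of 112 GB before activations are counted, which means spreading training across multiple 80 GB data-center GPUs such as the A100. LoRA (Low-Rank Adaptation) avoids this by freezing the original model weights and training only small low-rank matrices added alongside selected weight matrices.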
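To make the memory claim concrete, here is a rough estimate that assumes the common mixed-precision Adam rule of thumb: 2 bytes each for 16-bit weights and gradients, plus 4 bytes each for an fp32 master copy and Adam's two moment estimates. The byte breakdown is an assumption, and activations and framework overhead are ignored, so real usage is higher. The function names are illustrative only.

```python
# Rule-of-thumb memory estimate for full fine-tuning vs. LoRA.
# ASSUMPTION: mixed-precision Adam at 16 bytes per trained parameter;
# activations and framework overhead are ignored, so real usage is higher.
BYTES_PER_PARAM = {
    "16-bit weights": 2,
    "16-bit gradients": 2,
    "fp32 master weights": 4,
    "Adam first moment (fp32)": 4,
    "Adam second moment (fp32)": 4,
}
TRAIN_BYTES = sum(BYTES_PER_PARAM.values())  # 16


def full_finetune_gb(n_params: float) -> float:
    """Every parameter carries full training state (decimal GB)."""
    return n_params * TRAIN_BYTES / 1e9


def lora_finetune_gb(n_params: float, n_trainable: float) -> float:
    """Frozen base weights stored in 16-bit (2 bytes each); only the
    adapter parameters carry gradients and optimizer state."""
    return (n_params * 2 + n_trainable * TRAIN_BYTES) / 1e9


if __name__ == "__main__":
    n = 7e9
    print(f"Full fine-tuning: ~{full_finetune_gb(n):.0f} GB")
    print(f"LoRA, 7M trainable: ~{lora_finetune_gb(n, 7e6):.1f} GB")
```

Under these assumptions the frozen 16-bit base weights (about 14 GB) dominate LoRA's footprint, which is exactly the cost QLoRA goes on to attack.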

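Rank Decomposition Math: During adaptation, the pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen, and its update is factored into two low-rank matrices, $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and the rank $r \ll \min(d, k)$. The adapted layer computes $h = W_0 x + \frac{\alpha}{r} BAx$, where $\alpha$ is the lora_alpha scaling factor. Each adapted matrix therefore trains $r(d + k)$ weights instead of $dk$: for a $4096 \times 4096$ projection with $r = 8$, that is 65,536 trainable weights instead of 16,777,216, a reduction of more than 99%.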
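The decomposition can be checked numerically. The sketch below is a minimal pure-Python illustration (no ML framework), assuming the initialization described in the LoRA paper: A random and B zero, so ΔW = BA starts at zero and the adapted layer initially reproduces the frozen layer exactly. The helper names matmul, lora_forward, and lora_param_count are hypothetical, not part of any library.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def lora_param_count(d, k, r):
    """Weights trained for one d x k matrix: full update vs. LoRA (B: d x r, A: r x k)."""
    return d * k, r * (d + k)


def lora_forward(W0, B, A, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x) for a column vector x (k x 1).
    W0 is frozen; in training only B and A would receive gradients."""
    base = matmul(W0, x)
    update = matmul(B, matmul(A, x))  # the d x k product BA is never formed
    scale = alpha / r
    return [[b[0] + scale * u[0]] for b, u in zip(base, update)]


if __name__ == "__main__":
    random.seed(0)
    d, k, r, alpha = 6, 4, 2, 16
    W0 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
    A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, random
    B = [[0.0] * r for _ in range(d)]  # d x r, zeros at initialization
    x = [[random.gauss(0, 1)] for _ in range(k)]

    # With B = 0 the adapter contributes nothing yet.
    assert lora_forward(W0, B, A, x, alpha, r) == matmul(W0, x)

    full, lora = lora_param_count(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")
```

Computing B(Ax) right to left keeps the extra work proportional to r(d + k) per input, which is the same saving the parameter count shows.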

Training Adapter Layers with PEFT

Here is how to wrap a pre-trained base model with a LoRA configuration using the Hugging Face PEFT library:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# 1. Load the base model (get_peft_model freezes its weights in step 3)
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# 2. Define LoRA adapter configuration
lora_config = LoraConfig(
    r=8,  # Rank of the adapter matrices (typically 8, 16, or 32)
    lora_alpha=32,  # Adapter output is scaled by lora_alpha / r (here 32 / 8 = 4)
    target_modules=["q_proj", "v_proj"],  # Adapt the attention query and value projections
    lora_dropout=0.05,  # Dropout applied on the adapter path
    bias="none",  # Keep all bias terms frozen
    task_type="CAUSAL_LM"
)

# 3. Inject adapter layers
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
# Only a tiny fraction of the parameters (about 0.04% for this config) is now trainable
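To see where the figure reported by print_trainable_parameters() comes from, the adapter weights can be counted by hand. This sketch assumes the Llama 3 8B shapes from its published config (32 decoder layers, hidden size 4096, grouped-query attention with 8 key/value heads of dimension 128, so v_proj maps 4096 to 1024) and a base-model total derived from those shapes; each adapted linear layer gains r · (in_features + out_features) weights. The constants and function name are illustrative.

```python
# Hand count of LoRA adapter weights for r=8 on q_proj and v_proj.
# ASSUMPTION: Llama 3 8B shapes taken from its published config.
NUM_LAYERS = 32
HIDDEN = 4096
KV_HEADS, HEAD_DIM = 8, 128
TOTAL_BASE_PARAMS = 8_030_261_248  # base-model total implied by those shapes

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
}


def lora_trainable_params(r: int) -> int:
    """lora_A contributes r * in_features, lora_B contributes out_features * r."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return NUM_LAYERS * per_layer


if __name__ == "__main__":
    n = lora_trainable_params(r=8)
    print(f"{n:,} trainable ({n / TOTAL_BASE_PARAMS:.3%} of the base model)")
```

Doubling r doubles the adapter size, and adding more target modules (for example k_proj or the MLP projections) grows it layer by layer in the same way.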

Quantized LoRA (QLoRA)

While LoRA shrinks the set of trainable weights, the full base model still has to fit in GPU memory. QLoRA reduces VRAM usage further by quantizing the frozen base model weights to 4-bit NormalFloat (NF4) precision, while the LoRA adapters themselves stay in higher precision and receive gradients through the quantized layers. For a 7B model this cuts the weight footprint from about 14 GB in 16-bit precision to roughly 4.5 GB, making it possible to fine-tune 7B models on a single consumer GPU such as an RTX 4090, or on a 16 GB Google Colab T4.
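The savings follow from bits per weight. As a rough sketch, assuming the block-wise scheme described in the QLoRA paper (blocks of 64 weights, each with an fp32 quantization constant, which costs 32 / 64 = 0.5 extra bits per weight; "double quantization" stores those constants in 8 bits with one fp32 constant per 256 blocks, bringing the overhead to about 0.127 bits), the weight storage for 7B parameters works out as below. Activations, adapters, optimizer state, and any layers left unquantized are ignored, which is why real usage lands somewhat higher.

```python
# Rough weight-storage estimate for a 7B model at different precisions.
# ASSUMPTION: block size 64 and the double-quantization layout from the
# QLoRA paper; everything except the quantized weights is ignored.
N_PARAMS = 7e9


def weights_gb(bits_per_param: float, n_params: float = N_PARAMS) -> float:
    """Decimal gigabytes needed to store n_params at the given bit width."""
    return n_params * bits_per_param / 8 / 1e9


BF16_BITS = 16
NF4_BITS = 4 + 32 / 64                      # one fp32 constant per 64 weights
NF4_DQ_BITS = 4 + 8 / 64 + 32 / (64 * 256)  # 8-bit constants + fp32 second level

if __name__ == "__main__":
    for name, bits in [("bf16", BF16_BITS), ("NF4", NF4_BITS),
                       ("NF4 + double quant", NF4_DQ_BITS)]:
        print(f"{name:>20}: {bits:.3f} bits/param -> {weights_gb(bits):.2f} GB")
```

Under these assumptions the quantized weights alone come to about 3.6 GB versus 14 GB in 16-bit; the remaining gap to the ~4.5 GB figure is the unquantized parts and runtime overhead.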

Fast Drill

Q: What is frozen during LoRA training?
A: The original base model weight parameters are frozen (unchanged).

Knowledge Check

How does QLoRA save memory compared to standard LoRA?
