A-Z Index:
R
Published:
Updated:

LoRA (Low-Rank Adaptation)

LoRA (Low-Rank Adaptation)

"LoRA" (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) technique that freezes the pre-trained weights of a Large Language Model and injects trainable rank decomposition matrices into each layer of the Transformer architecture.

By drastically reducing the number of trainable parameters (often by 99% or more), LoRA lowers GPU VRAM requirements and enables fine-tuning on consumer-grade hardware, making it a foundational technology for open-source AI developers.

Key Takeaways (30-Second Summary)
  • Minimal VRAM Overhead: Allows fine-tuning of multi-billion parameter models on single GPU setups, bypassing the need for massive computing clusters.
  • Compact File Size: The output adapter weights require only megabytes (MB) of storage rather than the gigabytes (GB) required for full-model weights.
  • Zero Inference Latency: Winnings from the low-rank adapters can be mathematically merged back into the base model weights prior to deployment, eliminating inference latency.

The Mathematical Principle of Low-Rank Decomposition

During adaptation, weights change by a delta value (ΔW). LoRA assumes that this update has a low "intrinsic dimension," meaning the parameters can be compressed into a lower-rank subspace. Instead of training ΔW directly, LoRA factorizes it into two smaller matrices, A and B, so that ΔW = W_0 + B x A. This factorization reduces the active parameter footprint from millions to thousands.

"LoRA" in Action: Dialogue Example

Developers preparing model training scripts

Developer A: "Training a 70B model with full parameters requires four H100 GPUs, which we don't have budget for."

Developer B: "Let's use **LoRA** instead. We freeze the base model and only optimize the adapter. We can complete the training on a single RTX 4090 GPU overnight."

Comparing Full Fine-Tuning vs. LoRA

Feature Full Fine-Tuning LoRA (低ランク適応)
Trainable Parameters 100% of model parameters. Typically less than 0.1% of parameters.

Community Best Practices and Sharing

When sharing adapters on repositories like Hugging Face, document the exact base model used (e.g., Llama-3-8B-Instruct) and the rank (r) and alpha values. This allows other users to merge your weights correctly into their local host instances.

About "LoRA (Low-Rank Adaptation)"

This page provides the English definition and usage guide for the professional term "LoRA (Low-Rank Adaptation)." If you have any suggestions, feedback, or corrections regarding our terminology articles, please feel free to reach out via our contact form.