LoRA (Low-Rank Adaptation)

"LoRA" (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) technique that freezes the pre-trained weights of a Large Language Model and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
By drastically reducing the number of trainable parameters (often by 99% or more), LoRA lowers GPU VRAM requirements and enables fine-tuning on consumer-grade hardware, making it a foundational technology for open-source AI developers.
- Minimal VRAM Overhead: Allows fine-tuning of multi-billion parameter models on single GPU setups, bypassing the need for massive computing clusters.
- Compact File Size: The output adapter weights require only megabytes (MB) of storage rather than the gigabytes (GB) required for full-model weights.
- No Added Inference Latency: The low-rank adapter weights can be mathematically merged back into the base model weights prior to deployment, so the merged model runs with no extra inference latency compared with the base model.
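
The merging claim in the last bullet can be checked on a toy example. The following is a minimal sketch in plain Python, not a production routine: the matrices `W0`, `B`, `A`, the input `x`, and the helper functions are all illustrative assumptions, and the alpha/r scaling factor is left out (treated as 1) for brevity.

```python
# Toy check that merging a low-rank adapter into the base weight
# gives the same output as running the adapter as a separate path.
# All values below are made up for illustration.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

W0 = [[1.0, 2.0, 0.0],
      [0.0, 1.0, 3.0],
      [2.0, 0.0, 1.0]]        # frozen base weight, 3 x 3
B = [[0.5], [0.0], [-1.0]]    # adapter factor, 3 x 1 (rank r = 1)
A = [[1.0, 0.0, 2.0]]         # adapter factor, 1 x 3
x = [[1.0], [2.0], [3.0]]     # input column vector

# Unmerged: base path plus a separate adapter path (two extra matmuls).
y_unmerged = add(matmul(W0, x), matmul(B, matmul(A, x)))

# Merged: fold B @ A into the base weight once, then a single matmul.
W_merged = add(W0, matmul(B, A))
y_merged = matmul(W_merged, x)

print(y_unmerged == y_merged)  # both paths give the same output
```

After merging, inference uses one weight matrix of the original shape, which is why the deployed model costs the same per token as the base model.
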
The Mathematical Principle of Low-Rank Decomposition
During adaptation, the weights change by an update ΔW. LoRA assumes that this update has a low "intrinsic rank," meaning it can be well approximated within a low-rank subspace. Instead of training ΔW directly, LoRA factorizes it into two smaller matrices, B and A, so that ΔW = B × A and the adapted weight is W = W_0 + B × A, where W_0 is the frozen pre-trained weight. For a d × k weight matrix and rank r, B is d × r and A is r × k, so the trainable parameter count for that matrix drops from d·k to r·(d + k), for example from millions to tens of thousands.
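
To make the parameter savings concrete, here is a small arithmetic sketch. The function name `lora_param_counts` and the example sizes (a 4096 × 4096 projection with rank 8) are assumptions chosen for illustration, not values taken from a specific model.

```python
def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: training the d_out x d_in update directly.
    lora: training B (d_out x r) and A (r x d_in) instead.
    """
    full = d_out * d_in
    lora = r * (d_out + d_in)
    return full, lora

# Hypothetical example: one 4096 x 4096 projection adapted at rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
fraction = lora / full

print(full)               # 16777216
print(lora)               # 65536
print(f"{fraction:.2%}")  # 0.39%
```

The ratio depends only on the rank and the matrix shape, so doubling the rank doubles the adapter size while the frozen matrix stays the same.
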
"LoRA" in Action: Dialogue Example
Developer A: "Fully fine-tuning a 70B model takes a multi-GPU H100 cluster, which we don't have the budget for."
Developer B: "Let's use **LoRA** instead. We freeze the base model and optimize only the adapter, so gradients and optimizer states are needed just for the adapter weights. Combined with a quantized base model, that can bring training down to a single high-memory GPU."
Comparing Full Fine-Tuning vs. LoRA
| Feature | Full Fine-Tuning | LoRA (Low-Rank Adaptation) |
|---|---|---|
| Trainable Parameters | 100% of model parameters. | Often well under 1% of parameters, depending on the rank and which layers are adapted. |
Community Best Practices and Sharing
When sharing adapters on repositories such as Hugging Face, document the exact base model used (e.g., Llama-3-8B-Instruct) along with the rank (r) and alpha values. This lets other users load or merge your adapter correctly into their local copy of the base model.
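
One way to picture why these three values matter is a small metadata record that travels with the adapter. The sketch below is hypothetical: `AdapterCard` and its methods are invented for illustration and are not part of any library. It assumes the common LoRA convention of scaling the update by alpha / r.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdapterCard:
    """Hypothetical record of the facts a shared adapter should document."""
    base_model: str   # exact base model the adapter was trained on
    rank: int         # LoRA rank r
    alpha: float      # LoRA alpha

    @property
    def scaling(self) -> float:
        # Assumed convention: the low-rank update is scaled by alpha / r.
        return self.alpha / self.rank

    def check_base(self, local_base_model: str) -> None:
        """Refuse to merge into a different base model than documented."""
        if local_base_model != self.base_model:
            raise ValueError(
                f"adapter was trained on {self.base_model!r}, "
                f"not {local_base_model!r}"
            )

# Example values are assumptions, not a recommendation.
card = AdapterCard(base_model="Llama-3-8B-Instruct", rank=16, alpha=32)
card.check_base("Llama-3-8B-Instruct")  # passes silently
print(card.scaling)                     # 2.0
```

If the rank or alpha is undocumented, a user cannot reconstruct the scaling applied to B × A, and a merge into the wrong base model produces weights that no longer match what was trained.
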