OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models
Kerim Büyükakyüz

Abstract
The advent of large language models (LLMs) has revolutionized natural language processing, enabling unprecedented capabilities in understanding and generating human-like text. However, the computational cost and convergence times associated with fine-tuning these models remain significant challenges. Low-Rank Adaptation (LoRA) has emerged as a promising method to mitigate these issues by introducing efficient fine-tuning techniques with a reduced number of trainable parameters. In this paper, we present OLoRA, an enhancement to the LoRA method that leverages orthonormal matrix initialization through QR decomposition. OLoRA significantly accelerates the convergence of LLM training while preserving the efficiency benefits of LoRA, such as the number of trainable parameters and GPU memory footprint.
Technical Overview: What is OLoRA?
OLoRA (Orthonormal Low-Rank Adaptation) is an advancement over the standard LoRA (Low-Rank Adaptation) method for efficiently fine-tuning large language models. It addresses key limitations in convergence speed and optimization stability while maintaining LoRA's parameter efficiency benefits.
The Core Innovation
OLoRA leverages orthonormal matrix initialization through QR decomposition to create a more favorable optimization landscape. Unlike standard LoRA, which initializes one adaptation matrix randomly and the other with zeros, OLoRA builds its adaptation matrices from orthonormal bases derived from the pre-trained weights, so the adapter starts from an approximation of the weight matrix itself.
Standard LoRA
Adapts a pre-trained weight matrix W using:
W_adapted = W + BA
where B ∈ ℝ^{m×r} is initialized with zeros and A ∈ ℝ^{r×n} is initialized randomly.
OLoRA
Decomposes W using QR factorization:
W_adapted = W + Q_r R_r
where Q_r contains the first r columns of Q (orthonormal) and R_r contains the first r rows of R.
Mathematical Foundation
OLoRA's effectiveness stems from the preservation of spectral properties during adaptation. By using orthonormal bases derived from the original weights, OLoRA ensures that the adaptation stays within a well-conditioned subspace of the parameter space.
For a pre-trained weight matrix W ∈ ℝ^{m×n}:
- QR Decomposition: W = QR, where Q ∈ ℝ^{m×m} is orthogonal and R ∈ ℝ^{m×n} is upper triangular
- Low-Rank Approximation: W_r = Q_r R_r, where Q_r ∈ ℝ^{m×r} contains the first r columns of Q and R_r ∈ ℝ^{r×n} contains the first r rows of R
- Adaptation: W_adapted = W + Q_r R_r, where Q_r and R_r are the trainable adapter matrices
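For intuition, here is a minimal PyTorch sketch of this initialization scheme. It is an illustration of the math above, not the PEFT implementation; the function name, matrix size, and rank are made up for the example.

import torch

def olora_style_factors(W: torch.Tensor, r: int):
    """Derive rank-r factors from the QR decomposition of a weight matrix W of shape (m, n)."""
    # Reduced QR: Q is (m, k) with orthonormal columns, R is (k, n) upper triangular, k = min(m, n)
    Q, R = torch.linalg.qr(W, mode="reduced")
    Q_r = Q[:, :r].clone()  # first r orthonormal columns of Q
    R_r = R[:r, :].clone()  # first r rows of R
    return Q_r, R_r

# Stand-in for a pre-trained weight matrix
W = torch.randn(768, 768)
Q_r, R_r = olora_style_factors(W, r=16)
print(Q_r.shape, R_r.shape)  # torch.Size([768, 16]) torch.Size([16, 768])
# Columns of Q_r are orthonormal: Q_r.T @ Q_r ≈ I
print(torch.allclose(Q_r.T @ Q_r, torch.eye(16), atol=1e-5))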
Key Benefits
Faster Convergence
OLoRA demonstrates consistently faster training convergence across multiple model sizes and tasks.
Performance Gains
In the majority of test cases (53 out of 60), OLoRA achieves higher final performance than standard LoRA.
Minimal Overhead
The QR decomposition is a one-time operation per layer during initialization, with negligible computational cost compared to training (see the short timing sketch at the end of this section).
Compatibility
Works with existing LoRA implementations with minimal changes—just changing the initialization method.
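To get a feel for the initialization overhead on your own hardware, the sketch below times a single reduced QR decomposition of a matrix roughly the size of one attention projection in a 7B model. The size and dtype are illustrative, not taken from the paper.

import time
import torch

# Illustrative size: roughly a single projection matrix in a 7B model
W = torch.randn(4096, 4096, dtype=torch.float32)

start = time.perf_counter()
Q, R = torch.linalg.qr(W, mode="reduced")  # the one-time cost paid per adapted layer
elapsed = time.perf_counter() - start
print(f"Reduced QR of a 4096x4096 matrix took {elapsed:.3f}s")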
Key Findings
Across five diverse LLMs (from 1.1B to 7B parameters) and six NLP benchmarks, OLoRA consistently outperformed standard LoRA in both convergence speed and final accuracy. The most significant improvements were observed on complex reasoning tasks such as ARC-Challenge and OpenBookQA.
The orthonormal initialization appears to guide the optimization process toward more favorable parameter regions, resulting in models that generalize better to unseen data while requiring no additional parameters compared to standard LoRA.
Implementation Examples
Quick Start
Basic example of how to use OLoRA with Hugging Face's PEFT library
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
dataset = load_dataset("imdb", split="train[:1%]")
# Just specify init_lora_weights="olora" to use OLoRA
lora_config = LoraConfig(
    init_lora_weights="olora"
)
peft_model = get_peft_model(model, lora_config)
training_args = SFTConfig(dataset_text_field="text", max_seq_length=128)
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=training_args,  # pass the SFT config so it is actually used
)
trainer.train()
peft_model.save_pretrained("olora-opt-350m")
Using the Model
Loading and using an OLoRA model after training
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
# Load the saved OLoRA model
olora_model = PeftModel.from_pretrained(model, "olora-opt-350m")
# Now you can use it for inference
inputs = tokenizer("Hello, I am a", return_tensors="pt")
outputs = olora_model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
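Optionally, if you prefer to deploy a single checkpoint without the PEFT wrapper, the adapter can be folded into the base weights with PEFT's standard merge_and_unload call (the output path below is just an example):

# Merge the low-rank update into the base weights and drop the PEFT wrapper
merged_model = olora_model.merge_and_unload()
merged_model.save_pretrained("olora-opt-350m-merged")
tokenizer.save_pretrained("olora-opt-350m-merged")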
Converting OLoRA to LoRA
Because OLoRA modifies the base weights at initialization, convert the adapter to a conventional LoRA adapter if you want to use multiple adapters with the original base model simultaneously.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
# Initialize with OLoRA
olora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="olora"  # Use OLoRA initialization
)
olora_model = get_peft_model(base_model, olora_config)
# Save the untrained model; it is needed later as the reference point for conversion
init_path = "path/to/save/untrained/model"
olora_model.save_pretrained(init_path)
# Train the model
# ... your training code here ...
# Save and convert to conventional LoRA
olora_model.save_pretrained(
    "final_model_path",
    path_initial_model_for_weight_conversion=init_path
)
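Once converted, the adapter behaves like a conventional LoRA adapter and can be combined with others on the same base model. A minimal sketch, assuming a second LoRA adapter already exists at a hypothetical path:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
# Load the converted OLoRA adapter as a regular LoRA adapter
multi_model = PeftModel.from_pretrained(base_model, "final_model_path", adapter_name="olora")
# Load a second, unrelated LoRA adapter (hypothetical path)
multi_model.load_adapter("path/to/another/lora", adapter_name="other")
# Switch between adapters at inference time
multi_model.set_adapter("other")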
With Quantization
Using OLoRA with 4-bit quantization
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto"
)
# Prepare for training
model = prepare_model_for_kbit_training(model)
# Configure OLoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="olora",
    bias="none",
)
# Get PEFT model
peft_model = get_peft_model(model, lora_config)
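From here, training proceeds exactly as in the Quick Start. Printing the trainable-parameter count is a quick sanity check that only the adapter weights will be updated:

# Confirm that only the low-rank adapter parameters are trainable
peft_model.print_trainable_parameters()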
How to Cite
@misc{büyükakyüz2024olora,
    title={OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models},
    author={Kerim Büyükakyüz},
    year={2024},
    eprint={2406.01775},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}