What is QLoRA?

Abbreviation: QLoRA

QLoRA is LoRA fine-tuning in which the frozen base model is quantised to 4-bit during training, shrinking the memory needed for the base weights to roughly a quarter of their 16-bit size. osFoundry supports QLoRA as an alternative training path when your GPU is short on memory.

Detail

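Standard LoRA keeps the base model in 16-bit or full precision during training, which still needs significant VRAM for large models. QLoRA stores the frozen base weights in the 4-bit NF4 (NormalFloat) format and dequantises them on the fly for computation, while the LoRA adapter weights are trained in higher precision. This makes it feasible to fine-tune models of around 30B parameters on a single 24 GB consumer GPU. A 70B model needs about 35 GB for its 4-bit weights alone, so it calls for a 48 GB-class card or multiple GPUs.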
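
The arithmetic behind these figures can be sketched in a few lines. This is a rough estimate covering base weights only; it ignores activations, adapter weights, optimiser state, and quantisation metadata, all of which add to the real footprint. The function name `base_weight_gib` is illustrative and not part of osFoundry.

```python
def base_weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the base-model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


# Compare 16-bit and 4-bit storage for a few common model sizes.
for size in (7, 13, 33, 70):
    fp16 = base_weight_gib(size, 16)
    nf4 = base_weight_gib(size, 4)
    print(f"{size:>3}B  16-bit: {fp16:6.1f} GiB   4-bit: {nf4:6.1f} GiB")
```

The 4-bit column is always a quarter of the 16-bit column, which is why a model of about 33B parameters can fit on a 24 GB card while a 70B model cannot.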

Quality is comparable to standard LoRA in most cases, though quantisation error in the base can leave training perplexity slightly higher. The adapter itself is not quantised, so you can later deploy the trained adapter onto a full-precision base.
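
To see why the adapter carries over, here is a toy sketch in plain Python. It assumes a single output unit and a rank-1 adapter, and it uses simple uniform absmax rounding to 16 levels as a stand-in for NF4, which uses non-uniform levels. All names and numbers are made up for illustration.

```python
def quantise_4bit(weights):
    """Toy symmetric absmax quantisation to 16 integer levels (NOT real NF4)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantise(codes, scale):
    """Map 4-bit integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward(base_row, lora_a, lora_b, x):
    """One output unit: frozen base weights plus the low-rank adapter update."""
    adapter_out = dot(lora_b, [dot(row, x) for row in lora_a])
    return dot(base_row, x) + adapter_out


full_precision_row = [0.31, -0.12, 0.77, -0.45]
codes, scale = quantise_4bit(full_precision_row)
quantised_row = dequantise(codes, scale)

lora_a = [[0.01, 0.02, -0.01, 0.00]]  # hypothetical rank-1 adapter, kept in float
lora_b = [0.5]
x = [1.0, 2.0, -1.0, 0.5]

y_train = forward(quantised_row, lora_a, lora_b, x)        # 4-bit base, as in training
y_deploy = forward(full_precision_row, lora_a, lora_b, x)  # full-precision base
print(f"quantised base: {y_train:.4f}   full-precision base: {y_deploy:.4f}")
```

The adapter term is identical in both calls. Only the base term differs, and only by the rounding error introduced by quantisation.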

Related terms

lora
quantization
parameters

Related features

lora-fine-tuning
train-and-fine-tune