
What is Hot-swap LoRA?

Hot-swap LoRA is the ability to switch between LoRA adapters on a deployed base model at inference time without reloading the model. osFoundry hot-swaps up to 16 active adapters per base model on a single GPU endpoint.

Detail

A deployed LLM endpoint usually serves a single model. With hot-swapping, you host one base model (e.g. Llama 3.1 70B on an A100) and route each request to whichever LoRA adapter it needs. Switching takes under a second, and there is no model-reload latency.
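
To show why no reload is needed, here is a minimal plain-Python sketch of one linear layer using the standard LoRA form y = Wx + (alpha / r) · B(Ax). The base weight `W` is loaded once and never modified. Each request only chooses which low-rank pair (`A`, `B`) is added on top. The adapter names, matrix sizes, and the `forward` helper are hypothetical and are not osFoundry's API. A real server would do this on the GPU and batch requests that target different adapters.

```python
# Minimal sketch (hypothetical names): one shared base weight, many LoRA adapters.
# A request selects an adapter by name; the base weights are never reloaded or modified.

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Shared base weight W (d_out x d_in = 2 x 3), loaded once.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]

# Each adapter is a low-rank pair: A is r x d_in, B is d_out x r (rank r = 1 here),
# plus the usual LoRA scaling factor alpha / r.
ADAPTERS = {
    "legal":   {"A": [[1.0, 0.0, 0.0]], "B": [[1.0], [0.0]], "alpha": 1.0},
    "medical": {"A": [[0.0, 1.0, 0.0]], "B": [[0.0], [2.0]], "alpha": 1.0},
}

def forward(x, adapter=None):
    """Compute y = W x + (alpha / r) * B (A x); adapter=None serves the plain base model."""
    y = matvec(W, x)
    if adapter is None:
        return y
    a = ADAPTERS[adapter]
    r = len(a["A"])                      # rank = number of rows in A
    scale = a["alpha"] / r
    delta = matvec(a["B"], matvec(a["A"], x))
    return [yi + scale * di for yi, di in zip(y, delta)]

x = [1.0, 1.0, 1.0]
print(forward(x))             # [3.0, 1.0]  base model
print(forward(x, "legal"))    # [4.0, 1.0]  same W, "legal" adapter
print(forward(x, "medical"))  # [3.0, 3.0]  same W, "medical" adapter
```

Swapping adapters here changes only which small `A`/`B` pair is read. That is the property that allows per-request routing on a shared base.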

This cuts the number of deployments needed to serve N specialised model variants from N down to one. Each user, persona, or domain can have its own fine-tuned adapter on a shared base model.
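
The saving comes from how small adapters are. For a d_out × d_in weight matrix, LoRA stores two factors with r · (d_in + d_out) parameters in total, not a full copy of the matrix. The sketch below compares the two for a single square projection. The hidden size of 8192 and rank of 16 are illustrative assumptions, not osFoundry settings.

```python
# Parameter-count comparison for one square projection layer (illustrative numbers).

def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in a LoRA pair: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

d, r = 8192, 16   # hypothetical hidden size and LoRA rank
n_adapters = 16   # matches the 16-adapter cap described on this page

full = full_params(d, d)
lora = lora_params(d, d, r)
print(full, lora, f"{lora / full:.2%}")  # 67108864 262144 0.39%
print(n_adapters * lora / full)          # 0.0625: 16 adapters add 6.25% of this layer's size
```

Under these assumptions, sixteen adapters on this layer together add about 6% of the layer's base parameters. Sixteen separate deployments would instead hold sixteen full copies. The comparison is per layer and ignores activations and KV cache.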

How osFoundry approaches Hot-swap LoRA

osFoundry’s GPU endpoints support up to 16 hot-swappable adapters per base model. Adapters trained inside osFoundry are registered automatically; external adapters can be uploaded.
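
One plausible way to enforce a per-base cap like this is a small registry that keeps at most 16 adapters resident and evicts the least-recently-used one when a new adapter is requested. The `AdapterRegistry` class and its LRU eviction policy are assumptions for illustration. The source does not say what osFoundry does when a 17th adapter is requested, and weight loading is stubbed out.

```python
from collections import OrderedDict

class AdapterRegistry:
    """Hypothetical sketch: tracks which LoRA adapters are resident for one base model."""

    def __init__(self, max_active: int = 16):
        self.max_active = max_active
        self.active = OrderedDict()  # adapter name -> weights, oldest use first

    def _load(self, name: str) -> str:
        # Stub: a real server would fetch the adapter weights from storage here.
        return f"weights:{name}"

    def get(self, name: str) -> str:
        """Return the adapter for a request, loading it (and evicting the LRU one) if needed."""
        if name in self.active:
            self.active.move_to_end(name)       # mark as most recently used
            return self.active[name]
        if len(self.active) >= self.max_active:
            self.active.popitem(last=False)     # evict the least recently used adapter
        weights = self._load(name)
        self.active[name] = weights
        return weights

reg = AdapterRegistry(max_active=2)
for name in ["a", "b", "a", "c"]:
    reg.get(name)
print(list(reg.active))  # ['a', 'c']: "b" was evicted because "a" was used more recently
```

LRU is a common choice because recently used adapters are likely to be requested again. A production system might instead reject new adapters once the cap is reached, or pin some adapters so they are never evicted.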

Related terms

lora
qlora
self-hosting

Related features

hot-swap-lora
gpu-endpoint
lora-fine-tuning