What is Hot-swap LoRA?
Hot-swap LoRA is the ability to switch between LoRA adapters on a deployed base model at inference time without reloading the model. osFoundry hot-swaps up to 16 active adapters per base model on a single GPU endpoint.
Detail
A deployed LLM endpoint usually serves one model. With hot-swapping, you host one base model (e.g. Llama 3.1 70B on an A100) and route each request to a different LoRA adapter. Switching takes under a second because the base model is never reloaded.
This collapses the cost of serving N specialised model variants from N model deployments down to one. Each user, persona, or domain can have its own fine-tuned adapter on a shared base.
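To make the shared-base idea concrete, here is a minimal toy sketch in plain Python. It is not osFoundry's implementation: the names `SharedBaseLayer`, `LoraAdapter`, and the adapter ids are hypothetical, and a single 2x2 matrix stands in for a full model. It only illustrates the standard LoRA formulation, in which the frozen base weight W is applied to every request and a small per-adapter low-rank update B·A is added on top, so choosing an adapter amounts to a lookup rather than a model reload.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass(frozen=True)
class LoraAdapter:
    """Low-rank update delta_W = B @ A, with A (r x d_in) and B (d_out x r)."""
    a: Matrix
    b: Matrix
    scale: float = 1.0


class SharedBaseLayer:
    """One frozen base weight serving many adapters (hypothetical toy class)."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # loaded once, never modified by any adapter
        self.adapters: dict[str, LoraAdapter] = {}

    def register(self, adapter_id: str, adapter: LoraAdapter) -> None:
        self.adapters[adapter_id] = adapter

    def forward(self, x: Vector, adapter_id: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)  # base path, shared by every request
        if adapter_id is None:
            return y
        ad = self.adapters[adapter_id]  # per-request selection is a dict lookup
        delta = matvec(ad.b, matvec(ad.a, x))  # B @ (A @ x), rank-r correction
        return [yi + ad.scale * di for yi, di in zip(y, delta)]


# Assumed example data: an identity base weight and two rank-1 adapters.
layer = SharedBaseLayer(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.register("legal", LoraAdapter(a=[[1.0, 1.0]], b=[[0.5], [0.0]]))
layer.register("support", LoraAdapter(a=[[1.0, -1.0]], b=[[0.0], [2.0]]))

x = [2.0, 1.0]
base_out = layer.forward(x)                # [2.0, 1.0]
legal_out = layer.forward(x, "legal")      # [3.5, 1.0]
support_out = layer.forward(x, "support")  # [2.0, 3.0]
```

The same input produces three different outputs while `layer.weight` stays untouched, which is why N specialised variants can share one deployment.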
How osFoundry approaches Hot-swap LoRA
osFoundry’s GPU endpoints support up to 16 hot-swappable adapters per base. Adapters trained inside osFoundry are auto-registered; external adapters can be uploaded.
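As a rough illustration of what a 16-adapter limit implies for bookkeeping, the sketch below keeps a registry of active adapters for one base model. Everything in it is an assumption made for illustration: `AdapterRegistry`, its methods, and the adapter ids are hypothetical, and the least-recently-used eviction policy is just one plausible choice. How osFoundry actually behaves when the limit is reached is not described here.

```python
from __future__ import annotations

from collections import OrderedDict

MAX_ACTIVE_ADAPTERS = 16  # per-base limit quoted in the text


class AdapterRegistry:
    """Tracks active adapters for one base model (hypothetical, not a real API)."""

    def __init__(self, capacity: int = MAX_ACTIVE_ADAPTERS) -> None:
        self.capacity = capacity
        # adapter_id -> source ("trained" or "uploaded"), oldest use first
        self._active: OrderedDict[str, str] = OrderedDict()

    def register(self, adapter_id: str, source: str) -> list[str]:
        """Activate an adapter; return ids evicted to make room (assumed LRU)."""
        evicted: list[str] = []
        if adapter_id in self._active:
            self._active.move_to_end(adapter_id)
        self._active[adapter_id] = source
        while len(self._active) > self.capacity:
            oldest, _ = self._active.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def touch(self, adapter_id: str) -> None:
        """Mark an adapter as recently used; raises KeyError if not active."""
        self._active.move_to_end(adapter_id)

    def active_ids(self) -> list[str]:
        return list(self._active)


registry = AdapterRegistry()
for i in range(16):
    registry.register(f"adapter-{i}", "trained")  # fills all 16 slots

registry.touch("adapter-0")  # a request just used adapter-0
evicted = registry.register("external-1", "uploaded")  # 17th adapter arrives
```

Under this assumed policy, `evicted` holds `adapter-1`, the least recently used entry, and the registry still contains exactly 16 adapters.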