Train and fine-tune AI models on osFoundry
osFoundry lets you fine-tune open-weight LLMs such as Llama, Mistral, or Qwen with LoRA on your own data, quantise the result for cheap inference, and hot-swap adapters at runtime, all without leaving the workspace. Training jobs run on your local GPU, in osFoundry cloud, or on your own infrastructure. Models you train are immediately available to Maestro and to every Room App in your workspace.
Quick answer
- LoRA fine-tuning on Llama 3, Mistral, Qwen, and 60+ other base models — UI-driven, no notebook required.
- Three training paths: local GPU, osFoundry cloud, or bring-your-own-server.
- Quantise trained adapters down to Q4/Q5 for cheap inference.
- Hot-swap LoRA adapters per request — no model reload, sub-second switch.
What it is
Most AI platforms either lock you into hosted models or hand you a notebook. osFoundry’s training pipeline is workspace-native: pick a base, point at a dataset (your KB, a public dataset, or an upload), choose LoRA rank, and ship. The trained adapter is registered in your model catalog automatically and routable from Maestro the moment it finishes.
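To make "choose LoRA rank" concrete: LoRA freezes each targeted weight matrix and trains two small matrices whose product is added to it, so the number of trainable parameters grows linearly with the rank. The sketch below is plain arithmetic, not osFoundry code. The 4096 x 4096 layer shape is an assumed example, roughly the size of an attention projection in a 7B-class model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix that the LoRA pair adapts."""
    return d_in * d_out


if __name__ == "__main__":
    d_in = d_out = 4096  # assumed layer shape, for illustration only
    for rank in (8, 16, 32, 64):
        trainable = lora_trainable_params(d_in, d_out, rank)
        share = trainable / full_params(d_in, d_out)
        print(f"rank={rank:>2}: {trainable:>7,} trainable params ({share:.2%} of the matrix)")
```

Doubling the rank doubles the adapter size and the trainable parameter count, which is why a lower rank is usually the cheaper starting point.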
Key capabilities
- LoRA + QLoRA fine-tuning on 60+ open-weight base models.
- Adapter download — pull the .safetensors out of osFoundry to deploy elsewhere.
- Quantisation to Q4_K_M, Q5_K_M, Q6_K, FP16 — convert in one click.
- Hot-swap up to 16 active LoRA adapters on a single base model.
- Train on your knowledge bases, uploaded JSONL/CSV, or any of 250K public datasets.
- Three training paths per job: local GPU, osFoundry cloud, or your own infrastructure.
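As a rough guide to what the quantisation formats above buy you, weight storage is approximately parameter count times bits per weight, divided by 8. The sketch below assumes the format names correspond to the llama.cpp/GGUF types of the same name, which this page does not state. The bits-per-weight values are the nominal per-block figures for those types, so treat the results as lower-bound estimates rather than osFoundry's actual file sizes.

```python
# Nominal bits per weight for each format. The K-quant values are per-block
# figures; the "_M" mixes keep some tensors at higher precision, so real
# files come out somewhat larger than this estimate.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q6_K": 6.5625,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.5,
}


def estimated_size_gb(num_params: float, fmt: str) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # assumed 7B-parameter model, for illustration only
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>7}: ~{estimated_size_gb(params, fmt):.1f} GB")
```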
How to do it in osFoundry
- Pick a base model — Browse /community/models, filter to open-weight (Llama, Mistral, Qwen, Phi, etc.), pick the size that fits your target GPU.
- Point at a dataset — Choose a knowledge base (auto-formatted as instruction pairs), upload a JSONL/CSV, or pick from 250K public datasets indexed in the catalog.
- Choose training config — LoRA rank (8/16/32/64), learning rate, epochs, target modules. Sensible defaults provided; tune from there.
- Pick where to train — Local GPU (free), osFoundry cloud (per-second GPU pricing), or BYO infrastructure (push job to your own cluster).
- Ship the adapter — When training finishes, the adapter is registered in your model catalog automatically. Hot-swap onto a base model endpoint and start routing requests in minutes.
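If you upload your own JSONL for the dataset step, each line is one JSON object. The sketch below shows one way to build such a file from question/answer pairs. The field names "instruction" and "response" are assumptions for illustration; this page does not specify the schema osFoundry expects, so check the upload dialog before relying on them.

```python
import json
from typing import Iterable


def to_jsonl(pairs: Iterable[tuple[str, str]]) -> str:
    """Serialise (instruction, response) pairs as one JSON object per line."""
    lines = []
    for instruction, response in pairs:
        # Field names are hypothetical; adjust them to the schema you need.
        record = {"instruction": instruction.strip(), "response": response.strip()}
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def from_jsonl(text: str) -> list[dict]:
    """Parse JSONL back into records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


if __name__ == "__main__":
    sample = [("How do I reset my password?", "Use the reset link on the sign-in page.")]
    print(to_jsonl(sample), end="")
```

Round-tripping a file through from_jsonl before uploading is a cheap way to catch malformed lines.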
How osFoundry compares
| Capability | osFoundry | Most other tools |
|---|---|---|
| Training UI | Workspace-native — no notebook, no command line. | Notebook or CLI required. |
| Adapter export | One-click .safetensors download with training config. | Locked to vendor, or manual export. |
| Where it runs | Local GPU, our cloud, or your own infrastructure. | Single venue, fixed pricing. |
| Routing post-train | Adapter immediately routable from Maestro and Room Apps. | Manual wiring into your app code. |
Use cases
- Customer-support team: Fine-tune Mistral 7B on 18 months of support transcripts. The agent answers in your tone, references your products, and stays on-brand.
- Legal ops: Train Llama 3.1 8B on a labelled contract corpus to redline new contracts in your firm’s style. Stays on-prem; adapter never leaves the workspace.
- Game studio: LoRA-tune Qwen 14B on your IP bible for in-game NPC dialogue. Hot-swap a different LoRA per character to keep voices distinct on one shared base model.
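The per-character setup in the game studio example comes down to choosing an adapter per request while the base model stays loaded. The sketch below illustrates that idea with a bounded pool. The AdapterPool class, the file paths, and the least-recently-used eviction policy are all assumptions for illustration, not osFoundry's implementation; only the limit of 16 active adapters comes from this page.

```python
from collections import OrderedDict


class AdapterPool:
    """Keeps a bounded set of adapters resident alongside one shared base model."""

    def __init__(self, max_active: int = 16):
        self.max_active = max_active
        self._active: "OrderedDict[str, str]" = OrderedDict()  # name -> adapter path

    def activate(self, name: str, path: str) -> None:
        """Mark an adapter as resident, evicting the least recently used if full."""
        if name in self._active:
            self._active.move_to_end(name)
            self._active[name] = path
            return
        if len(self._active) >= self.max_active:
            self._active.popitem(last=False)  # drop the least recently used
        self._active[name] = path

    def select(self, name: str) -> str:
        """Pick the adapter for one request; the base model is never reloaded."""
        if name not in self._active:
            raise KeyError(f"adapter {name!r} is not active")
        self._active.move_to_end(name)
        return self._active[name]

    def active_names(self) -> list[str]:
        """Active adapters, ordered from least to most recently used."""
        return list(self._active)


if __name__ == "__main__":
    pool = AdapterPool()
    pool.activate("blacksmith", "adapters/blacksmith.safetensors")  # hypothetical paths
    pool.activate("innkeeper", "adapters/innkeeper.safetensors")
    print(pool.select("innkeeper"))
```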
Frequently asked questions
How long does a LoRA fine-tune take on osFoundry?
A 7B model on a 50K-row dataset takes ~30 minutes on a single A100. A 70B model takes ~3 hours. Local M2/M3 Macs handle 7B in ~2 hours.
Can I export the LoRA adapter from osFoundry?
Yes — every trained adapter is downloadable as .safetensors and includes the training config. No lock-in.
Does osFoundry support full fine-tuning, not just LoRA?
LoRA and QLoRA are the recommended paths today. Full fine-tuning of models larger than 7B is on the roadmap; until then, run full fine-tunes on your own infrastructure if you need them.
What datasets can I train on?
Your knowledge bases (auto-formatted as instruction pairs), uploaded JSONL/CSV/parquet, or 250K public datasets indexed from HuggingFace.
How much does training cost?
Local training is free (your hardware). Cloud training is billed per second of GPU time at the same rates as inference endpoints. A 7B LoRA on an A100 is roughly $2–3 per training run; a 70B run is roughly $20–30.
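Per-second billing makes the cost of a run simple to estimate: duration in seconds times the hourly rate, divided by 3600. The sketch below is an illustration, not a price list. The $5/hour rate is an assumption back-derived from this page's own figures (about 30 minutes and $2–3 for a 7B run on an A100), not a published osFoundry price.

```python
def training_cost(duration_seconds: float, hourly_rate_usd: float) -> float:
    """Cost of a job billed per second at a given hourly GPU rate."""
    return duration_seconds * hourly_rate_usd / 3600


def implied_hourly_rate(cost_usd: float, duration_seconds: float) -> float:
    """Hourly rate implied by a quoted cost and run time."""
    return cost_usd * 3600 / duration_seconds


if __name__ == "__main__":
    thirty_minutes = 30 * 60
    assumed_rate = 5.0  # USD per hour, hypothetical
    print(f"30 min at ${assumed_rate:.2f}/h: ${training_cost(thirty_minutes, assumed_rate):.2f}")
    low = implied_hourly_rate(2.0, thirty_minutes)
    high = implied_hourly_rate(3.0, thirty_minutes)
    print(f"Rate implied by $2-3 for 30 min: ${low:.2f}-${high:.2f} per hour")
```

Because billing is per second, a job that is stopped early is charged only for the time it actually ran.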
Can I resume an interrupted training job?
Yes — checkpoints are saved every N steps (configurable). Resumption picks up from the last checkpoint, not from scratch.
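The checkpoint interval sets how much work an interruption can cost. The sketch below is plain arithmetic under the assumption that checkpoints land on exact multiples of N steps; the function names are hypothetical and not part of osFoundry.

```python
def last_checkpoint_step(last_completed_step: int, every_n_steps: int) -> int:
    """Most recent checkpointed step at or before the interruption (0 if none)."""
    if every_n_steps <= 0:
        raise ValueError("every_n_steps must be positive")
    return (last_completed_step // every_n_steps) * every_n_steps


def steps_to_redo(last_completed_step: int, every_n_steps: int) -> int:
    """Steps completed after the last checkpoint, which a resume has to repeat."""
    return last_completed_step - last_checkpoint_step(last_completed_step, every_n_steps)


if __name__ == "__main__":
    interrupted_at, interval = 1730, 500  # example values
    print("resume from step", last_checkpoint_step(interrupted_at, interval))
    print("steps to repeat:", steps_to_redo(interrupted_at, interval))
```

A smaller N means less repeated work after a failure, at the price of more frequent checkpoint writes.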
Pricing
Local training: free (your hardware). Cloud training: per-second GPU billing at the same rates as inference endpoints (A10 / A100 / H100). Adapter storage is metered as workspace file storage.
Related features