Train and fine-tune AI models on osFoundry

Fine-tune Llama, Mistral, or Qwen with LoRA on your data. Quantise for cheap inference. Hot-swap adapters at runtime.

osFoundry lets you fine-tune any open-weight LLM with LoRA on your own data, quantise the result for cheap inference, and hot-swap adapters at runtime, all without leaving the workspace. Training jobs run on your local GPU, in osFoundry cloud, or on your own infrastructure. Models you train are immediately available to Maestro and to every Room App in your workspace.

Quick answer

LoRA fine-tuning on Llama 3, Mistral, Qwen, and 60+ other base models — UI-driven, no notebook required.
Three training paths: local GPU, osFoundry cloud, or your own infrastructure.
Quantise the fine-tuned result down to Q4/Q5 for cheap inference.
Hot-swap LoRA adapters per request — no model reload, sub-second switch.

What it is

Most AI platforms either lock you into hosted models or hand you a notebook. osFoundry’s training pipeline is workspace-native: pick a base, point at a dataset (your KB, a public dataset, or an upload), choose LoRA rank, and ship. The trained adapter is registered in your model catalog automatically and routable from Maestro the moment it finishes.

Key capabilities

LoRA + QLoRA fine-tuning on 60+ open-weight base models.
Adapter download — pull the .safetensors out of osFoundry to deploy elsewhere.
Quantisation to Q4_K_M, Q5_K_M, or Q6_K, or conversion to FP16, in one click.
Hot-swap up to 16 active LoRA adapters on a single base model.
Train on your knowledge bases, uploaded JSONL/CSV, or any of 250K public datasets.
Three training paths per job: local GPU, osFoundry cloud, or your own infrastructure.
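
To illustrate why adapters can be switched per request without reloading the model, here is a minimal sketch of the standard LoRA forward pass, y = Wx + (alpha / r) * B(Ax), over a single frozen base weight. It is not osFoundry's implementation: the names SharedBaseLayer, LoraAdapter, and register are hypothetical, and the 16-adapter cap simply mirrors the limit listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoraAdapter:
    a: Matrix      # down-projection, shape (r, d_in)
    b: Matrix      # up-projection, shape (d_out, r)
    alpha: float   # LoRA scaling numerator

    @property
    def rank(self) -> int:
        return len(self.a)


@dataclass
class SharedBaseLayer:
    """One frozen base weight shared by several named adapters (hypothetical)."""
    weight: Matrix                      # frozen base weight, shape (d_out, d_in)
    max_adapters: int = 16              # assumption: mirrors the cap quoted above
    adapters: dict[str, LoraAdapter] = field(default_factory=dict)

    def register(self, name: str, adapter: LoraAdapter) -> None:
        if name not in self.adapters and len(self.adapters) >= self.max_adapters:
            raise RuntimeError(f"at most {self.max_adapters} active adapters")
        self.adapters[name] = adapter

    def forward(self, x: list[float], adapter_name: Optional[str] = None) -> list[float]:
        y = matvec(self.weight, x)      # base output; the weight is never modified
        if adapter_name is None:
            return y
        ad = self.adapters[adapter_name]
        scale = ad.alpha / ad.rank
        delta = matvec(ad.b, matvec(ad.a, x))   # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = SharedBaseLayer(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.register("npc_guard", LoraAdapter(a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1.0))
base_out = layer.forward([2.0, 3.0])                  # no adapter selected
guard_out = layer.forward([2.0, 3.0], "npc_guard")    # adapter chosen per call
```

Because the base weight stays untouched and each adapter is only a pair of small matrices, choosing a different adapter is a dictionary lookup rather than a model reload.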

How to do it in osFoundry

Pick a base model — Browse /community/models, filter to open-weight (Llama, Mistral, Qwen, Phi, etc.), pick the size that fits your target GPU.
Point at a dataset — Choose a knowledge base (auto-formatted as instruction pairs), upload a JSONL/CSV, or pick from 250K public datasets indexed in the catalog.
Choose training config — LoRA rank (8/16/32/64), learning rate, epochs, target modules. Sensible defaults provided; tune from there.
Pick where to train — Local GPU (free), osFoundry cloud (per-second GPU pricing), or BYO infrastructure (push job to your own cluster).
Ship the adapter — When training finishes, the adapter is registered in your model catalog automatically. Hot-swap onto a base model endpoint and start routing requests in minutes.
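
When choosing a LoRA rank in the steps above, it can help to see how rank drives adapter size. LoRA adds rank * (d_in + d_out) trainable parameters for each adapted weight matrix. The sketch below applies that formula to an illustrative model shape; the hidden size of 4096, the 32 layers, and the two square target matrices per layer are assumptions chosen for the example, not osFoundry defaults.

```python
def lora_trainable_params(rank: int, d_in: int, d_out: int, n_matrices: int) -> int:
    """Parameters added by LoRA: an (r x d_in) and a (d_out x r) matrix per target."""
    return rank * (d_in + d_out) * n_matrices


# Illustrative shape only (assumed, not taken from any specific model card).
HIDDEN = 4096
LAYERS = 32
TARGETS_PER_LAYER = 2   # e.g. two attention projections per layer

for rank in (8, 16, 32, 64):
    n = lora_trainable_params(rank, HIDDEN, HIDDEN, LAYERS * TARGETS_PER_LAYER)
    print(f"rank {rank:>2}: {n:,} trainable parameters")
```

Under these assumptions rank 8 comes to 4,194,304 trainable parameters and rank 64 to 33,554,432, so each doubling of rank doubles the adapter size.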

How osFoundry compares

Capability	osFoundry	Most other tools
Training UI	Workspace-native — no notebook, no command line.	Notebook or CLI required.
Adapter export	One-click .safetensors download with training config.	Locked to vendor, or manual export.
Where it runs	Local GPU, our cloud, or your own infrastructure.	Single venue, fixed pricing.
Routing post-train	Adapter immediately routable from Maestro and Room Apps.	Manual wiring into your app code.

Use cases

Customer-support team: Fine-tune Mistral 7B on 18 months of support transcripts. The agent answers in your tone, references your products, and stays on-brand.
Legal ops: Train Llama 3.1 8B on a labelled contract corpus to redline new contracts in your firm’s style. Stays on-prem; adapter never leaves the workspace.
Game studio: LoRA-tune Qwen 14B on your IP bible for in-game NPC dialogue. Hot-swap a different LoRA per character to keep voices distinct on one shared base model.

Frequently asked questions

How long does a LoRA fine-tune take on osFoundry?

A 7B model on a 50K-row dataset takes ~30 minutes on a single A100. A 70B model takes ~3 hours. Local M2/M3 Macs handle 7B in ~2 hours.

Can I export the LoRA adapter from osFoundry?

Yes — every trained adapter is downloadable as .safetensors and includes the training config. No lock-in.

Does osFoundry support full fine-tuning, not just LoRA?

LoRA and QLoRA are the recommended paths today. Full fine-tuning of models larger than 7B is on the roadmap; for now, use your own infrastructure if you need it.

What datasets can I train on?

Your knowledge bases (auto-formatted as instruction pairs), uploaded JSONL/CSV/Parquet files, or 250K public datasets indexed from Hugging Face.
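
For uploads, a JSONL file holds one JSON object per line. The sketch below shows one way to turn question/answer pairs into that layout. The field names "instruction" and "response" are assumptions made for illustration; check the upload dialog for the schema osFoundry actually expects.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialise (instruction, response) pairs as JSONL, skipping incomplete rows."""
    lines = []
    for instruction, response in pairs:
        instruction, response = instruction.strip(), response.strip()
        if not instruction or not response:
            continue  # an empty side makes the pair useless for training
        # Field names are hypothetical, not a documented osFoundry schema.
        record = {"instruction": instruction, "response": response}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


sample = to_jsonl([
    ("How do I reset my password? ", "Open Settings, then choose Reset password."),
    ("", "orphaned answer with no question"),
])
print(sample, end="")
```

The second pair is dropped because its instruction is empty, so the output contains a single line.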

How much does training cost?

Local training is free (your hardware). Cloud training is billed per second of GPU time at the same rates as inference endpoints. A 7B LoRA on an A100 is roughly $2–3 per training run; a 70B LoRA is roughly $20–30.
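
Per-second billing makes a run's cost easy to estimate: GPU seconds times the hourly rate, divided by 3600. The sketch below uses a hypothetical rate of $5.00 per A100-hour, picked only because it lands inside the $2–3 range quoted above for a ~30-minute 7B run; it is not a published osFoundry price.

```python
def training_cost_usd(gpu_seconds: float, hourly_rate_usd: float, gpu_count: int = 1) -> float:
    """Estimate cost under per-second billing: seconds * GPUs * (hourly rate / 3600)."""
    if gpu_seconds < 0 or hourly_rate_usd < 0 or gpu_count < 1:
        raise ValueError("seconds and rate must be non-negative, gpu_count at least 1")
    return gpu_seconds * gpu_count * hourly_rate_usd / 3600


ASSUMED_A100_RATE = 5.00   # USD per GPU-hour; hypothetical, for illustration only
estimate = training_cost_usd(gpu_seconds=30 * 60, hourly_rate_usd=ASSUMED_A100_RATE)
print(f"Estimated cost: ${estimate:.2f}")
```

With the real rate from the pricing page substituted in, the same arithmetic applies to any job length or GPU count.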

Can I resume an interrupted training job?

Yes — checkpoints are saved every N steps (configurable). Resumption picks up from the last checkpoint, not from scratch.
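
As a rough illustration of what resuming from the last checkpoint means for lost work, the sketch below computes the restart step for a given checkpoint interval. The function name and the example numbers are hypothetical, and it assumes checkpoints are written at exact multiples of the interval.

```python
def resume_step(interrupted_step: int, checkpoint_every: int) -> int:
    """Return the step of the most recent checkpoint at or before the interruption."""
    if checkpoint_every < 1 or interrupted_step < 0:
        raise ValueError("checkpoint_every must be >= 1 and interrupted_step >= 0")
    return (interrupted_step // checkpoint_every) * checkpoint_every


restart = resume_step(interrupted_step=1234, checkpoint_every=500)
steps_repeated = 1234 - restart
print(f"Resume from step {restart}; {steps_repeated} steps are repeated")
```

A smaller interval means fewer repeated steps after an interruption, at the cost of writing checkpoints more often.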

Pricing

Local training: free (your hardware). Cloud training: per-second GPU billing at the same rates as inference endpoints (A10 / A100 / H100). Adapter storage is metered as workspace file storage.