Run local LLMs on your laptop with osFoundry
osFoundry runs open-weight LLMs (Llama, Qwen, Mistral, GPT-OSS) locally on Apple Silicon and NVIDIA GPUs through a built-in inference server. With Q4-quantised weights, 7–13B models run on a 16 GB consumer GPU and 30B models fit in 24 GB; 70B+ models need an A100/H100-class GPU or aggressive quantisation. No tokens are billed and no data leaves your machine.
Quick answer
- Built-in local inference server — no Ollama, no llama.cpp setup.
- Apple Silicon (Metal) and NVIDIA (CUDA) supported.
- One-click install for any model in the catalog.
- Models stay loaded across chats — no re-load latency.
Frequently asked questions
Does osFoundry use Ollama or llama.cpp?
You don't need to install or configure either one: osFoundry runs its own built-in inference server. From your perspective, you click "Install" and the model is ready.
How much RAM do I need?
At Q4, a 7B model needs ~6 GB, a 13B model ~10 GB, and a 70B model ~50 GB.
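For sizes not listed above, a back-of-the-envelope estimate can help. The sketch below is not osFoundry's sizing rule: the function name estimate_memory_gb and its constants (about 4.5 bits per weight for Q4, a 25% allowance for KV cache and activations, and 1 GB of fixed runtime overhead) are illustrative assumptions chosen so the results land near the figures quoted here.

```python
def estimate_memory_gb(params_billion, bits_per_weight=4.5,
                       overhead_factor=1.25, fixed_gb=1.0):
    """Rough memory estimate, in GB, for a quantised model.

    All default constants are assumptions for illustration, not
    measured osFoundry values.
    """
    # Weight storage: billions of parameters x bits per weight, as bytes.
    weights_gb = params_billion * bits_per_weight / 8
    # Scale for KV cache and activations, then add a flat runtime allowance.
    return weights_gb * overhead_factor + fixed_gb


def fits(params_billion, available_gb):
    """True if the rough estimate is within the available memory."""
    return estimate_memory_gb(params_billion) <= available_gb


if __name__ == "__main__":
    for size in (7, 13, 70):
        print(f"{size}B at Q4: ~{estimate_memory_gb(size):.1f} GB")
```

Real usage also depends on context length and the exact quantisation format, so treat the result as a starting point and leave some headroom.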
Can I run multiple local models at once?
Yes — the server hot-loads on demand and unloads idle models to free memory.
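To picture what hot-loading and idle unloading can look like, here is a minimal sketch of a model pool with a memory budget. It is an illustration only, not osFoundry's implementation: ModelPool, its parameters, the least-recently-used eviction policy, and the 300-second idle timeout are all hypothetical.

```python
import time
from collections import OrderedDict


class ModelPool:
    """Hypothetical pool: loads models on first use, unloads idle ones."""

    def __init__(self, budget_gb, sizes_gb, loader,
                 clock=time.monotonic, idle_seconds=300):
        self.budget_gb = budget_gb      # total memory available for models
        self.sizes_gb = sizes_gb        # model name -> estimated size in GB
        self.loader = loader            # callable: name -> loaded model
        self.clock = clock              # injectable so it can be tested
        self.idle_seconds = idle_seconds
        self._loaded = OrderedDict()    # name -> (model, last_used), oldest first

    def used_gb(self):
        return sum(self.sizes_gb[name] for name in self._loaded)

    def loaded(self):
        """Names of loaded models, least recently used first."""
        return list(self._loaded)

    def unload_idle(self):
        """Drop every model that has not been used for idle_seconds."""
        now = self.clock()
        idle = [name for name, (_, last_used) in self._loaded.items()
                if now - last_used >= self.idle_seconds]
        for name in idle:
            del self._loaded[name]

    def get(self, name):
        """Return a loaded model, loading it on demand if needed."""
        if name in self._loaded:
            model, _ = self._loaded.pop(name)
        else:
            size = self.sizes_gb[name]
            # Evict least recently used models until the new one fits.
            while self._loaded and self.used_gb() + size > self.budget_gb:
                self._loaded.popitem(last=False)
            model = self.loader(name)
        # Re-insert at the end to mark the model as most recently used.
        self._loaded[name] = (model, self.clock())
        return model
```

With a 16 GB budget and models of 6, 10 and 6 GB, requesting the third model evicts whichever of the first two was used least recently. A model that is already loaded is returned without calling the loader again, which is what avoids re-load latency between chats.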
Is local inference billed?
No. Local inference runs on your own hardware and is free.