What is Mixture of Experts?
Abbreviation: MoE
Mixture of Experts (MoE) is an LLM architecture in which only a subset of the model's parameters is activated for each token. This gives it the capacity of a large model at roughly the per-token compute cost of a much smaller one. Mixtral 8x7B and 8x22B are popular MoE models in osFoundry’s catalog.
Detail
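Standard (dense) transformers activate every parameter for every token. MoE models replace part of each layer (in Mixtral, the feed-forward block) with a set of "expert" sub-networks, and a learned gating layer, or router, sends each token to only a small subset of them. Mixtral picks 2 of its 8 experts per layer. Mixtral 8x22B has about 141B total parameters, fewer than 8 × 22B = 176B because the attention layers and embeddings are shared across experts, but only about 39B are active per token. Its per-token compute is therefore roughly that of a ~39B dense model, while its full parameter count gives it more capacity than a dense model of that size.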
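The routing step can be sketched in a few lines of Python. This is a minimal, hypothetical top-k gating layer for a single token, assuming Mixtral-style routing (pick the top-2 gate logits, then softmax over just those two) and toy experts that only scale their input. In a real model each expert is a feed-forward network, the gate weights are learned, and tokens are processed in batches. The names `moe_forward`, `make_expert` and all numeric values are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, gate_weights: List[Vector],
                experts: List[Expert], k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token through its top-k experts and mix their outputs."""
    # Gating layer: one logit per expert (dot product of the token with that expert's gate row).
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the mixing weights sum to 1.
    mix = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for weight, i in zip(mix, chosen):
        y = experts[i](token)  # only the chosen experts are ever evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out, chosen


# Toy setup (hypothetical): 8 experts, expert i just scales its input by i + 1.
calls: List[int] = []


def make_expert(i: int) -> Expert:
    def expert(v: Vector) -> Vector:
        calls.append(i)  # record which experts actually ran
        return [(i + 1) * x for x in v]
    return expert


experts = [make_expert(i) for i in range(8)]
gate_weights = [[float(i), 0.0] for i in range(8)]  # stand-in for learned router weights
token = [1.0, 0.0]

out, chosen = moe_forward(token, gate_weights, experts, k=2)
print(chosen, calls)  # [7, 6] [7, 6]: only 2 of the 8 experts ran
print(out)
```

Because the softmax is taken after the top-k cut, the mixing weights over the selected experts sum to 1. Some other MoE designs instead apply the softmax over all experts first and then keep the top k.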
MoE models are memory-hungry to host, because every expert must be loaded even though each token uses only a few of them, but they are cheap to run per token. They are a good fit for GPU endpoints with enough VRAM to hold all of the weights.
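The hosting trade-off can be made concrete with back-of-envelope arithmetic. This sketch assumes a simplified model in which every parameter is either shared (attention, embeddings, router) or belongs to exactly one expert, so total = shared + n_experts × per_expert and active = shared + top_k × per_expert. Plugging in the approximate published Mixtral 8x22B figures (141B total, 39B active, top-2 of 8 experts; the total is below 8 × 22B because attention and embedding parameters are shared) gives rough shared and per-expert sizes, plus weight memory at a few precisions. The helper names are hypothetical, the inputs are rounded, and the memory figure covers weights only, not the KV cache or activations.

```python
def split_params(total: float, active: float, n_experts: int, top_k: int) -> tuple[float, float]:
    """Solve total = S + n*E and active = S + k*E for shared S and per-expert E (assumed model)."""
    per_expert = (total - active) / (n_experts - top_k)
    shared = total - n_experts * per_expert
    return shared, per_expert


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Approximate published figures for Mixtral 8x22B: 141B total, 39B active, top-2 of 8 experts.
TOTAL, ACTIVE, N_EXPERTS, TOP_K = 141e9, 39e9, 8, 2

shared, per_expert = split_params(TOTAL, ACTIVE, N_EXPERTS, TOP_K)
print(f"shared ≈ {shared / 1e9:.0f}B, per expert ≈ {per_expert / 1e9:.0f}B")  # shared ≈ 5B, per expert ≈ 17B

# Hosting needs every expert resident, so weight memory scales with TOTAL...
for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label}: {weight_memory_gb(TOTAL, nbytes):.1f} GB of weights")
# ...while per-token compute scales with ACTIVE (39B), not TOTAL (141B).
```

The same two equations explain both halves of the trade-off: adding experts grows the memory footprint linearly, while per-token compute grows only with top_k.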