Mistral Small 3 - High performance in a 24B open-source model


Mistral Small 3

High performance in a 24B open-source model


Screenshots



Hunter's comment

Mistral Small 3 is Mistral's most efficient and versatile model to date. Both pre-trained and instruction-tuned versions are released under Apache 2.0: 24B parameters, 81% on MMLU, and around 150 tokens/s. It was trained without synthetic data, which makes it a strong base for anything reasoning-related.
Check out Mistral Small 3 – it's setting a new benchmark for "small" LLMs (under 70B)! 🚀 This 24B-parameter model from Mistral AI offers performance comparable to much larger models, with a focus on efficiency.

Here are the key features:

· Powerful & Efficient: State-of-the-art results with low latency (150 tokens/s).
· Locally Deployable: Runs on a single RTX 4090 or a MacBook with 32GB RAM once quantized (a rough sketch of this follows the list).
· Knowledge-Dense: Packs a lot of knowledge into a compact size.
· Versatile: Great for fast conversational agents, low-latency function calling, creating subject matter experts (via fine-tuning), and local inference (for privacy).
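To make the "locally deployable" point concrete, here is a minimal sketch of what local inference could look like using Hugging Face Transformers with 4-bit quantization. The checkpoint name (mistralai/Mistral-Small-24B-Instruct-2501) and the bitsandbytes settings are assumptions on my part, not taken from the post, so check Mistral's model card for the exact identifiers before running it.

```python
# Hypothetical sketch: running Mistral Small 3 locally with 4-bit quantization
# so the 24B weights fit on a single consumer GPU (e.g. an RTX 4090).
# The model ID below is an assumption; verify it on Mistral's Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed checkpoint name

# 4-bit quantization via bitsandbytes keeps memory use within a 24 GB card.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Simple chat-style prompt using the model's chat template.
messages = [{"role": "user", "content": "Give me three use cases for a 24B local LLM."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For an even simpler route, the model can also be served through local runtimes such as Ollama or llama.cpp once a quantized build is available.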

It's also open source under the Apache 2.0 license! As Mistral's announcement puts it: "Today we're introducing Mistral Small 3, a latency-optimized 24B-parameter model released under the Apache 2.0 license."


Link

https://mistral.ai/en/about



Steemhunt.com

This is posted on Steemhunt - A place where you can dig products and earn STEEM.
View on Steemhunt.com



