Mistral just dropped Devstral, a new open-source AI model built to solve real software development issues - not just autocomplete code.
It’s the result of a collaboration between Mistral and All Hands AI. It’s small enough to run locally, trained on actual GitHub issues, and scored surprisingly high on one of the toughest dev-focused AI benchmarks out there: SWE-Bench Verified.
And yes - you can try it now, no waitlist, no vendor lock-in.
🔹 Why This Sounds Like the Same Story
We’ve heard this pitch before - from OpenAI, Anthropic, Google, DeepMind, you name it:
- “Agentic behavior”
- “Understands full repos”
- “Fixes bugs, submits pull requests”
- “Trained on GitHub”
- “Open-source developer assistant”
So when Mistral shows up with the same words, it’s easy to tune it out.
But something’s different this time.
🔹 SWE-Bench Verified Is the Answer
If all these models sound the same to you - you’re not wrong.
Same words. Same promises.
SWE-Bench Verified is how you cut through it.
If you want to know what’s real, what works, and what’s just marketing - this is where you look.
🔹 What is SWE-Bench Verified?
SWE-Bench is a benchmark created by Princeton University to test whether a language model can actually act like a software engineer.
Not just:
- "Finish this function"
But:
- "Read the issue. Understand the repo. Write the fix. Pass the test."
✅ “Verified” means each task in the benchmark was manually reviewed by human annotators, who confirmed the issue is well-specified and its tests genuinely check the fix.
So when Mistral says Devstral scored 46.8%, they’re saying:
“This model fixed nearly half of the real-world GitHub issues in the benchmark - and its patches passed the projects’ own test suites.”
That’s a meaningful number. Especially for a model you can run on your own machine.
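To make the task format concrete, here’s a minimal sketch of what a single SWE-Bench task instance looks like. The field names follow the public SWE-bench dataset schema on Hugging Face; every value below is invented for illustration.

```python
# Illustrative sketch of one SWE-Bench task instance.
# Field names follow the public SWE-bench dataset schema;
# the values are made up for this example.
task = {
    "instance_id": "example__repo-1234",        # hypothetical task ID
    "repo": "example/repo",                     # the real GitHub repo under test
    "base_commit": "abc1234",                   # commit the model starts from
    "problem_statement": "TypeError when ...",  # the actual GitHub issue text
    "FAIL_TO_PASS": ["tests/test_bug.py::test_fix"],  # must flip red -> green
    "PASS_TO_PASS": ["tests/test_core.py::test_ok"],  # must stay green
}

# The model is graded on one question: does its patch, applied at
# base_commit, make every FAIL_TO_PASS test pass without breaking
# anything in PASS_TO_PASS?
```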
🔹 What’s Actually New Here
- It performs - and it’s small. Devstral beats commercial models like GPT-4.1 Mini and Claude 3.5 Haiku on SWE-Bench Verified.
- It’s open and local. You can download it. Run it. Fork it. No API required.
- It’s trained differently. Not on code examples - but on real GitHub issues. That’s a step closer to how developers actually work.
🔹 How Devstral Compares on SWE-Bench Verified
| Model | SWE-Bench Verified Score | Open Source | Local Use | License |
|---|---|---|---|---|
| Devstral (Mistral) | 46.8% | ✅ Yes | ✅ Yes (RTX 4090 / Mac 32GB) | Apache 2.0 |
| GPT-4.1 Mini (OpenAI) | ~37% | ❌ No | ❌ Cloud-only | Commercial |
| Claude 3.5 Haiku | ~40% | ❌ No | ❌ Cloud-only | Commercial |
| Code Llama 70B | ~15–20% (est.) | ✅ Yes | ⚠️ Heavy hardware needed | Custom OSS |
🔹 Bottom Line
- ✅ Available now - download it from Hugging Face (see the sketch below): http://huggingface.co/mistralai/Devstral-Small-2505
- 💸 No cost to use - open-source under Apache 2.0, free for personal or commercial use: http://www.apache.org/licenses/LICENSE-2.0
- 💻 Runs locally - works on an RTX 4090 or a Mac with 32GB RAM
- 📊 Scores 46.8% on SWE-Bench Verified - benchmarked on real GitHub issues
- 🔧 Built for repo-level problem solving, not just code snippets
- 🧠 No API, no cloud, no vendor lock-in - just download and go
If you’re working with code and want an AI that shows up ready to help - this is the one to try.
Simple, local, and fully open.
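If you want to try it, here’s a minimal sketch of the download-and-run flow. `snapshot_download` is a real `huggingface_hub` API; serving the weights through vLLM is one common option, but the exact flags here are assumptions - check the model card for the recommended setup.

```python
# Minimal sketch: fetch Devstral and run one local chat completion.
# Download step uses huggingface_hub; the serving step assumes vLLM
# works with this model as described on its model card.
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

local_path = snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",  # model ID from the link above
    local_dir="./devstral-small",             # hypothetical local directory
)

# Assumed serving path: load the downloaded weights with vLLM and ask it
# to reason about a bug - the repo-level workflow Devstral is trained for.
llm = LLM(model=local_path, tokenizer_mode="mistral")
out = llm.chat(
    [{"role": "user",
      "content": "Here is a failing test and a stack trace. Propose a fix."}],
    SamplingParams(max_tokens=512),
)
print(out[0].outputs[0].text)
```

In practice you’d point it at a real issue and diff, or drive it through an agent scaffold like the one All Hands AI builds - the snippet above just shows that nothing beyond your own machine is required.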
❄️ Frozen Light Team Perspective
Most models in this space still sound the same.
Devstral does too - until you realise two things:
It was tested against real GitHub issues, not made-up examples.
And it’s open, free, and ready to run without permission or pricing plans.
We haven’t run it ourselves yet.
But the fact that it was benchmarked on real data - and made available without restrictions - says something.
This isn’t another demo model.
It’s a signal.
That small, focused models - backed by strong benchmarks and community access - might be the real path forward in AI for devs.
If you care about that space, Devstral’s worth your attention.
Not because of what we’ve seen - but because of how it was shared and who it was built for.