Mistral just dropped Devstral, a new open-source AI model built to solve real software development issues - not just autocomplete code.

It’s the result of a collaboration between Mistral and All Hands AI. It is small enough to run locally, trained on actual GitHub issues, and scored surprisingly high on one of the toughest dev-focused AI benchmarks out there: SWE-Bench Verified.

And yes - you can try it now, no waitlist, no vendor lock-in.

🔹 Why This Sounds Like the Same Story

We’ve heard this pitch before - from OpenAI, Anthropic, Google, DeepMind, you name it:

  • “Agentic behavior”

  • “Understands full repos”

  • “Fixes bugs, submits pull requests”

  • “Trained on GitHub”

  • “Open-source developer assistant”

So when Mistral shows up with the same words, it’s easy to tune it out.
But something’s different this time.

🔹 SWE-Bench Verified Is the Answer

If all these models sound the same to you - you’re not wrong.
Same words. Same promises.

SWE-Bench Verified is how you cut through the noise.
If you want to know what’s real, what works, and what’s just marketing - this is where you look.

🔹 What is SWE-Bench Verified?

SWE-Bench is a benchmark created by researchers at Princeton University to test whether a language model can actually act like a software engineer.

Not just:

  • "Finish this function"

But:

  • "Read the issue. Understand the repo. Write the fix. Pass the test."

✅ “Verified” means human reviewers vetted every task in the benchmark - confirming each issue is clearly specified and comes with a fair, working test - so a model can’t score points on broken or ambiguous problems. A fix only counts if it makes the issue’s tests pass.
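To make that concrete, here’s an illustrative sketch of the pass/fail logic a SWE-Bench-style harness applies. The class and function names are hypothetical, for illustration only - the real harness is more involved:

```python
# Illustrative sketch of the pass/fail check a SWE-Bench-style harness runs.
# The names here are hypothetical, not the real harness API.
import subprocess
from dataclasses import dataclass


@dataclass
class SweBenchTask:
    repo_dir: str            # local checkout of a real open-source repository
    issue_text: str          # the GitHub issue, verbatim
    fail_to_pass: list[str]  # tests that must flip from failing to passing


def is_solved(task: SweBenchTask, model_patch: str) -> bool:
    """No partial credit: the patch must apply cleanly AND the issue's
    designated tests must pass afterwards."""
    # `git apply` reads the patch from stdin when no file is given
    applied = subprocess.run(
        ["git", "apply"],
        input=model_patch.encode(),
        cwd=task.repo_dir,
    )
    if applied.returncode != 0:
        return False  # the patch didn't even apply
    tests = subprocess.run(["pytest", *task.fail_to_pass], cwd=task.repo_dir)
    return tests.returncode == 0
```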

So when Mistral says Devstral scored 46.8%, they’re saying:

“This model fixed nearly half of real-world GitHub issues in the benchmark - and passed testing.”

That’s a meaningful number. Especially for a model you can run on your own machine.

🔹 What’s Actually New Here

  1. It performs - and it’s small.
    Devstral beats commercial models like GPT-4.1 Mini and Claude 3.5 Haiku on SWE-Bench Verified.

  2. It’s open and local.
You can download it, run it, and fork it - no API required (see the download sketch after this list).

  3. It’s trained differently.
Not just on isolated code examples - but on real GitHub issues. That’s a step closer to how developers actually work.
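To make “download it, run it” concrete, here’s a minimal sketch using the huggingface_hub Python library. The repo ID matches the Hugging Face link in the Bottom Line section; the rest is the library’s standard download call:

```python
# Minimal sketch: pulling the Devstral weights for local use.
# Requires `pip install huggingface_hub`; the download is tens of gigabytes.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="mistralai/Devstral-Small-2505")
print(f"Weights downloaded to: {local_dir}")
```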

🔹 How Devstral Compares on SWE-Bench Verified

| Model | SWE-Bench Verified Score | Open Source | Local Use | License |
| --- | --- | --- | --- | --- |
| Devstral (Mistral) | 46.8% | ✅ Yes | ✅ Yes (4090 / Mac 32GB) | Apache 2.0 |
| GPT-4.1 Mini (OpenAI) | ~37% | ❌ No | ❌ Cloud-only | Commercial |
| Claude 3.5 Haiku | ~40% | ❌ No | ❌ Cloud-only | Commercial |
| Code Llama 70B | ~15–20% (est.) | ✅ Yes | ⚠️ Heavy hardware needed | Custom OSS |

🔹 Bottom Line

  • ✅ Available now - download it from Hugging Face:
    http://huggingface.co/mistralai/Devstral-Small-2505

  • 💸 No cost to use - open-source under Apache 2.0, free for personal or commercial use:
    http://www.apache.org/licenses/LICENSE-2.0

  • 💻 Runs locally - works on an RTX 4090 or a Mac with 32GB RAM (a minimal serving sketch follows this list)

  • 📊 Scores 46.8% on SWE-Bench Verified - benchmarked on real GitHub issues

  • 🔧 Built for repo-level problem solving, not just code snippets

  • 🧠 No API, no cloud, no vendor lock-in - just download and go
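For the “runs locally” bullet, here’s a hedged sketch of one common setup: serving the checkpoint with vLLM (a runtime commonly used for Mistral models - check the model card for its recommended options) and querying it through the OpenAI-compatible client. The port and prompt are illustrative, and on a single RTX 4090 you’d typically run a quantized build (e.g. via Ollama) rather than full-precision weights:

```python
# Hedged sketch: chat with a locally served Devstral.
# Assumes you've already started a local OpenAI-compatible server, e.g.:
#   vllm serve mistralai/Devstral-Small-2505
# The URL, port, and prompt below are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2505",
    messages=[{
        "role": "user",
        "content": "This pagination helper skips the last page. Propose a patch.",
    }],
)
print(response.choices[0].message.content)
```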

If you’re working with code and want an AI that shows up ready to help - this is the one to try.
Simple, local, and fully open.

❄️ Frozen Light Team Perspective

Most models in this space still sound the same.
Devstral does too - until you realise two things:

It was tested against real GitHub issues, not made-up examples.
And it’s open, free, and ready to run without permission or pricing plans.

We haven’t run it ourselves yet.
But the fact that it was benchmarked on real data - and made available without restrictions - says something.

This isn’t another demo model.
It’s a signal.

That small, focused models - backed by strong benchmarks and community access - might be the real path forward in AI for devs.

If you care about that space, Devstral’s worth your attention.
Not because of what we’ve seen - but because of how it was shared and who it was built for.
