Mistral just dropped Devstral, a new open-source AI model built to solve real software development issues - not just autocomplete code.
It’s the result of a collaboration between Mistral and All Hands AI. It’s small enough to run locally, trained on actual GitHub issues, and scored surprisingly high on one of the toughest dev-focused AI benchmarks out there: SWE-Bench Verified.
And yes - you can try it now, no waitlist, no vendor lock-in.
🔹 Why This Sounds Like the Same Story
We’ve heard this pitch before - from OpenAI, Anthropic, Google, DeepMind, you name it:
- “Agentic behavior”
- “Understands full repos”
- “Fixes bugs, submits pull requests”
- “Trained on GitHub”
- “Open-source developer assistant”
So when Mistral shows up with the same words, it’s easy to tune it out.
But something’s different this time.
🔹 SWE-Bench Verified Is the Answer
If all these models sound the same to you - you’re not wrong.
Same words. Same promises.
SWE-Bench Verified is how you cut through it.
If you want to know what’s real, what works, and what’s just marketing - this is where you look.
🔹 What is SWE-Bench Verified?
SWE-Bench is a benchmark created by Princeton University to test whether a language model can actually act like a software engineer.
Not just:
- "Finish this function"
But:
- "Read the issue. Understand the repo. Write the fix. Pass the test."
✅ “Verified” means each task in the benchmark was manually reviewed by human annotators, who confirmed the issue is well-specified and its tests genuinely check the fix.
So when Mistral says Devstral scored 46.8%, they’re saying:
“This model fixed nearly half of the real-world GitHub issues in the benchmark - and its patches passed the projects’ own test suites.”
That’s a meaningful number. Especially for a model you can run on your own machine.
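To make the task format concrete, here’s a minimal sketch of what a single SWE-Bench task instance looks like. The field names follow the public SWE-bench dataset schema on Hugging Face; every value below is invented for illustration.

```python
# Illustrative sketch of one SWE-Bench task instance.
# Field names follow the public SWE-bench dataset schema;
# the values are made up for this example.
task = {
    "instance_id": "example__repo-1234",        # hypothetical task ID
    "repo": "example/repo",                     # the real GitHub repo under test
    "base_commit": "abc1234",                   # commit the model starts from
    "problem_statement": "TypeError when ...",  # the actual GitHub issue text
    "FAIL_TO_PASS": ["tests/test_bug.py::test_fix"],  # must flip red -> green
    "PASS_TO_PASS": ["tests/test_core.py::test_ok"],  # must stay green
}

# The model is graded on one question: does its patch, applied at
# base_commit, make every FAIL_TO_PASS test pass without breaking
# anything in PASS_TO_PASS?
```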
🔹 What’s Actually New Here
- It performs - and it’s small. Devstral beats commercial models like GPT-4.1 Mini and Claude 3.5 Haiku on SWE-Bench Verified.
- It’s open and local. You can download it. Run it. Fork it. No API required.
- It’s trained differently. Not on code examples - but on real GitHub issues. That’s a step closer to how developers actually work.
🔹 How Devstral Compares on SWE-Bench Verified
| Model | SWE-Bench Verified Score | Open Source | Local Use | License |
|---|---|---|---|---|
| Devstral (Mistral) | 46.8% | ✅ Yes | ✅ Yes (RTX 4090 / Mac 32GB) | Apache 2.0 |
| GPT-4.1 Mini (OpenAI) | ~37% | ❌ No | ❌ Cloud-only | Commercial |
| Claude 3.5 Haiku | ~40% | ❌ No | ❌ Cloud-only | Commercial |
| Code Llama 70B | ~15–20% (est.) | ✅ Yes | ⚠️ Heavy hardware needed | Custom OSS |
🔹 Bottom Line
- ✅ Available now - download it from Hugging Face (see the sketch below): http://huggingface.co/mistralai/Devstral-Small-2505
- 💸 No cost to use - open-source under Apache 2.0, free for personal or commercial use: http://www.apache.org/licenses/LICENSE-2.0
- 💻 Runs locally - works on an RTX 4090 or a Mac with 32GB RAM
- 📊 Scores 46.8% on SWE-Bench Verified - benchmarked on real GitHub issues
- 🔧 Built for repo-level problem solving, not just code snippets
- 🧠 No API, no cloud, no vendor lock-in - just download and go
If you’re working with code and want an AI that shows up ready to help - this is the one to try.
Simple, local, and fully open.
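If you want to try it, here’s a minimal sketch of the download-and-run flow. `snapshot_download` is a real `huggingface_hub` API; serving the weights through vLLM is one common option, but the exact flags here are assumptions - check the model card for the recommended setup.

```python
# Minimal sketch: fetch Devstral and run one local chat completion.
# Download step uses huggingface_hub; the serving step assumes vLLM
# works with this model as described on its model card.
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

local_path = snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",  # model ID from the link above
    local_dir="./devstral-small",             # hypothetical local directory
)

# Assumed serving path: load the downloaded weights with vLLM and ask it
# to reason about a bug - the repo-level workflow Devstral is trained for.
llm = LLM(model=local_path, tokenizer_mode="mistral")
out = llm.chat(
    [{"role": "user",
      "content": "Here is a failing test and a stack trace. Propose a fix."}],
    SamplingParams(max_tokens=512),
)
print(out[0].outputs[0].text)
```

In practice you’d point it at a real issue and diff, or drive it through an agent scaffold like the one All Hands AI builds - the snippet above just shows that nothing beyond your own machine is required.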
❄️ Frozen Light Team Perspective
Most models in this space still sound the same.
Devstral does too - until you realise two things:
It was tested against real GitHub issues, not made-up examples.
And it’s open, free, and ready to run without permission or pricing plans.
We haven’t run it ourselves yet.
But the fact that it was benchmarked on real data - and made available without restrictions - says something.
This isn’t another demo model.
It’s a signal.
That small, focused models - backed by strong benchmarks and community access - might be the real path forward in AI for devs.
If you care about that space, Devstral’s worth your attention.
Not because of what we’ve seen - but because of how it was shared and who it was built for.