Quick news drop for you all: ElevenLabs just released version 3 of their text-to-speech model, and guess what? They’re claiming it’s the most expressive TTS model out there. Now, I don’t know if that means “world’s most expressive” (they’re definitely flexing), but one thing’s for sure: it’s more expressive than anything they’ve released before. I couldn’t wait to test it out, so here’s what I found!
What’s the Buzz?
Here’s the scoop: this new model is still in “research preview” mode, but they’re already letting you dive in. Every time it generates audio, it gives you two versions to pick from, so you get a little “choose your own adventure” for your TTS needs. And they’re still fine-tuning it because, well, the competition’s getting fierce. That makes now a good time to get your hands dirty and start playing around with it.
Cool Features to Check Out
Here’s a rundown of some things that caught my eye:
- Emotional Delivery: Yep, you can now steer the emotion of the delivery by putting cues in square brackets (for example, [excited] or [whispers]). Super easy to use, and it gives the voice a lot more character.
- Multi-Speaker Dialogue: You can now have two characters talking to each other, which is great for creating lifelike conversations.
- 70 Languages: Yup, they’ve got 70 languages under their belt, so you can take this model global (or at least regional, depending on where you’re using it).
- Early Access API: The API’s not fully live yet, but if you want in early, you can reach out to them and get a shot at testing it.
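For anyone who lands early API access, here’s a minimal sketch of how bracketed emotion cues and a two-character exchange might fit together in requests. It only builds the requests with Python’s standard library and never sends them. The endpoint URL, the `xi-api-key` header, and the `eleven_v3` model ID are assumptions based on ElevenLabs’ existing v1 text-to-speech API, not details confirmed in this post, and the voice IDs and API key are hypothetical placeholders.

```python
import json
import urllib.request

# Assumptions (not confirmed by this post): the v1 text-to-speech endpoint,
# the "xi-api-key" auth header, and "eleven_v3" as the v3 model ID.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
MODEL_ID = "eleven_v3"


def build_tts_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one line of speech."""
    body = json.dumps({"text": text, "model_id": MODEL_ID}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# A two-character exchange; both voice IDs are hypothetical placeholders.
# Emotion cues go in square brackets inside the text itself.
dialogue = [
    ("VOICE_ID_ALICE", "[excited] Did you hear? The new model just dropped!"),
    ("VOICE_ID_BOB", "[whispers] I know. I've been testing it all morning."),
]

# One request per line, so each character can use a different voice.
tts_requests = [
    build_tts_request(voice_id, line, "YOUR_API_KEY") for voice_id, line in dialogue
]

for req in tts_requests:
    print(req.get_method(), req.full_url)
    print(json.loads(req.data)["text"])
```

Sending one request per line is just the simplest way to give each character its own voice; v3’s built-in dialogue support may handle multi-speaker scripts differently, so check the official API docs once access opens up.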
Special Deal Alert!
If you’re interested, there’s an 80% discount running through the end of June. Yeah, you read that right: 80%. So if you want to start generating some serious audio content without breaking the bank, now’s the time to jump in. After June, I imagine the price is going to bump up, so don’t wait!
Test Run: Let's Hear It!
I did a quick test myself, and, well, the results speak for themselves. Take a listen to the two versions the model generated in the YouTube video attached above.
There’s a slight difference in how they sound: Version 2 came out better and more natural.
But that’s the beauty of the model: you get to pick the version that works best for you.
What’s Next?
This model is still in development, but I’m seriously hyped about what it’s already capable of. Start experimenting with it now, and let me know what you come up with. Whether you’re generating lifelike audio content, building interactive bots, or just having fun with different characters, the possibilities are endless.