Podcast recording and editing platform Podcastle is joining the companies competing in the AI-powered text-to-speech race by releasing its own AI model, called Asyncflow v1.0. An API for developers will also be available, allowing them to directly integrate the text-to-speech model into their apps.
Thanks to the new model, the company is able to offer more than 450 AI voices that can narrate your text. The startup said that it developed the technology and model in such a way that its training and inference costs are low, giving it an advantage over rivals.
With the move, Podcastle joins a number of startups, including ElevenLabs, Speechify, and WellSaid, that have developed technology and AI models to convert any kind of text into a voice clip narrated by AI. This technology spans use cases like marketing, advertising, content creation, education, and corporate training.
Podcastle’s founder, Arto Yeritsyan, told TechCrunch that the company had always wanted to build a text-to-speech model, but the cost of training and the data requirements were very high.
“We wanted to build a robust text-to-speech model since our inception. However, the costs of development were very high. Thanks to recent large language model developments, we were able to reach a breakthrough last year to get to a place where we could build a high-quality voice model without needing a ton of data,” Yeritsyan said.
The company was also aided in its efforts by its $13.5 million Series A fundraise last year.
Yeritsyan said that while Podcastle charges around $40 per 500 minutes of text-to-speech conversion, ElevenLabs charges $99 for the same.
Podcastle’s voice cloning feature is getting an upgrade as well, with a quicker training process.
Previously, the training process involved reading roughly 70 different sentences. Now, it just needs a few seconds of recording from you to create a clone of your voice. The new process also uses Podcastle’s Magic Dust AI, which was launched last year, to improve audio recording quality.
In our testing, the voice created with the new process sounded a bit robotic, though it mimicked our tone. The company said that it will improve the feature over time. Plus, you can train different samples of your voice to get different results.
Podcastle said that apart from costs, having tools for audio, video, podcasts, and AI-powered narration under one redesigned website will give it an edge over rivals. Yeritsyan said that while the majority of users use Podcastle to work on audio content, video is catching up as well.