With demand for spoken-word audio on the up, 80% of media leaders are investing more in digital audio this year. If you're making the move into spoken-word audio, one of the main decisions you'll face is whether to use human or AI narration.

While publishers like Zetland, The Economist, and Harvard Business Review have seen success with human voice-over, publishers including Berlingske, The Japan Times, and Media24 are engaging audiences with synthetic speech. Some, like The Washington Post, use a mixture of both.

Human voices can be more engaging, but this comes at a huge cost, one that's often unviable. With the quality of synthetic speech catching up to, and in some ways surpassing, human voice-over, many find that the balance has tipped in AI's favor, especially when it comes to audio articles.

In this article, I'm going to compare human vs text-to-speech audio production in terms of quality, cost, and time, to help you make an informed decision.

Quality

Human-read audio is generally considered more personable and engaging than AI audio because people can put more emphasis and emotion into their speech. They can also manually check pronunciations and make thoughtful decisions on delivery.

"With audio, it's even more personal than text, and we see more opportunities there because it's a more intimate way of consuming journalism." - Ernst-Jan Pfauth, The Correspondent CEO

However, if you haven't had training in narration or voice acting, delivering clear and engaging speech yourself can be difficult. The sound can also be compromised by your recording environment and equipment.

Hiring a voice actor and a professional recording studio will give the highest-quality results, but this can be time-consuming and expensive. You may also struggle to maintain a consistent brand voice, because you'll depend on the voice actor's availability.

Another drawback with human-read audio is a lack of flexibility. Switching between multiple languages or voices means hiring and managing multiple speakers. This compromises your ability to choose the best voice for each piece of content you're producing. It's also impractical to edit human-read audio after publishing.

"You can't have somebody producing a new audio version of one article every time it's updated. But with [...] synthetic language, there's hardly any additional cost to production at all." - Andy Webb, head of product for the voice and artificial intelligence team at the BBC

AI audio offers more consistency and reliability, as well as flexibility. With BeyondWords, you can easily update what's being said and switch between 500+ voices across 130+ language locales.

There's even the option to create custom AI voices. This means you can clone your own voice, or the voice of a person on your team, to give audio a more personal touch. Or, you can work with a voice actor to create a unique and engaging brand voice.

"The synthetic voice we developed with BeyondWords handles local names better than anything we've heard before. It's much more engaging to listen to a voice that sounds like our brand." - Kelly Anderson, Deputy Site Editor at News24

The quality of your text-to-speech audio will depend largely on the AI voice itself. As well as having the option to create a custom voice, our users get access to hyper-realistic premade voices.

"[AI voices] are of such good quality that it's kind of hard to distinguish [them] from human voices. Particularly for news articles, they are a really good solution to the audio problem." - Paddy Logue, digital editor at The Irish Times

But the voice isn't the only thing that matters. AI voices sound better on BeyondWords because we use natural language processing algorithms to convert your text into Speech Synthesis Markup Language (SSML). This reduces the risk of pronunciation errors and allows for custom text-to-speech rules.
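
To make the idea of a custom pronunciation rule more concrete, here is a minimal sketch in Python of how plain text could be turned into SSML. This is an illustration only, not how BeyondWords works internally: the `to_ssml` function and the example rules are hypothetical. The only parts taken from the SSML standard are the `<speak>` root element and the `<sub alias="...">` element, which tells a speech engine to read the alias aloud in place of the written text.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rules: dict[str, str]) -> str:
    """Wrap text in <speak>, marking each rule match with <sub alias="...">.

    `rules` maps a written form to the form that should be spoken.
    This is a hypothetical helper for illustration, not a real product API.
    """
    if not rules:
        return "<speak>" + escape(text) + "</speak>"

    # Try longer written forms first so overlapping rules prefer the longer match.
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the output stays valid XML.
        parts.append(escape(text[pos:match.start()]))
        written = match.group(0)
        spoken = rules[written]
        parts.append(f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>")
        pos = match.end()
    parts.append(escape(text[pos:]))

    return "<speak>" + "".join(parts) + "</speak>"


# Example rule (an assumption for this sketch): read "Dr." as "Doctor".
print(to_ssml("Dr. Smith & co", {"Dr.": "Doctor"}))
```

A real pipeline would need more than simple text matching (for example, telling "Dr." the title apart from "Dr." the street abbreviation), which is where language processing comes in. The sketch only shows the final step: expressing a pronunciation decision as markup that a speech engine can follow.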

Cost

Human-read audio is traditionally expensive to produce. Of course, the cost will vary significantly depending on s...