Do you want to create a voice-over for your game or mobile application? Say the final audio runs 10 minutes. How can you do it? If you can wait a few days and pay hundreds of dollars, hire an actor and rent a studio. Remote freelancers will save you some cash, but you are still looking at 2-3 days of work. What if you urgently need to convert text to speech and still want to save hundreds of dollars? Learn about a solution we recently discovered and tested while developing our games and augmented reality applications. Rest assured, you can get a quality voice-over in a matter of seconds, for less than the cost of a cup of coffee.

Of course, the ideal option is to hire an actor and a studio. Why? Because then you choose the right voice and timbre, work on intonation and emotion, and are guaranteed a high-quality result. But what if the recording studio is far away and the voice actor has to be hand-picked? Or what if you need to voice a text in a foreign language? It might take anywhere from 1 to 3 days (if you’re lucky) and cost at least $200 CAD, or more realistically $500-1,000. Keeping the recording to a few minutes does not save much either: we used to pay $100 for 3 minutes, and that was not the highest rate.

So what are the options? Should you wait 2-3 days and pay hundreds of dollars for 3 minutes of voice-over? Or is there another way? It’s time to meet a new generation of online converters.

The first text-to-speech converters appeared half a century ago. They offered a limited number of synthetic-sounding voices and were, predictably, called robotic. The most famous of them was the voice Stephen Hawking used for most of his life. Until recently, Google received more than 3,000 searches a month for “Stephen Hawking text to speech voice”. Yet such a voice has limited uses. It suits a robot in a computer game or a simulated conversation with a computer, but for a realistic character or a voice-over narration it would sound odd. That is why, until now, companies had to spend thousands of dollars on actors and studios to voice a few minutes of a character’s speech in a game.

What has changed? The breakthrough came a year ago, when Google presented its voice assistant. Perhaps you saw the video; it went viral. The assistant called a hair salon and booked a haircut. On another call, it reserved a table at a restaurant. In both cases, the people on the other end of the line did not notice they were talking to a computer. And how could they, when the voice sounded real, complete with the pauses, “hmm”s and “uh”s that are characteristic of ordinary speech?

Meanwhile, Saturn studio was developing an augmented reality application with a virtual assistant designed to talk with users and answer their questions. We wanted the assistant to speak English, Chinese or Spanish, depending on the user’s choice, and all the phrases together came to about 3 minutes. Google’s voice generator let us convert the text to speech. But, surprise: we couldn’t download the generated audio files! We could listen to the audio, but there was no way to save it. That was weird. Another unpleasant discovery was that effects such as pauses and whispers could only be added if you figured out how to write the markup code for them. That was a bummer.
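The “code” in question is, as far as we can tell, SSML (Speech Synthesis Markup Language), the XML-based markup that cloud text-to-speech services generally accept for effects like pauses and emphasis. Here is a minimal Python sketch of what writing it by hand involves. The helper `build_ssml` is our own hypothetical function, not part of any provider’s SDK; it only assembles a string from the standard SSML tags `<speak>`, `<break>` and `<emphasis>`, and sending that string to a TTS API is not shown. Whispering is not part of standard SSML; where a provider supports it, it comes as a vendor extension.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500, emphasize=None):
    """Hypothetical helper: join sentences into one SSML document.

    A <break> of pause_ms milliseconds is placed between sentences, and any
    sentence whose index appears in `emphasize` is wrapped in <emphasis>.
    """
    emphasize = set(emphasize or ())
    parts = []
    for i, sentence in enumerate(sentences):
        text = escape(sentence)  # escape &, < and > so the markup stays valid XML
        if i in emphasize:
            text = f'<emphasis level="strong">{text}</emphasis>'
        parts.append(text)
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


ssml = build_ssml(["Welcome!", "Ask me anything."], pause_ms=700, emphasize=[0])
print(ssml)
# <speak><emphasis level="strong">Welcome!</emphasis><break time="700ms"/>Ask me anything.</speak>
```

Even this small example shows why a point-and-click interface is welcome: every effect means hand-editing markup, and a single unescaped “&” in the script would break the whole document, which is why the helper escapes the text first.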

We decided to look for other solutions. Amazon, Microsoft and IBM were all competing in this space, which was a good sign. We hoped one of them would offer a similar service with a friendlier user interface. Instead, we were disappointed by various glitches. One service had a large library of languages, but its voices were neither realistic nor high quality; in other cases we ran into problems with the user interface. Each service had its advantages, but the disadvantages cancelled them out. Smaller providers were outperformed by the Big Four (Google, Amazon, Microsoft and IBM): fewer voices and languages, less useful effects and unreasonably high fees.

The Kukarella converter stood apart from the rest. This TTS app provided access to all the languages and voices available on the Google, Amazon, IBM and Microsoft platforms. Instead of competing with the voice generators, Kukarella built an aggregator. As a result, from a single platform we had access to almost 300 voices across more than 50 languages and accents. We could also download and share the generated audio files.
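To picture the aggregator model, here is a purely illustrative Python sketch; it is not Kukarella’s actual implementation, which we have not seen. `Voice` and `VoiceCatalog` are hypothetical names, and the voice names are placeholders rather than real provider voices. Each provider’s voice list is registered into one shared index keyed by language code, so a single lookup returns candidates from every platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    provider: str  # which platform synthesizes this voice, e.g. "Google"
    name: str      # the provider's own voice identifier (placeholders below)
    language: str  # language/accent code, e.g. "en-US"


class VoiceCatalog:
    """Hypothetical aggregator: one index over voices from many providers."""

    def __init__(self):
        self._by_language = defaultdict(list)

    def register(self, voices):
        """Add one provider's voice list to the shared index."""
        for voice in voices:
            self._by_language[voice.language].append(voice)

    def voices_for(self, language):
        """All voices, from every provider, for one language code."""
        return list(self._by_language.get(language, []))

    def languages(self):
        """Every language code covered by at least one provider."""
        return sorted(self._by_language)


catalog = VoiceCatalog()
catalog.register([Voice("Google", "voice-a", "en-US"), Voice("Google", "voice-b", "es-ES")])
catalog.register([Voice("Amazon", "voice-c", "en-US"), Voice("Amazon", "voice-d", "cmn-CN")])
print(catalog.languages())                                # ['cmn-CN', 'en-US', 'es-ES']
print([v.provider for v in catalog.voices_for("en-US")])  # ['Google', 'Amazon']
```

Keying the index by language matches how we actually chose voices: the user picks English, Chinese or Spanish, and the catalog offers every voice for that language regardless of which company synthesizes it.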

We experimented and created a 3-minute voice-over in 3 languages. The whole process took 2 minutes and cost 14 cents. Fantastic! Then we experimented with effects. It was simple: each effect was added with a click of a button. We tested pauses, whispers, emphasis and the softness of voices, and even tried to “regulate” the height of the virtual speaker. This test cost another 15 cents and took as little time as the first voice-over: less than 2 minutes.
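Behind such buttons there is presumably markup similar to SSML. As a hedged illustration, the Python sketch below produces tags that Amazon Polly documents as SSML extensions: `amazon:effect name="whispered"` for whispering, `phonation="soft"` for a softer voice, and `vocal-tract-length` for making the speaker sound larger or smaller, which is our guess at what the “height” setting controls. The helper functions are hypothetical, the mapping to Kukarella’s buttons is an assumption, and Polly’s support for these effects varies by voice and engine.

```python
from xml.sax.saxutils import escape

# Hypothetical helpers; each wraps an already-escaped SSML fragment in one of
# Amazon Polly's SSML extension tags. Mapping them to UI buttons is assumed.


def whispered(fragment):
    # Polly extension: whispered speech
    return f'<amazon:effect name="whispered">{fragment}</amazon:effect>'


def soft(fragment):
    # Polly extension: softer, breathier phonation
    return f'<amazon:effect phonation="soft">{fragment}</amazon:effect>'


def speaker_size(fragment, percent):
    # Polly extension: a positive percentage lengthens the simulated vocal
    # tract (a bigger-sounding speaker); a negative one shortens it
    return f'<amazon:effect vocal-tract-length="{percent:+d}%">{fragment}</amazon:effect>'


ssml = (
    "<speak>"
    + whispered(escape("Follow me."))
    + soft(escape("Don't be afraid."))
    + speaker_size(escape("I am the guardian of this forest."), 15)
    + "</speak>"
)
print(ssml)
```

Each effect is applied to its own sentence here rather than nested, since whether a given voice accepts combined effects is something to check in the provider’s documentation.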

That works out to 7 cents per minute for a standard voice-over and 15 cents per minute for a voice with effects. A minute in any studio would cost us at least $2, and studios customarily bill for a full hour. Per minute, Kukarella came out roughly 13 times cheaper with effects and about 28 times cheaper without them, before even counting the studio’s hourly minimum. Great savings! Given that we spent only 2 minutes of our time, the result was even more impressive.
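Here is the same cost arithmetic as a short Python sketch, using only the figures quoted in this article: at least $2 per studio minute billed by the full hour, versus 7 cents (standard) and 15 cents (with effects) per minute of generated audio. The function and constant names are our own, and modelling the hourly minimum as a 60-minute billing block is an assumption about how the studio rounds.

```python
import math

STUDIO_PER_MINUTE = 2.00        # lowest studio rate quoted, in dollars
TTS_PLAIN_PER_MINUTE = 0.07     # standard voice-over
TTS_EFFECTS_PER_MINUTE = 0.15   # voice-over with effects


def savings_factor(studio_rate, tts_rate):
    """How many times cheaper the TTS rate is than the studio rate."""
    return studio_rate / tts_rate


def studio_bill(minutes_needed, rate_per_minute, billing_block=60):
    """Studios bill whole blocks of time (assumed here: a full hour)."""
    blocks = math.ceil(minutes_needed / billing_block)
    return blocks * billing_block * rate_per_minute


print(f"{savings_factor(STUDIO_PER_MINUTE, TTS_PLAIN_PER_MINUTE):.1f}x")    # 28.6x
print(f"{savings_factor(STUDIO_PER_MINUTE, TTS_EFFECTS_PER_MINUTE):.1f}x")  # 13.3x
print(studio_bill(3, STUDIO_PER_MINUTE))                                   # 120.0
```

The hourly minimum is what really tilts the balance for short scripts: 3 minutes of audio still means a $120 studio bill, against 21 cents at the standard TTS rate.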

As for quality, we introduced a few professionals to our virtual assistant. Do you think anyone noticed they were listening to a computer voice? No one did! That’s remarkable for a few minutes of work and 15 cents.

The results you get with the new generation of online voice generators are exceptional. But can we say that solutions like Kukarella will compete with professional voice actors? Yes and no. For short passages of text, yes: it’s a reality now. For long documents, where emotions, nuances and semantic emphasis matter, probably not yet. Still, there is a good chance this will happen in the coming years.

The voice synthesis capabilities Google demonstrated only a year ago are becoming commonplace today. If you often create voice-overs for your games or applications, it’s time to discover online services such as Kukarella. They can save you hundreds of hours and thousands of dollars. It is hard to dispute that the future lies with voice synthesizers. Automation is replacing routine labor everywhere, even in a field as subtle as human speech.