


Similar articles
Article Id Title Prob Score Similar Compare
135374 TECHCRUNCH 2019-5-15:
Google’s Translatotron converts one spoken language to another, no text involved
1.000 Find similar Compare side-by-side
135714 THEVERGE 2019-5-17:
Google’s prototype AI translator translates your tone as well as your words
0.349 0.596 Find similar Compare side-by-side
135583 THENEXTWEB 2019-5-16:
Google’s new AI can help you speak another language in your own voice
0.774 0.578 Find similar Compare side-by-side
135401 ENGADGET 2019-5-15:
Google's Translatotron can translate speech in the speaker's voice
0.964 0.563 Find similar Compare side-by-side
135315 VENTUREBEAT 2019-5-15:
Google’s Translatotron is an end-to-end model that mimics human voices
0.941 0.543 Find similar Compare side-by-side
135607 VENTUREBEAT 2019-5-16:
Alexa speech normalization AI reduces errors by up to 81%
0.444 Find similar Compare side-by-side
135067 VENTUREBEAT 2019-5-13:
Amazon Alexa scientists retrain an English-language AI model on Japanese
0.417 Find similar Compare side-by-side
135235 VENTUREBEAT 2019-5-14:
IBM’s AI performs state-of-the-art broadcast news captioning
0.414 Find similar Compare side-by-side
135316 ARSTECHNICA 2019-5-15:
No, someone hasn’t cracked the code of the mysterious Voynich manuscript
0.002 0.388 Find similar Compare side-by-side
135633 VENTUREBEAT 2019-5-16:
Google’s Live Transcribe is getting sound events and transcription saving
0.362 Find similar Compare side-by-side
135002 THEVERGE 2019-5-13:
Use this cutting-edge AI text generator to write stories, poems, news articles, and more
0.361 Find similar Compare side-by-side
135594 THENEXTWEB 2019-5-16:
Designing products for people with disabilities has never been so important
0.356 Find similar Compare side-by-side
135618 THEVERGE 2019-5-16:
Android’s Live Transcribe will let you save transcriptions and show ‘sound events’
0.337 Find similar Compare side-by-side
135527 ENGADGET 2019-5-16:
Android's Live Transcribe gets sound alerts and transcript saving
0.317 Find similar Compare side-by-side
135724 ENGADGET 2019-5-17:
I listened to a Massive Attack record remixed by a neural network
0.313 Find similar Compare side-by-side
135363 THEVERGE 2019-5-15:
AI translation boosted eBay sales more than 10 percent
0.306 Find similar Compare side-by-side
135038 THEVERGE 2019-5-13:
How to stop Google from keeping your voice recordings
0.300 Find similar Compare side-by-side
135678 THEVERGE 2019-5-17:
This AI-generated Joe Rogan fake has to be heard to be believed
0.290 Find similar Compare side-by-side
134953 VENTUREBEAT 2019-5-13:
Adding audio data helps AI navigate 3D mazes
0.286 Find similar Compare side-by-side
135312 TECHCRUNCH 2019-5-15:
7 accessibility-focused startups snag grants from Microsoft
0.285 Find similar Compare side-by-side
135279 TECHCRUNCH 2019-5-14:
Google’s latest app, Rivet, uses speech processing to help kids learn to read
0.284 Find similar Compare side-by-side
135495 VENTUREBEAT 2019-5-16:
Microsoft makes Google’s BERT NLP model better
0.279 Find similar Compare side-by-side
134891 TECHREPUBLIC 2019-5-12:
Five tips for controlling procrastination
0.278 Find similar Compare side-by-side
135032 THEVERGE 2019-5-13:
How Silicon Valley’s successes are fueled by an underclass of ‘ghost workers’
0.276 Find similar Compare side-by-side
135308 ENGADGET 2019-5-15:
China is blocking Wikipedia in every language
0.275 Find similar Compare side-by-side


ID: 135374

URL: https://techcrunch.com/2019/05/15/googles-translatotron-converts-one-spoken-language-to-another-no-text-involved/

Date: 2019-05-15

Google’s Translatotron converts one spoken language to another, no text involved

Every day we creep a little closer to Douglas Adams’ famous and prescient Babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another, but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.

Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.

Translating speech is usually done by breaking down the problem into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step has types of errors it is prone to, and these can compound one another.

Furthermore, it’s not really how multilingual people translate in their own heads, as testimony about their own thought processes suggests. How exactly that mental process works is impossible to say with certainty, but few would say that they break down the text and visualize it changing to a new language, then read the new text. Human cognition is frequently a guide for how to advance machine learning algorithms.

[Image: Spectrograms of source and translated speech. The translation, let us admit, is not the best. But it sounds better!]

To that end, researchers began looking into converting spectrograms (detailed frequency breakdowns of audio) of speech in one language directly to spectrograms in another. This is a very different process from the three-step one, and has its own weaknesses, but it also has advantages.
One is that, while complex, it is essentially a single-step process rather than a multi-step one, which means that, given enough processing power, Translatotron could work more quickly. But more importantly for many, the process makes it easy to retain the character of the source voice, so the translation doesn’t come out robotically, but with the tone and cadence of the original sentence.

Naturally this has a huge impact on expression, and someone who relies on translation or voice synthesis regularly will appreciate that not only what they say comes through, but how they say it. It’s hard to overstate how important this is for regular users of synthetic speech.

The accuracy of the translation, the researchers admit, is not as good as that of the traditional systems, which have had more time to hone their accuracy. But many of the resulting translations are (at least partially) quite good, and being able to include expression is too great an advantage to pass up.

In the end, the team modestly describes their work as a starting point demonstrating the feasibility of the approach, though it’s easy to see that it is also a major step forward in an important domain.

The paper describing the new technique was published on arXiv, and you can browse samples of speech, from source to traditional translation to Translatotron, at this page. Just be aware that these are not all selected for the quality of their translation, but serve more as examples of how the system retains expression while getting the gist of the meaning.
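
For readers unfamiliar with the term, a spectrogram of the kind described above can be sketched in a few lines of Python. This is only an illustration of what a "detailed frequency breakdown of audio" looks like as data, not Google's method or anything taken from the Translatotron paper. The function names, the frame size, the hop length and the test tone are all assumptions chosen for the example, and the naive transform used here is far slower than what a real system would use.

```python
import cmath
import math


def hann_window(size):
    """Return a Hann window, which tapers each frame's edges toward zero."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of a naive O(N^2) DFT."""
    size = len(frame)
    mags = []
    for k in range(size // 2 + 1):
        total = sum(
            sample * cmath.exp(-2j * math.pi * k * n / size)
            for n, sample in enumerate(frame)
        )
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """Slice the signal into overlapping frames; return one spectrum per frame.

    frame_size and hop are illustrative defaults, not values from the article.
    """
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        frame = [s * w for s, w in zip(chunk, window)]
        frames.append(dft_magnitudes(frame))
    return frames


# Hypothetical input: a pure tone completing 8 cycles per 64-sample frame,
# so its energy should concentrate around frequency bin 8 in every frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of `spec` is the frequency content of one short slice of time, so the whole structure is a time-by-frequency grid; for the test tone, every entry of `peak_bins` is 8. A direct speech-to-speech model of the kind the article describes would take a grid like this for the source speech as input and produce another such grid for the translated speech.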