Using speech to text in combination with text translation and text to speech, we built a tool not unlike Google's new babel platform. Call someone and they will hear what you say in their language 😮.
Both parties call the service's Nexmo number, which connects them to one another. However, they do not hear each other directly; each hears a translated version of the other, so both can speak and listen in their preferred language. The audio stream from each caller is transcribed and translated by the Microsoft Translator Speech API, and the resulting text is spoken to the other person. A German speaker could thus hear a British person speaking German and respond in German; the British person would hear the response in English.
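The relay described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `Party`, `relay`, and the `translate` callable are hypothetical names, and the stand-in translator only looks up hard-coded phrases where the real service would call the Microsoft Translator Speech API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Party:
    name: str
    language: str  # language code such as "de-DE"


# Assumed signature: (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]


def relay(text: str, speaker: Party, listener: Party, translate: Translator) -> str:
    """Return what the listener should hear for something the speaker said."""
    if speaker.language == listener.language:
        return text  # same language on both sides, nothing to translate
    return translate(text, speaker.language, listener.language)


def fake_translate(text: str, source: str, target: str) -> str:
    # Stand-in for the real speech translation call (hypothetical).
    phrases = {
        ("Hello", "en-GB", "de-DE"): "Hallo",
        ("Hallo", "de-DE", "en-GB"): "Hello",
    }
    return phrases.get((text, source, target), text)


british = Party("Alice", "en-GB")
german = Party("Bernd", "de-DE")

print(relay("Hello", british, german, fake_translate))  # prints "Hallo"
print(relay("Hallo", german, british, fake_translate))  # prints "Hello"
```

Passing the translator in as a function keeps the routing logic testable without network access; in the running service the returned text would be handed to Nexmo's text to speech for the listener's call leg.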
Python, Microsoft Translator Speech API, Nexmo Voice API, Ngrok.
virtualenv venv
source venv/bin/activate
vim requirements.txt
pip install -r requirements.txt
In another terminal window run:
ngrok http 5000
- Create a Nexmo account
- Install Nexmo CLI
- Create an application
nexmo app:create "Babelfish" http://your_url.ngrok.io/ncco http://your_url.ngrok.io/event --keyfile private.key
- Purchase a number
nexmo number:buy --country_code GB --confirm
- Link your number and application
nexmo link:app <number> <application_id>
- Create a Microsoft Azure account
- Search for Cognitive Services in the search bar
- Click add and create a new Translator Speech API
- Once created, click on the new translator service, then click on Keys under the Resource Management header
- Copy KEY 1 for use in your application
- A Nexmo number with voice capability and a Nexmo application
- When setting up the Nexmo Application use a Ngrok forwarding URL for both the Event URL and the Answer URL:
- Event URL: http://abc123.ngrok.io/event
- Answer URL: http://abc123.ngrok.io/ncco
- Microsoft’s Translator Speech API key
- Follow the instructions in the secrets.py and config.py files.
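When a call comes in, Nexmo requests the Answer URL and expects a Nexmo Call Control Object (NCCO) in response. A service that needs the raw call audio would typically return a `connect` action pointing at a WebSocket. The sketch below shows one plausible shape for that response; the `build_ncco` helper, the `/socket` path, and the `caller` header are assumptions for illustration and may not match what main.py actually returns.

```python
import json


def build_ncco(hostname: str, caller: str) -> list:
    """Build an NCCO that streams the call audio to a WebSocket on `hostname`."""
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    # The "/socket" path is an assumption, not taken from the project.
                    "uri": f"wss://{hostname}/socket",
                    # 16-bit linear PCM at 16 kHz
                    "content-type": "audio/l16;rate=16000",
                    # Hypothetical metadata so the server can tell callers apart
                    "headers": {"caller": caller},
                }
            ],
        }
    ]


print(json.dumps(build_ncco("abc123.ngrok.io", "447512345678"), indent=2))
```

Note that the hostname is passed without a scheme, which matches the HOSTNAME setting in config.py.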
Create config.py with the following contents:
HOSTNAME="your_host.ngrok.io" # Do not add HTTP
CALLER="<your_number>"
LANGUAGE1 = 'de-DE' # the caller's language
VOICE1 = 'Marlene' # a Nexmo voice for the caller's language
LANGUAGE2 = 'en-US' # the other person's language
VOICE2 = 'Kimberly' # a Nexmo voice for the other person's language
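To illustrate how these settings fit together, here is a small sketch of picking the language and voice for a given phone number. It inlines example values so it runs on its own. The `settings_for` helper and the matching rule (the number in CALLER gets LANGUAGE1 and VOICE1, everyone else gets LANGUAGE2 and VOICE2) are assumptions about how the application uses the config, not taken from its code.

```python
# Example values mirroring config.py; the number is a placeholder.
CALLER = "447512345678"
LANGUAGE1, VOICE1 = "de-DE", "Marlene"
LANGUAGE2, VOICE2 = "en-US", "Kimberly"


def settings_for(number: str) -> tuple:
    """Return (language, voice) for the person calling from `number`."""
    # Compare without a leading "+" so both number formats match.
    if number.lstrip("+") == CALLER.lstrip("+"):
        return LANGUAGE1, VOICE1
    return LANGUAGE2, VOICE2


print(settings_for("+447512345678"))  # ('de-DE', 'Marlene')
print(settings_for("447700900000"))   # ('en-US', 'Kimberly')
```

Each language should be paired with a Nexmo voice that speaks it; otherwise the translated text would be read with the wrong pronunciation.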
Create secrets.py with the following contents:
NEXMO_API_KEY = "<your-api-key>"
NEXMO_API_SECRET = "<your-api-secret>"
NEXMO_NUMBER = "+447512345678"
NEXMO_APPLICATION_ID = "<nexmo-application-id>"
NEXMO_PRIVATE_KEY = '''-----BEGIN PRIVATE KEY-----
<your-private-key>
-----END PRIVATE KEY-----'''
# You will have to sign up for a free Microsoft account to use the Microsoft Translator Speech API: http://docs.microsofttranslator.com/speech-translate.html
MICROSOFT_TRANSLATION_SPEECH_CLIENT_SECRET = "<your-api-key>"
Run python main.py, then have two (or more) people call your Nexmo number. When one person speaks, the others hear what was said translated into the other language. To change the languages used, edit config.py.