
audeering/shift


SHIFT TTS System

Affective TTS tool for the SHIFT Horizon project. Synthesizes speech from text or subtitles (.srt) and overlays it onto videos.

Available Voices

Listen to available voices!

Install

virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt

Demo. The TTS output is saved as out.wav

CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=0 python demo.py
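The environment variables in the command above can equivalently be set from Python, as long as this happens before any CUDA-backed library is imported; a minimal sketch:

```python
import os

# Must be set before torch / any CUDA library is first imported.
# CUDA_DEVICE_ORDER=PCI_BUS_ID makes GPU indices match the nvidia-smi ordering;
# CUDA_VISIBLE_DEVICES=0 exposes only the first GPU to the process;
# HF_HOME=./hf_home keeps Hugging Face downloads inside the project tree.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["HF_HOME"] = "./hf_home"
```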

API

Start the Flask server

CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=0 python api.py

Inference

The following commands require api.py to already be running, e.g. in a tmux session.

Text 2 Speech

# Basic TTS - See Available Voices
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective

# voice cloning
python tts.py --text sample.txt --native assets/native_voice.wav
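Since tts.py talks to the running Flask server, a request can also be issued directly. The sketch below shows how such a request could be built with the standard library; the endpoint path and JSON field names here are assumptions for illustration, not taken from api.py:

```python
import json
import urllib.request

# Hypothetical endpoint; check api.py for the actual route and port.
API_URL = "http://localhost:5000/tts"

def build_tts_request(text, voice=None, affective=True, url=API_URL):
    """Build a JSON POST request for the TTS server.

    The field names ("text", "voice", "affective") are assumptions;
    api.py defines the real interface.
    """
    payload = {"text": text, "affective": affective}
    if voice is not None:
        payload["voice"] = voice
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

req = build_tts_request("Hello world", voice="en_US/m-ailabs_low#mary_ann")
# urllib.request.urlopen(req) would send it once the server is running.
```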

Image 2 Video

# Make a video narrating an image - all TTS args above apply here too!
python tts.py --text sample.txt --image assets/image_from_T31.jpg

Video 2 Video

# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
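Dubbing relies on the time stamps inside the .srt file to align synthesized speech with the video. A minimal, self-contained sketch of how such cues can be parsed (independent of the repo's own parser):

```python
import re

TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def _to_seconds(stamp):
    """Convert an SRT time stamp like 00:01:02,500 to seconds."""
    h, m, s, ms = TIME.match(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Parse SRT subtitles into (start_sec, end_sec, caption) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append((_to_seconds(start), _to_seconds(end), " ".join(lines[2:])))
    return cues
```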

Examples

Native voice ANBPR video

Same video with the native voice replaced by an English TTS voice of similar emotion

Video Dubbing

Review the SHIFT demo

Generate dubbed video:

python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

Joint Application of D3.1 & D3.2

Captions To Video

From an image with caption(s) create a video:

python tts.py --text sample.txt --image assets/image_from_T31.jpg