Use it as a karaoke machine for any audio source, with no need to convert files beforehand.
Any feedback or pull requests are appreciated.
- Clone this repo.
- cd to the project root directory and run:
python live.py
Command-line arguments:
- -i or --in: Input device
- -o or --out: Output device
- log_level: (Optional) Logging level, e.g. info, debug, warning. Default: INFO
- model_name: (Optional) The name of the model to use for separation. Default: UVR-MDX-NET-Inst_Main
- model_file_dir: (Optional) Directory to cache model files in. Default: /tmp/audio-separator-models/
- use_cuda: (Optional) Flag to use Nvidia GPU via CUDA for separation if available. Default: False
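As a rough illustration of how the options and defaults listed above fit together, here is a hypothetical argparse mirror of them. It is not the parser from live.py: the `--` spellings of log_level, model_name, model_file_dir and use_cuda, the `dest` names, and the device values `1` and `3` are all assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Hypothetical mirror of the documented options (not the real live.py parser)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    # -i/--in and -o/--out are documented; "in" is a Python keyword, so an
    # explicit dest is used to make the value reachable as an attribute.
    parser.add_argument("-i", "--in", dest="input_device", help="Input device")
    parser.add_argument("-o", "--out", dest="output_device", help="Output device")
    # Assumption: the remaining options are exposed as "--name" flags.
    parser.add_argument("--log_level", default="INFO")
    parser.add_argument("--model_name", default="UVR-MDX-NET-Inst_Main")
    parser.add_argument("--model_file_dir", default="/tmp/audio-separator-models/")
    parser.add_argument("--use_cuda", action="store_true")
    return parser


# Example values only; real device identifiers depend on the machine.
args = build_parser().parse_args(["-i", "1", "-o", "3", "--use_cuda"])
print(args.input_device, args.output_device, args.model_name, args.use_cuda)
```

Anything not passed on the command line falls back to the defaults listed above, so the sketch resolves model_name to UVR-MDX-NET-Inst_Main.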
GPU mode is recommended for lower inference latency.
Make sure to install onnxruntime-gpu:
pip install onnxruntime-gpu
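To confirm that the GPU build is the one actually being picked up, a quick check along these lines may help. It relies on onnxruntime's `get_available_providers()`; the helper name `cuda_provider_available` is made up for this sketch, and whether this project performs the same check internally is not stated here.

```python
def cuda_provider_available():
    """Return True if the installed onnxruntime build exposes the CUDA provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        return False  # onnxruntime is not installed at all
    return "CUDAExecutionProvider" in ort.get_available_providers()


print("CUDA provider available:", cuda_provider_available())
```

If this prints False after installing onnxruntime-gpu, a CPU-only onnxruntime package installed alongside it is one possible cause worth ruling out.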
CPU / GPU | model_run() speed | window_size | overlap_size | initial_wait_size | block_size | sample_rate | Theoretical latency |
---|---|---|---|---|---|---|---|
i7-12700K & RTX 3090 | 0.04s | 20 | 1 | 0 | 4000 | 48000 | 1.75s |
i7-12700K | 0.82s | 20 | 1 | 16 | 4000 | 48000 | 3.08s |
Parameters | Suggested values | Description |
---|---|---|
window_size | 16-32 | Processing window for inference, in frames; at least 1.5 seconds of audio is recommended |
overlap_size | 1-4 | How many frames to keep before and after the processing window to reduce artifacts |
initial_wait | True/False | Use True for CPU, False for GPU |
initial_wait_size | 16 | Initial frames to buffer on slower CPUs; the buffered duration should be longer than the time needed to execute model_run() |
blocksize | 4000 | Number of samples passed to each call of the sounddevice callback function, which sets how often the callback is called |
use_threading | True/False | Use True for CPU, False for GPU (threading introduces an additional ~3s delay) |
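The theoretical latencies in the benchmark table appear to follow from the buffer sizes alone: the number of buffered frames (window_size + overlap_size + initial_wait_size) times block_size, divided by sample_rate. This formula is inferred from the two benchmark rows rather than taken from the project's code, and it leaves out the model_run() time itself. A minimal sketch, with `theoretical_latency` as a made-up helper name and block_size referring to the value listed as blocksize above:

```python
def theoretical_latency(window_size, overlap_size, initial_wait_size,
                        block_size=4000, sample_rate=48000):
    """Seconds of audio buffered before separated output can start playing."""
    buffered_frames = window_size + overlap_size + initial_wait_size
    return buffered_frames * block_size / sample_rate


# The two rows of the benchmark table
print(round(theoretical_latency(20, 1, 0), 2))   # GPU row: 1.75
print(round(theoretical_latency(20, 1, 16), 2))  # CPU row: 3.08
```

Under this reading, shrinking window_size or block_size lowers latency, subject to the recommendation of at least 1.5 seconds per processing window.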
- Real-time audio separation using any single MDX-NET model.
- Approximately 1-5 seconds of latency, depending on hardware.
- No ensemble support yet.
- karaokenerds - Author of python-audio-separator, a Python package based on Ultimate Vocal Remover GUI by Anjok07.
- facebookresearch - Author of denoiser. Copied code for implementing real-time streaming via sounddevice.