Feature request: Record clips into a single voice file for Piper #7
Comments
Hello Bugsbane. I'm not an expert on this topic, but I thought I'd respond to some of your questions. Hopefully you'll find my answers useful. From what I've been able to figure out after a few weeks of trial and error, there are a few steps to training your own voice to use with Piper.
Piper Recording Studio only does the first, and I guess also the second from the command line. I too found the instructions for training your own voice really complicated and missing important steps. The YouTube video at the top of the training guide was definitely helpful, but even that had some missing info.
The most important bit of info is that none of this will build correctly if you don't have the correct Python version. The YouTube video itself never mentions it. Tucked away in the comments, it says you need Python 3.9-3.11. I can tell you from experience that this is wrong, since some of the modules just don't build on 3.11 (or at least they didn't for me). I settled on Python 3.10 and got everything built. I completely missed that comment at first, since I watch YouTube videos on my TV, which doesn't show the comments.
The other bit of information I would consider important is some kind of benchmarking data. I'm sure anyone who endeavors to do this would be curious about how long it will take. The YouTube video glosses over this too. Hold on to your hat, because I can tell you it's going to take a very long time. First, if you don't have a REALLY fast GPU with a LOT of RAM, just don't bother trying to train on your local system. If you want to try training on a fast system using CPU only, also don't bother. If you want your training done quickly, use a cloud system built for this purpose.
I have one system with a supported GPU; it has 6GB of RAM and 4864 graphics cores (the system is an 8-core/16-thread i7 with 64GB RAM). It's not a real powerhouse compared to the huge gamer GPUs available these days. Using this GPU I was able to get the following performance:
I've been doing spot checks on my currently running 600-sample training run and the results aren't great. It's clear that I probably need to complete the 1,150 voice samples suggested by piper-recording-studio. Actually, I feel like I need even more for a high-quality voice, maybe closer to 2,000 samples. It's all trial and error. Either way, I'll keep plugging away, as I'm learning a lot along the way. Hope my experience was helpful.
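For anyone who wants to catch the Python version problem before starting a long build, here is a minimal sketch of a guard script. The only thing it assumes is that 3.10 is the version that worked, as reported above; that is one person's experience, not an official requirement, and the names `WORKING_VERSION` and `is_known_good` are made up for illustration.

```python
import sys

# Assumption: 3.10 is simply the version reported to build cleanly in this thread.
WORKING_VERSION = (3, 10)


def is_known_good(version_info=sys.version_info, expected=WORKING_VERSION):
    """Return True when the interpreter's major.minor equals the expected pair."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    if not is_known_good():
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in WORKING_VERSION)
        # Exit with a message so the mismatch is caught before any modules are built.
        sys.exit(f"Python {found} detected; {wanted} is the version reported to work here.")
    print("Python version matches the one reported to work.")
```

Running it inside the environment you plan to train in should tell you up front whether you are on the interpreter that built cleanly for me.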
I'll add one more comment. At least with the Docker config, your voice samples aren't sent to some mysterious repo. They are written to the local directory that you specify when you start the container.
Right now, Piper Recording Studio seems to give texts to read and record the audio, but there doesn't seem to be any simple path for then turning that into a voice that Piper can actually use. I read the Piper doc about how to train a new voice and... well, it pretty much made my eyes bleed.
I have some actor friends who could do some really fun voices for Piper, like demon or pixie voices.
I would like to be able to open up Piper Recording Studio, click "New voice", have them read out a bunch of texts and be given a file that I can edit, share or upload to Piper (in my Home Assistant) to use as a new voice.
Currently all samples seem to be uploaded to some mysterious repo somewhere for purposes unknown. I'm happy with the CC0 licensing, but if samples are being used to train TTS then I really don't want to upload a bunch of semi/non-human sounding voice clips there!
Please allow recording a session of audio clips which are then wrapped into a single voice file (e.g. an archive) that Piper/HA can upload and use.
Thank you!
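To make the "single archive" idea concrete, here is a rough sketch of what bundling a recording session could look like. It is not something Piper Recording Studio provides: `bundle_clips`, the `wav/` folder, and the pipe-separated `metadata.csv` are all assumptions chosen for illustration. Note too that an archive of clips is still only training data; it would need to go through training before Piper could use it as a voice.

```python
import csv
import io
import zipfile
from pathlib import Path


def bundle_clips(clips_dir, archive_path, transcripts):
    """Pack every .wav file in clips_dir, plus a metadata.csv, into one zip.

    transcripts is a dict mapping each clip's file stem to the text that was read.
    Returns the number of clips written to the archive.
    """
    wavs = sorted(Path(clips_dir).glob("*.wav"))

    # Assumed layout: one "clip_id|text" row per clip.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    for wav in wavs:
        writer.writerow([wav.stem, transcripts.get(wav.stem, "")])

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for wav in wavs:
            # Store clips under wav/ so the archive unpacks into a tidy folder.
            archive.write(wav, arcname=f"wav/{wav.name}")
        archive.writestr("metadata.csv", buffer.getvalue())
    return len(wavs)
```

A call such as `bundle_clips("session1", "demon_voice.zip", {"clip_001": "Hello there."})` (hypothetical paths) would leave one file that can be shared or archived locally without uploading anything.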