Speech-to-Speech task prompt #32

Open
ehosseiniasl opened this issue Jul 23, 2024 · 6 comments

Comments

@ehosseiniasl

https://github.com/OpenMOSS/AnyGPT/blame/6404dbafccc10943be6bf6e24a4b99b3a6545501/anygpt/src/m_utils/prompter.py#L45

Hello,
Is this line correct? Is this for speech-to-speech conversation?
In that case, isn't this the correct prompt:

'Speech-Response-Speech': '{speech} Please interpret the user\'s voice commands, provide text responses, and generate corresponding voice replies
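
For readers unfamiliar with this kind of prompt table, the snippet below is a minimal sketch of how a `{speech}` placeholder might be filled. The `PROMPTS` dictionary and `build_prompt` helper are hypothetical names used only for illustration; the key and template text are the ones quoted in the question above and have not been checked against `prompter.py`.

```python
# Hypothetical stand-in for a prompt table. The key and template text are
# copied from the question above, not verified against prompter.py.
PROMPTS = {
    "Speech-Response-Speech": (
        "{speech} Please interpret the user's voice commands, "
        "provide text responses, and generate corresponding voice replies"
    ),
}


def build_prompt(task: str, speech_tokens: str) -> str:
    """Fill the {speech} placeholder of the template registered for `task`."""
    return PROMPTS[task].format(speech=speech_tokens)


# "<speech tokens>" is a placeholder for the tokenized user audio.
print(build_prompt("Speech-Response-Speech", "<speech tokens>"))
```
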
@JunZhan2000
Collaborator

Hello, some of the prompts in this file were only used for debugging. I suggest referring to this line instead: https://github.com/OpenMOSS/AnyGPT/blame/6404dbafccc10943be6bf6e24a4b99b3a6545501/anygpt/src/m_utils/prompter.py#L113

So for voice commands and voice replies, we actually use the 'Speech-Instruction' prompt.

@ehosseiniasl
Author

Thanks. Do you support direct speech response generation (without generating a text response first) for the base or chat model?
Which speech response tasks are included in instruction tuning?

@ehosseiniasl
Author

ehosseiniasl commented Jul 24, 2024

Using Speech-Instruction on the chat model with to_modality=speech, the response is as shown below.
Could you please explain what the first line is? : <-Res-> Gmarin misway"- How beautiful you look today!
Does the model first generate a text reply and then speech, even if the output modality is speech only?

response:
 :  <-Res-> Gmarin misway"- How beautiful you look today!
  [AnyGPT] "Guhmyayayay!" - How beautiful you look today!  <sosp> <🗣️691> <🗣️691> <🗣️60> <🗣️868> <🗣️868> <🗣️906> <🗣️316> <🗣️1015> <🗣️965> <🗣️512> <🗣️512> <🗣️223> <🗣️223> <🗣️689> <🗣️35> <🗣️35> <🗣️35> <🗣️962> <🗣️57> <🗣️943> <🗣️699> <🗣️1> <🗣️118> <🗣️118> <🗣️118>

@ehosseiniasl
Author

Does the prompt include the user's speech transcription? The sentence after <-Res-> is the transcription of the speech instruction I provided.

@JunZhan2000
Collaborator

Does the prompt include the user's speech transcription? The sentence after <-Res-> is the transcription of the speech instruction I provided.

Hello, we provide some training data samples and related descriptions; please refer to https://github.com/OpenMOSS/AnyGPT?tab=readme-ov-file#pretraining-and-sft

@JunZhan2000
Collaborator

In voice dialogue mode, the user provides a voice command; the model first recognizes the command as text, then generates a text reply, and finally generates the speech for that reply.
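
Since the text reply comes before the speech tokens in the decoded output, a caller who only wants the speech part has to split the two. The following is a rough sketch of one way to do that, assuming the layout of the sample response earlier in this thread (text, then `<sosp>`, then `<🗣️N>` tokens). `split_response` is a hypothetical helper, not part of the AnyGPT codebase, and the real decoding code may handle this differently.

```python
import re

# Marker and token shapes are copied from the sample response in this
# thread; the splitting logic is an assumption, not AnyGPT's own code.
SPEECH_START = "<sosp>"
# U+1F5E3 (speaking head), optionally followed by the U+FE0F variation
# selector, then the numeric token id.
SPEECH_TOKEN = re.compile(r"<\U0001F5E3\uFE0F?(\d+)>")


def split_response(decoded: str) -> tuple[str, list[int]]:
    """Return (text before <sosp>, speech token ids found after it)."""
    text, _, speech = decoded.partition(SPEECH_START)
    ids = [int(token_id) for token_id in SPEECH_TOKEN.findall(speech)]
    return text.strip(), ids


sample = (
    '[AnyGPT] "Guhmyayayay!" - How beautiful you look today!  '
    "<sosp> <\U0001F5E3\uFE0F691> <\U0001F5E3\uFE0F691> <\U0001F5E3\uFE0F60>"
)
text_reply, speech_ids = split_response(sample)
print(text_reply)
print(speech_ids)
```

If the output contains no `<sosp>` marker, the helper returns the whole string as text and an empty list of ids.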
