
Question about low-resource settings due to potential leaking information from Pre-training Phase #18

Open
jianguoz opened this issue Sep 2, 2022 · 0 comments

jianguoz commented Sep 2, 2022

Hi, I noticed that CamRest676 is used in the pre-training phase. However, 675 of the 676 CamRest676 dialogs are already included in the MultiWOZ training set, and CamRest676 is also processed and trained in a multi-task way.

Meanwhile, in your low-resource experiments (Sec 4.1.4), you "train our model on MultiWOZ 2.0 by varying the percentage of training data, ranging from 1% (∼80 samples) to 20% (∼1600 samples)". Although MultiWOZ itself is not used during pre-training, the model has still potentially seen many MultiWOZ dialogs through CamRest676; in other words, information may already leak during the pre-training phase, so the low-resource results may not be fully accurate.
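For concreteness, one way to quantify the suspected overlap is to compare normalized dialog utterances between the two corpora. The sketch below is a minimal, hypothetical illustration (not code from the project): the corpora are toy placeholder lists, and `normalize`/`overlap_ratio` are names I am inventing here; a real check would load the actual CamRest676 and MultiWOZ JSON files and likely match on full dialog transcripts rather than single utterances.

```python
# Hedged sketch: estimating dialog overlap between two corpora by
# comparing normalized utterances. The corpus contents below are toy
# placeholders, not the real CamRest676 / MultiWOZ data.

def normalize(utt: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for fuzzy matching."""
    kept = "".join(c for c in utt.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def overlap_ratio(corpus_a, corpus_b):
    """Fraction of corpus_a whose normalized utterance also appears in corpus_b."""
    keys_b = {normalize(u) for u in corpus_b}
    hits = sum(1 for u in corpus_a if normalize(u) in keys_b)
    return hits / len(corpus_a) if corpus_a else 0.0

# Toy stand-ins for first user turns in each corpus.
camrest_firsts = [
    "I want a cheap restaurant in the south part of town.",
    "I'm looking for an expensive restaurant serving French food.",
]
multiwoz_firsts = [
    "I want a CHEAP restaurant in the south part of town!",
    "Book me a train to Cambridge on Tuesday.",
]

print(overlap_ratio(camrest_firsts, multiwoz_firsts))  # → 0.5
```

If a check like this reported a ratio near 675/676 on the real data, it would support the leakage concern described above.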

@jianguoz jianguoz changed the title Incorrect results on low-resource settings due to leaking information from Pre-training Phase Question about low-resource settings due to leaking information from Pre-training Phase Sep 2, 2022
@jianguoz jianguoz changed the title Question about low-resource settings due to leaking information from Pre-training Phase Question about low-resource settings due to potential leaking information from Pre-training Phase Sep 2, 2022