
Question about low-resource settings due to potential leaking information from Pre-training Phase #18

Open
jianguoz opened this issue Sep 2, 2022 · 0 comments

jianguoz commented Sep 2, 2022

Hi, I noticed that CamRest676 is used in the pre-training phase. However, 675 of the 676 CamRest676 dialogs are already included in the MultiWOZ training set, and CamRest676 is also processed and trained in a multi-task way.

Meanwhile, in your low-resource experiments (Sec 4.1.4), you "train our model on MultiWOZ 2.0 by varying the percentage of training data, ranging from 1% (∼80 samples) to 20% (∼1600 samples)". Although MultiWOZ itself is not used during pre-training, the model has still potentially seen many MultiWOZ dialogs through CamRest676; in other words, information may already leak during the pre-training phase, so the low-resource results may not be fully accurate.
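For concreteness, one way to quantify the suspected overlap is to compare normalized dialog utterances between the two corpora. The sketch below is a minimal, hypothetical illustration (not code from the project): the corpora are toy placeholder lists, and `normalize`/`overlap_ratio` are names I am inventing here; a real check would load the actual CamRest676 and MultiWOZ JSON files and likely match on full dialog transcripts rather than single utterances.

```python
# Hedged sketch: estimating dialog overlap between two corpora by
# comparing normalized utterances. The corpus contents below are toy
# placeholders, not the real CamRest676 / MultiWOZ data.

def normalize(utt: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for fuzzy matching."""
    kept = "".join(c for c in utt.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())

def overlap_ratio(corpus_a, corpus_b):
    """Fraction of corpus_a whose normalized utterance also appears in corpus_b."""
    keys_b = {normalize(u) for u in corpus_b}
    hits = sum(1 for u in corpus_a if normalize(u) in keys_b)
    return hits / len(corpus_a) if corpus_a else 0.0

# Toy stand-ins for first user turns in each corpus.
camrest_firsts = [
    "I want a cheap restaurant in the south part of town.",
    "I'm looking for an expensive restaurant serving French food.",
]
multiwoz_firsts = [
    "I want a CHEAP restaurant in the south part of town!",
    "Book me a train to Cambridge on Tuesday.",
]

print(overlap_ratio(camrest_firsts, multiwoz_firsts))  # → 0.5
```

If a check like this reported a ratio near 675/676 on the real data, it would support the leakage concern described above.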

@jianguoz jianguoz changed the title Incorrect results on low-resource settings due to leaking information from Pre-training Phase Question about low-resource settings due to leaking information from Pre-training Phase Sep 2, 2022
@jianguoz jianguoz changed the title Question about low-resource settings due to leaking information from Pre-training Phase Question about low-resource settings due to potential leaking information from Pre-training Phase Sep 2, 2022