
Inquire on data of Table 1 #20

Open
void-b583x2-NULL opened this issue Apr 2, 2024 · 1 comment
@void-b583x2-NULL

Your insightful work on AutoCompressors for compressing sequences offers valuable ideas for processing long context windows. I've recently been trying to reproduce some of your results (mainly Table 1, Sec. 4.1 of your paper) and have a few questions:

  • You've kindly provided the 6K/8K-split version of the 2B Pile tokens for training and evaluation, as well as the checkpoint named AutoCompressor-2.7b-6k. If I understand correctly, this checkpoint is exactly the "AutoCompressor" model in Table 1, and it is trained and evaluated on the 8K-split data. Am I right?

  • Under that assumption, I evaluated the checkpoint on the 8K-sequence data, reusing your train.sh script with segments_per_substep=${SEG:-4} and training_substeps=${SUB:-4}. I got the results below, which show a gap from the numbers reported in the paper.

| Domain | 6k model (6k→2k) |
| --- | --- |
| Books3 | 10.37 |
| FreeLaw | 6.44 |
| Github | 3.94 |
| Wikipedia | 8.86 |
| Average (exp of mean NLL) | 6.95 |
| Reported in paper | 5.93 |
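For reference, the "Average (exp of mean NLL)" row is the exponential of the mean log-perplexity, i.e. the geometric mean of the per-domain perplexities. A few lines of Python (values taken from the table above) confirm the averaging:

```python
import math

# Per-domain perplexities from the table above.
ppl = {"Books3": 10.37, "FreeLaw": 6.44, "Github": 3.94, "Wikipedia": 8.86}

# Average in NLL space: exp of the mean log-perplexity (geometric mean),
# not the arithmetic mean of the perplexities themselves.
avg = math.exp(sum(math.log(p) for p in ppl.values()) / len(ppl))
print(round(avg, 2))  # 6.95, matching the "Average" row
```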

I'm not sure whether I've misunderstood some of the evaluation settings. Could you share the script for reproducing the results at the other context lengths (128, 512, 2048) in Table 1? Your attention to this is much appreciated. Thanks a lot!

@CodeCreator
Member

Hi @void-b583x2-NULL!

Thank you for raising this issue. I think we forgot to add the right eval scripts to the repo. They are separate from the training script because, in eval, we hold the last segment fixed while varying the preceding context. Our development codebase and the public repo have diverged a bit, so we'll work on making this public -- please bear with us!
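To illustrate the windowing described above, here is a minimal sketch (a hypothetical helper, not our actual eval script): the final 2048-token segment is always the one scored, and only the length of the preceding context varies, matching the context lengths in Table 1.

```python
# Hypothetical sketch of the eval windowing: the last `target_len` tokens
# are always the scored segment; only the amount of preceding context
# (the part the AutoCompressor would compress) changes between settings.
def make_eval_window(tokens, context_len, target_len=2048):
    """Return (context, target) slices from the end of `tokens`."""
    assert len(tokens) >= context_len + target_len
    target = tokens[-target_len:]
    context = tokens[-(target_len + context_len):-target_len]
    return context, target

tokens = list(range(8192))  # stand-in for one 8K-token Pile sequence
for context_len in (128, 512, 2048, 6144):  # context lengths as in Table 1
    context, target = make_eval_window(tokens, context_len)
    print(len(context), len(target))  # context varies; target stays 2048
```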
