Your insightful work on AutoCompressors for compressing sequences offers valuable ideas on processing long context windows. Recently, I've been trying to reproduce some of your results (mainly Table 1 in Sec. 4.1 of your paper) and have a few questions:
You've kindly provided the 6K/8K split versions of 2B tokens from the Pile for training and evaluation, as well as the checkpoint named AutoCompressor-2.7b-6k. If I understand correctly, this checkpoint is exactly the "AutoCompressor" model in Table 1, and it was trained and evaluated on the 8K-split data. Am I right?
Under that assumption, I evaluated the checkpoint on the 8K-sequence data, reusing your train.sh script with segments_per_substep=${SEG:-4} and training_substeps=${SUB:-4}. The results below show a gap from the numbers reported in the paper.
| Domain | 6k model 6k→2k |
| --- | --- |
| Book3 | 10.37 |
| FreeLaw | 6.44 |
| Github | 3.94 |
| Wikipedia | 8.86 |
| Average (exp of mean NLL) | 6.95 |
| Reported in paper | 5.93 |
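For clarity, here is how I computed the "Average (exp of mean NLL)" row: a quick sketch assuming it is the exponential of the mean per-domain log-perplexity (i.e. the geometric mean of the per-domain perplexities), which matches the 6.95 shown above.

```python
# Sketch of the "Average (exp of mean NLL)" row: exponentiate the mean of the
# per-domain log-perplexities (geometric mean) rather than averaging the
# perplexities directly. Values are the per-domain perplexities from the table.
import math

domain_ppl = {"Book3": 10.37, "FreeLaw": 6.44, "Github": 3.94, "Wikipedia": 8.86}
mean_nll = sum(math.log(p) for p in domain_ppl.values()) / len(domain_ppl)
print(round(math.exp(mean_nll), 2))  # -> 6.95
```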
I'm not sure whether I've misunderstood some of the evaluation settings, and I'd also like to ask whether you could share the script for reproducing the results at the other context lengths (128, 512, 2048) in Table 1. Your attention to this is highly appreciated. Thanks a lot!
Thank you for raising this issue. I think we forgot to add the right eval scripts to the repo. The eval script is separate from the training script because, during evaluation, we hold the last segment fixed while varying the preceding context. Our development codebase and the public repo have diverged a bit, so we'll work on making this public -- please bear with us!
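In the meantime, here is a minimal sketch of that protocol: perplexity is measured on a fixed final segment while the amount of preceding (compressed) context varies. This is not the actual eval script; the output_softprompt/softprompt keyword arguments and the helper name are assumptions.

```python
# Hypothetical sketch of the evaluation protocol described above: hold the last
# segment fixed and vary the amount of preceding context that gets compressed
# into summary vectors. Not the authors' eval script.
import torch

@torch.no_grad()
def nll_on_last_segment(model, token_ids, context_len, eval_len=2048):
    """Compress `context_len` preceding tokens into summary vectors, then
    score the fixed last `eval_len` tokens conditioned on them."""
    device = next(model.parameters()).device
    target = token_ids[-eval_len:].unsqueeze(0).to(device)

    softprompt = None
    if context_len > 0:
        context = token_ids[-(eval_len + context_len):-eval_len].unsqueeze(0).to(device)
        # Assumed interface: the model returns summary vectors for a context segment.
        softprompt = model(input_ids=context, output_softprompt=True).softprompt

    # Mean NLL per token over the fixed last segment.
    out = model(input_ids=target, labels=target, softprompt=softprompt)
    return out.loss.item()
```

Under this reading, the 128/512/2048 columns in Table 1 would correspond to calling the helper with different `context_len` values and reporting exp of the mean NLL over documents.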