configs for Hyena Wikitext103 experiments #28

Open
xiaobo-guo opened this issue Jun 23, 2023 · 10 comments

@xiaobo-guo

Your work is excellent! I am trying to reproduce it and have run into some problems. Could you share the config for the WikiText-103 dataset with Hyena? I have run experiments with 125-slim, but the test perplexity is higher than the reported result (about 21 with Hyena). I am also wondering whether removing flash-attn affects the result.

@Zymrael
Contributor

Zymrael commented Jun 23, 2023

Can you share the config? Wikitext is quite sensitive to a few hyperparameters. Flash attention will not affect the result for Hyena.

@xiaobo-guo
Author

Thanks for your response.

I have attached the config file: config.txt

@Zymrael
Contributor

Zymrael commented Jun 23, 2023

You should set the dropouts to 0.2 as a first step. After you get to sub-19 ppl you will be in tuning range.
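For reference, this kind of override would look roughly like the following in a Hydra-style experiment config (a sketch only; the dropout key names, e.g. embed_dropout and resid_dropout, are assumptions and should be checked against base.yaml):

```yaml
# Sketch of the suggested dropout override (key names are assumptions;
# verify them against the dropout entries in your base.yaml).
model:
  embed_dropout: 0.2
  resid_dropout: 0.2
```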

@xiaobo-guo
Author

Thank you. Shall I also set the order to 3 in the Hyena layer?

@sustcsonglin

> Can you share the config? Wikitext is quite sensitive to a few hyperparameters. Flash attention will not affect the result for Hyena.

Could you please put the configs you used in `configs/experiment/wt103`? That would be super helpful!

@sustcsonglin

> Thank you. Shall I also set the order to 3 in the Hyena layer?

Did you reproduce the 19 ppl result using dropout=0.2? I still get 22.

@xiaobo-guo
Author

> Thank you. Shall I also set the order to 3 in the Hyena layer?

> Did you reproduce the 19 ppl result using dropout=0.2? I still get 22.

I set the dropout to 0.2 and the order to 3 and got to about 20 ppl, but I still cannot reach the reported result.
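Putting the two changes discussed above together, the overrides might look something like this (a sketch; the key names and layer block structure are assumed from the repo's Hydra config style and should be checked against base.yaml):

```yaml
# Sketch of the overrides discussed in this thread: dropout 0.2 and order 3.
# Key names are assumptions; check them against the repo's base.yaml.
model:
  embed_dropout: 0.2
  resid_dropout: 0.2
  layer:
    _name_: hyena
    order: 3   # depth of the Hyena operator (the paper's default is 2)
```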

@Zymrael
Contributor

Zymrael commented Jun 29, 2023

You can look at this config for an independent reproduction that gets to sub-19 ppl. Let me know if after this you still have issues with the loss being too high, and I'll rerun experiments in the new codebase.

@radarFudan

radarFudan commented Jun 30, 2023

> Thank you. Shall I also set the order to 3 in the Hyena layer?

> Did you reproduce the 19 ppl result using dropout=0.2? I still get 22.

> I set the dropout to 0.2 and the order to 3 and got to about 20 ppl, but I still cannot reach the reported result.

Question: did you change attn_layer_idx? It seems that in your attached config there are attention layers at layers 1 and 8 (inherited from base.yaml).
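For what it's worth, if the goal is a pure-Hyena model, the override would be along these lines (a sketch; whether attn_layer_idx accepts an empty list or null depends on the codebase):

```yaml
# Sketch: drop the attention layers inherited from base.yaml.
# attn_layer_idx: [1, 8] would put attention at layers 1 and 8; an empty
# list (or null, depending on the codebase) keeps every layer as Hyena.
model:
  attn_layer_idx: []
```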

@radarFudan

> You can look at this config for an independent reproduction that gets to sub-19 ppl. Let me know if after this you still have issues with the loss being too high, and I'll rerun experiments in the new codebase.

Thanks for the helpful reference. However, I checked that repo, and the released log from S5 (https://wandb.ai/jimmysmith1919/S5_ICL/reports/Hyena-red-and-Hyena-S5-blue-on-WikiText-103--Vmlldzo0MTkwODEx?accessToken=pk0zw5w75uo1s4zkn3kh7koum902t4q2yzbm28xk0olzzgxuskoq0g1iyauixlob) shows Hyena with a test perplexity of 19.094.

It would be very helpful if you could share the detailed configuration of Hyena on WikiText-103.
