About RealToBinaryNet model #176
Hi @appleleaves, thanks for bringing this to our attention. The code on zoo is complete and should allow training the model. I just tried running it again from zoo and I also get some errors. I will investigate what is going on and get back to you.
I run the experiments using the command
P.S. I use Python 3.7, tensorflow-gpu 1.15, and larq installed via `pip install larq` (should be the latest).
Hi @appleleaves,
Edit: a new version of larq-zoo is now released, so if you run `pip install larq_zoo --upgrade`, you should then be able to use `lqz TrainR2B`.
Just a heads-up though: the training procedure is a simplified version of what we used to train the model on larq-zoo. Our internal code depends on our infrastructure, and the training code on zoo is meant more as documentation of the training procedure. It should run and produce (almost) the same result, but to keep it simple it omits some things, like saving intermediate results for resuming and logging of diagnostics.

You also mention that you are interested in SOTA networks. That is a difficult thing to define. For instance, comparing the R2B network to QuickNetXL: R2B has half the number of parameters, but when doing inference on a Pixel 1 using LCE the inference times are almost the same, while QuickNetXL gets 2% higher accuracy. So that might also be an interesting network to look at, especially since it only has a single training phase, which is a lot simpler.
Thanks for your advice! The code is working.
When I run the code using
Have you checked if this is an out-of-memory issue? By default, we cache our datasets in RAM, which means you need a lot of RAM to train (for ImageNet and other big datasets). You can disable caching by commenting out the call on this line. |
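For context, a minimal sketch of what disabling the cache looks like in a typical tf.data pipeline (this is illustrative, not larq-zoo's actual data code; the dataset name, shuffle buffer, and batch size are placeholders):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Illustrative input pipeline; the real caching call lives in the larq-zoo
# training code linked above.
dataset = tfds.load("imagenet2012", split="train", as_supervised=True)
# dataset = dataset.cache()  # commented out: do not hold the dataset in RAM
dataset = (
    dataset.shuffle(10_000)
    .batch(256)
    .prefetch(tf.data.experimental.AUTOTUNE)  # available on TF 1.15 as well
)
```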
The R2B training has 4 stages; how can I resume it from, for example, after finishing the first stage?
The weights of the intermediate models are saved in a shared directory (see here and here). Note that each separate run has its own directory. The weights of previous stages are loaded in later stages (see for example here). To resume, you can point a later stage at the weights saved by an earlier run, e.g. `lqz TrainR2B initial_stage=1 stage_1.initialize_teacher_weights_from="/home/yourname/some/dir/models/resnet_fp"`
I follow the suggestion, and I then fix it by changing the code in `larq_zoo/training/knowledge_distillation/multi_stage_training.py` (near the bottom).
Please let me know if I am wrong or if the code should be fixed.
It indeed seems that something is broken there; the `run` method should look like this:

```python
from datetime import datetime
from pathlib import Path

def run(self) -> None:
    # Make sure the shared output directory for all stages exists.
    Path(self.parent_output_dir).mkdir(parents=True, exist_ok=True)
    for experiment in self.experiments:
        # Skip the stages that come before the requested initial stage.
        if experiment.stage < self.initial_stage:
            print(f"Skipping stage {experiment.stage}")
            continue
        print(f"Starting stage {experiment.stage} at {datetime.now().isoformat()}.")
        experiment.run()
```

Feel free to make a PR fixing this.
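For reference, a self-contained sketch of that skip logic with a stand-in experiment class (`StubExperiment` is hypothetical; only `stage` and `run()` matter here). It shows that `initial_stage=1` skips stage 0 and runs the remaining stages:

```python
from dataclasses import dataclass

@dataclass
class StubExperiment:
    stage: int

    def run(self) -> None:
        print(f"running stage {self.stage}")

experiments = [StubExperiment(stage=s) for s in range(4)]
initial_stage = 1  # resume after the first stage

for experiment in experiments:
    if experiment.stage < initial_stage:
        print(f"Skipping stage {experiment.stage}")
        continue
    experiment.run()
```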
Hi @timdebruin
When I train with the ImageNet dataset it works well because the input size is large, but for CIFAR-100 the input size is quite small and it cannot pass through AvgPool2d. It can be solved by padding the input before the AvgPool2d layer, but the result I got is very low. I wonder, in the case of the CIFAR-100 dataset, how can I handle this shortcut implementation properly? Thank you so much.
For details on the architecture we recommend reaching out to the original authors directly. Architecture details like striding and pooling may be different from ImageNet. The paper also mentions some other differences between the two datasets, such as mix-up, which you may want to consider.
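As an illustration of the padding workaround mentioned above, here is a minimal Keras sketch (the helper name, pool size, and padding amount are assumptions, not the actual R2B shortcut):

```python
import tensorflow as tf

def avg_pool_shortcut(x, pool_size=2, stride=2):
    # Zero-pad feature maps that are smaller than the pooling window so the
    # average pool stays valid on small inputs such as CIFAR-100's 32x32 images.
    if x.shape[1] is not None and x.shape[1] < pool_size:
        x = tf.keras.layers.ZeroPadding2D(padding=1)(x)
    return tf.keras.layers.AveragePooling2D(pool_size=pool_size, strides=stride)(x)

# Example: a tiny late-stage feature map that would otherwise be too small.
inputs = tf.keras.Input(shape=(1, 1, 256))
outputs = avg_pool_shortcut(inputs)  # padded to 3x3, then pooled
```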
I read at https://docs.larq.dev/zoo/ that RealToBinaryNet reaches 65% accuracy, which is state of the art.
I really appreciate this and want to train the model to learn about it.
I also read the code and see the method is included, but errors appeared when I trained the model, and it seems like the code is not complete.
I want to fix it, but it seems like a lot of effort.
If it is available, would you please share the training code?