
Add snapshots for model summaries #106

Closed · wants to merge 20 commits

Conversation

@jneeven (Contributor) commented Feb 21, 2020

TODO:

  • Try this directly from zookeeper without subprocess

tests/models_test.py: outdated review thread (resolved)
@jneeven jneeven marked this pull request as ready for review February 21, 2020 13:35
@jneeven (Contributor, Author) commented Feb 21, 2020

RIP TFDS; I'm gonna try to get that generic dummy data up and running.

@jneeven (Contributor, Author) commented Feb 27, 2020

@AdamHillier I know this is my PR, but could you update this to use #119? I can also do it myself, but probably not today.

@leonoverweel (Contributor) commented Feb 27, 2020

I don't think I like having the model summary strings in tests/snapshots/snap_models_test.py. If we put them all in separate .txt files (named by their models), it'll be easier to track model changes over time and find/link to individual model summaries.

@jneeven (Contributor, Author) commented Feb 27, 2020

I don't think I like having the model summary strings in tests/snapshots/snap_models_test.py. If we put them all in separate .txt files (named by their models), it'll be easier to track model changes over time and find/link to individual model summaries.

This is not a choice I've made; it's just how snapshots work. I'm not sure whether what you suggest is possible without creating an explicit Python test file for every model separately...

Edit: obviously it is possible, but then we'd have to drop snapshottest and manually save, load, and compare the files. Sounds like a lot of hassle to me, but it could be okay. I also don't like the way snapshottest reports failures; if a comparison fails, it just prints the entire string in red without much indication of what's actually wrong.

@leonoverweel (Contributor)

Ah, I see. Since we're literally just comparing the string output of model.summary() to the text contents of a .txt file though, it shouldn't be too hard to implement without the snapshottest framework right?

@jneeven (Contributor, Author) commented Feb 27, 2020

Ah, I see. Since we're literally just comparing the string output of model.summary() to the text contents of a .txt file though, it shouldn't be too hard to implement without the snapshottest framework right?

Yes true, I think I like this suggestion

@AdamHillier (Contributor)

The nice thing about the snapshot module is that there is a single pytest command to update the snapshot. As long as there is a nice way to replicate that with a single command, I'm happy to have individual txt files.

@jneeven (Contributor, Author) commented Feb 27, 2020

The nice thing about the snapshot module is that there is a single pytest command to update the snapshot. As long as there is a nice way to replicate that with a single command, I'm happy to have individual txt files.

As long as we keep these summaries in a specific folder, we can just delete the folder and re-generate the snapshots that way?

@AdamHillier (Contributor)

As long as we keep these summaries in a specific folder, we can just delete the folder and re-generate the snapshots that way?

Sure, if that's how it works then that sounds good.
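
A minimal sketch of the .txt-file approach being discussed, assuming one snapshot file per model; the directory layout and helper names below are hypothetical, not taken from this PR:

```python
import pathlib

import tensorflow as tf

# Hypothetical snapshot location; deleting this folder re-generates all snapshots.
SNAPSHOT_DIR = pathlib.Path(__file__).parent / "snapshots"


def get_summary(model: tf.keras.Model) -> str:
    """Capture the output of model.summary() as a single string."""
    lines = []
    model.summary(print_fn=lines.append)
    return "\n".join(lines) + "\n"


def assert_summary_matches_snapshot(model: tf.keras.Model, name: str) -> None:
    snapshot_file = SNAPSHOT_DIR / f"{name}.txt"
    summary = get_summary(model)
    if not snapshot_file.exists():
        # Missing snapshots are written on the first run, so removing the
        # folder (or a single file) re-generates them on the next test run.
        snapshot_file.parent.mkdir(parents=True, exist_ok=True)
        snapshot_file.write_text(summary)
    assert summary == snapshot_file.read_text()
```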

tests/models_test.py: outdated review thread (resolved)
larq_zoo/train.py: outdated review thread (resolved)
@AdamHillier (Contributor)

I like this; once the isort linting is fixed, it should be good to go.

@leonoverweel (Contributor)

😍 love this!

tests/models_test.py: outdated review thread (resolved)
@jneeven (Contributor, Author) commented Apr 6, 2020

@larq/core For some reason, keras.backend.clear_session() (called in parametrize) no longer seems to have any effect; the second model to be tested will have layer names like input_2 etc. Does anyone have any idea what might be the issue here? I'm on TF 2.0.0 and manually calling clear_session() makes no difference...

On February 27th this was working fine, so something must have changed.
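
For reference, a common way to force fresh auto-generated layer names for every parametrized test case is an autouse fixture along these lines (a sketch, not necessarily what this PR uses):

```python
import pytest
import tensorflow as tf


@pytest.fixture(autouse=True)
def fresh_keras_session():
    # Reset global Keras state so auto-generated layer names
    # (input_1, conv2d_1, ...) start from scratch for every test case.
    tf.keras.backend.clear_session()
    yield
```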

@leonoverweel (Contributor)

It also looks like it's failing on fixture 'snapshot' not found. Maybe some requirements/imports got messed up when you merged master?

@jneeven (Contributor, Author) commented Apr 6, 2020

It also looks like it's failing on fixture 'snapshot' not found. Maybe some requirements/imports got messed up when you merged master?

Yeah that's indeed interesting, though snapshottest is still in the requirements and pytest is imported so I don't see why that wouldn't work...

@AdamHillier (Contributor)

Hopefully CI for this will pass now that #166 / #168 are merged.

@AdamHillier (Contributor)

There's surely something wrong with 018ed29; Larq Zoo shouldn't be on its own line, should it?

@jneeven (Contributor, Author) commented May 1, 2020

There's surely something wrong with 018ed29; Larq Zoo shouldn't be on its own line, should it?

Why not? It's not a third-party library in this case

@AdamHillier (Contributor)

Why not? It's not a third-party library in this case

Sorry, you're absolutely right; that was a dumb moment on my part 😂

@leonoverweel (Contributor)

So it looks like this is still failing because the auto-named layers have the wrong indices, since previously built models aren't being properly cleared out? Same problem as before :(

@AdamHillier (Contributor)

Haha, it turns out this code actually works just fine; the snapshots must have been generated while it wasn't working, because the layer names in the .txt files are definitely wrong. They just need to be re-generated :)

I'm about to push a commit that does that.

@AdamHillier (Contributor) commented May 5, 2020

Well, that's progress, but unfortunately it looks like TF 2.2 changes a default layer name, tf_op_layer_Mul -> tf_op_layer_mul. Not sure how we can get around that without setting layer names explicitly...

@jneeven (Contributor, Author) commented May 6, 2020

Well, that's progress, but unfortunately it looks like TF 2.2 changes a default layer name, tf_op_layer_Mul -> tf_op_layer_mul. Not sure how we can get around that without setting layer names explicitly...

I compare the summary strings manually, so we could just use assert snapshot.lower() == current.lower()... It's not extremely pretty, but I don't think it's necessarily a bad solution either; in general this makes the snapshots less brittle (I don't foresee any scenario in which this would cause issues).
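
A sketch of what that case-insensitive comparison could look like as a small helper (names are hypothetical):

```python
import pathlib


def assert_matches_snapshot(current: str, snapshot_file: pathlib.Path) -> None:
    snapshot = snapshot_file.read_text()
    # Case-insensitive compare, so TF-version-dependent auto layer names
    # (tf_op_layer_Mul vs tf_op_layer_mul) don't break the snapshot.
    assert snapshot.lower() == current.lower()
```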

@leonoverweel requested a review from lgeiger May 6, 2020 10:09
@leonoverweel (Contributor) commented May 6, 2020

Nice, exciting that this is ready now!

@AdamHillier (Contributor) left a review

Looks good to me 👍

@jneeven (Contributor, Author) commented May 7, 2020

@lgeiger I think this should be good to go, could you review this another time? Merging is currently blocked.

@jneeven (Contributor, Author) commented Aug 17, 2020

@larq/core do we still want this? If so, I'll resolve the conflicts and we can get this merged. If not, let's close this PR.

@AdamHillier (Contributor)

@larq/core do we still want this? If so, I'll resolve the conflicts and we can get this merged. If not, let's close this PR.

I don't have a strong desire for this, and it seems like it'd be a bunch of work to get it working, so I'd be minded to close it.

@lgeiger (Member) commented Aug 17, 2020

@larq/core do we still want this? If so, I'll resolve the conflicts and we can get this merged. If not, let's close this PR.

I'm fairly neutral on this, although I fear that these tests might be tricky to maintain across the multiple versions of TensorFlow that we support, since the snapshots might be slightly different depending on the TensorFlow version used to generate them.

@jneeven (Contributor, Author) commented Aug 18, 2020

Would it make sense to just compare the ModelProfiles instead? Then we wouldn't need to care about the layer names etc., but would at least be notified if, e.g., the number of MACs suddenly changes unintentionally.

@koenhelwegen (Contributor)

Would it make sense to just compare the ModelProfiles instead? Then we wouldn't need to care about the layer names etc., but would at least be notified if, e.g., the number of MACs suddenly changes unintentionally.

Would this catch issues that are not already covered by the unit tests for the ModelProfile itself? (https://github.com/larq/larq/blob/master/larq/models_test.py#L66)

@jneeven (Contributor, Author) commented Aug 19, 2020

Would this catch issues that are not already covered by the unit tests for the ModelProfile itself? (https://github.com/larq/larq/blob/master/larq/models_test.py#L66)

It would do that check for all the models, rather than just for the toy model used in the unit tests. That test is a good way to check that the calculations made in ModelProfile are still correct, but the idea behind the tests here is that we'd want to catch subtle things like accidentally enabling/disabling biases somewhere while updating some code.
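
A rough sketch of what a profile-style snapshot could record, using plain Keras attributes rather than larq's ModelProfile; all names and fields below are illustrative assumptions, not the actual implementation:

```python
import json

import tensorflow as tf


def model_profile(model: tf.keras.Model) -> dict:
    # Layer names are deliberately excluded so the snapshot stays robust to
    # auto-generated name changes across TensorFlow versions.
    return {
        "num_layers": len(model.layers),
        "trainable_params": int(
            sum(tf.keras.backend.count_params(w) for w in model.trainable_weights)
        ),
        "non_trainable_params": int(
            sum(tf.keras.backend.count_params(w) for w in model.non_trainable_weights)
        ),
        "layers_with_bias": sum(
            1 for layer in model.layers if getattr(layer, "use_bias", False)
        ),
    }


# A snapshot would then be one small JSON file per model, e.g.
# json.dumps(model_profile(model), indent=2)
```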

@jneeven (Contributor, Author) commented Aug 27, 2020

We decided that at this time, it's not worth the extra effort to change to a ModelProfile snapshot solution. I'll close this PR for now, but we can re-open it in the future if we do decide this is necessary.

@jneeven closed this Aug 27, 2020
Development

Successfully merging this pull request may close these issues:

  • Snapshot tests of model summaries

5 participants