Serialization of big Numpy arrays #150

albertz · 2022-05-06T07:38:33Z

There are cases where a bigger Numpy array is part of your net dict, e.g. when you have a custom init for some parameter, as in the case of GammatoneV2.

When some bigger Numpy array is part of the net dict, it is currently serialized as is, via __repr__. This makes the produced net dict very difficult to read, when 99% of it is just the Numpy array.
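To make the scale of the problem concrete, here is a small sketch that measures how much of the serialized net dict is taken up by the array. The layer name, the option key and the array shape are made up for illustration. Note that NumPy abbreviates arrays with more than 1000 elements by default, so the sketch raises the print threshold to get the full repr; whether the actual serialization does the same is an assumption.

```python
import sys
import numpy as np

# Hypothetical net dict with one big custom init array (names are illustrative only).
big_init = np.linspace(0.0, 1.0, 4000, dtype="float32").reshape(40, 100)
net_dict = {"features": {"class": "conv", "forward_weights_init": big_init}}

# By default NumPy abbreviates large arrays with "...", so print everything.
with np.printoptions(threshold=sys.maxsize):
    array_repr = repr(big_init)
    net_dict_repr = repr(net_dict)

# Fraction of the serialized net dict that is just the array.
array_share = len(array_repr) / len(net_dict_repr)
print(f"net dict repr: {len(net_dict_repr)} chars, array share: {array_share:.1%}")
```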

So, should we do something about it?

What are possible things we could do? Here are some ideas:

We could at least move the definition to the top, similar to how we do it for dim tags. Then the net dict itself stays readable. But still 99% of the resulting RETURNN config would be just the Numpy array.

We could move them outside, either as Numpy txt files which are loaded via numpy.loadtxt, or as Python files which are imported. However, this means that any config serialization logic now needs extra logic to handle these cases. Although we would probably only write this once anyway and then not care about it anymore.

Such external file handling of the serialization could also be done in a generic way, and maybe it becomes useful for other purposes as well.
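The first idea above (moving the array definitions to the top so that the net dict itself stays readable) could be sketched roughly as follows. All names here (`_NameRef`, `_hoist`, `serialize_with_hoisted_arrays`, the `_array_N` naming scheme, the example layer) are hypothetical and only illustrate the approach; this is not the existing serialization code.

```python
import numpy as np


class _NameRef:
    """Placeholder whose repr is a bare variable name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def _hoist(obj, defs):
    """Replace every ndarray in nested dicts/lists/tuples by a _NameRef, collecting the arrays in defs."""
    if isinstance(obj, np.ndarray):
        name = f"_array_{len(defs)}"  # hypothetical naming scheme
        defs[name] = obj
        return _NameRef(name)
    if isinstance(obj, dict):
        return {key: _hoist(value, defs) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(_hoist(value, defs) for value in obj)
    return obj


def serialize_with_hoisted_arrays(net_dict):
    """Return config text with array definitions first and a short net dict after them."""
    defs = {}
    hoisted = _hoist(net_dict, defs)
    lines = ["import numpy", ""]
    for name, arr in defs.items():
        # tolist() plus an explicit dtype round-trips finite values exactly
        # and avoids NumPy's abbreviated repr for large arrays.
        lines.append(f"{name} = numpy.array({arr.tolist()!r}, dtype={str(arr.dtype)!r})")
    lines += ["", f"network = {hoisted!r}", ""]
    return "\n".join(lines)


net_dict = {"features": {"class": "conv", "forward_weights_init": np.arange(6, dtype="float32").reshape(2, 3)}}
config_text = serialize_with_hoisted_arrays(net_dict)
print(config_text)
```

The `network = ...` line stays short no matter how big the arrays are, but the config file as a whole is still dominated by the array definitions.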


albertz · 2022-05-06T07:38:52Z

@JackTemaki @Atticus1806 opinions?

Atticus1806 · 2022-05-09T09:03:54Z

I would really prefer external handling. The config is already usually a lot longer than "old" configs due to explicitly setting everything, but it is still readable. I feel like dumping big arrays into it would probably make it unreadable or at least very annoying (slows down the text editor etc.).

albertz · 2022-05-09T09:32:56Z

Yes, me too. But then the next question is, how exactly?

I mean, probably numpy.loadtxt should be fine.
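As a quick check of what a numpy.savetxt / numpy.loadtxt round trip looks like, here is a minimal sketch. The file name and the array are made up for illustration.

```python
import os
import tempfile
import numpy as np

weights = np.random.default_rng(0).normal(size=(40, 100)).astype("float32")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.txt")  # hypothetical file name
    # Plain text, one array row per line.
    np.savetxt(path, weights)
    # loadtxt returns float64 unless told otherwise, so pass the original dtype explicitly.
    restored = np.loadtxt(path, dtype=weights.dtype)

print(restored.shape, restored.dtype, np.array_equal(restored, weights))
```

Two things to keep in mind with this format: numpy.savetxt only accepts 1D and 2D arrays, so higher-rank inits would need a reshape plus a separately stored shape (or a different format such as numpy.save), and the dtype is not stored in the file, so it has to be carried in the generated config.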

Should the path be relative? Relative to what?

Where should those files be stored?

What should the API look like? get_returnn_config would get some extra param like extra_out_dir? Is there any reasonable default? Probably not...

JackTemaki · 2022-05-09T09:46:38Z

For Sisyphus usage the extra_out_dir would be fine, because then we can even place it with an absolute path if wanted. So in the end this will probably be an extra_data (or similarly named) directory next to the config file.

JackTemaki · 2022-05-09T09:47:13Z

I would prefer if it works relative though, because then you can move both the config and the extra dirs around.
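One way to get this relocatable behaviour is to resolve the data path relative to the config file itself rather than the working directory. The sketch below assumes that the config is executed in a way that defines `__file__` (here via `runpy.run_path`); whether RETURNN's config loading provides that has not been established in this thread. The directory name `extra_data`, the file names and the layer entry are hypothetical.

```python
import os
import runpy
import shutil
import tempfile
import numpy as np

# Text of a generated config that looks for its data next to itself.
CONFIG_TEXT = '''\
import os
import numpy

# Resolve extra data relative to this config file, not the working directory.
_extra_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "extra_data")
_array_0 = numpy.loadtxt(os.path.join(_extra_dir, "weights.txt"))

network = {"features": {"class": "conv", "forward_weights_init": _array_0}}
'''

weights = np.arange(6, dtype="float64").reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the config and the extra data dir next to each other.
    setup_dir = os.path.join(tmp_dir, "setup")
    os.makedirs(os.path.join(setup_dir, "extra_data"))
    np.savetxt(os.path.join(setup_dir, "extra_data", "weights.txt"), weights)
    with open(os.path.join(setup_dir, "returnn_config.py"), "w") as f:
        f.write(CONFIG_TEXT)

    # Move config and extra dir together; the relative reference stays valid.
    moved_dir = os.path.join(tmp_dir, "moved")
    shutil.move(setup_dir, moved_dir)

    config_globals = runpy.run_path(os.path.join(moved_dir, "returnn_config.py"))
    restored = config_globals["network"]["features"]["forward_weights_init"]

print(np.array_equal(restored, weights))
```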

albertz · 2022-05-09T09:52:45Z

Ok, extra_out_dir.

Where do we expect the config to be? So how should we generate relative paths to extra_out_dir? Should this be configurable? config_extra_out_dir_prefix or whatever?

Should there be a reasonable default for extra_out_dir? Maybe allow None in which case this is not used? I think for many simple test cases, this might make it simpler. But for Sisyphus usage or any setup pipeline, you would set this.
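A rough sketch of how such an API could behave, with `None` as the default meaning "keep everything inline". The function `get_returnn_config_sketch` is a hypothetical standalone stand-in, not the actual get_returnn_config; the parameter `config_extra_out_dir_prefix` is only the placeholder name floated above, and the file naming scheme is an assumption. To stay short it handles only a flat dict of layer dicts.

```python
import os
import tempfile
import numpy as np


def get_returnn_config_sketch(network, extra_out_dir=None, config_extra_out_dir_prefix=None):
    """
    Hypothetical stand-in for the serialization entry point discussed here.

    :param dict network: flat dict of layer dicts (deeper nesting is not handled in this sketch)
    :param str|None extra_out_dir: where to write array files; None keeps the arrays inline
    :param str|None config_extra_out_dir_prefix: path the config uses to find those files;
        defaults to extra_out_dir itself
    """
    lines = ["import numpy", ""]
    layer_lines = []
    count = 0
    for layer_name, layer in network.items():
        items = []
        for key, value in layer.items():
            if isinstance(value, np.ndarray):
                var = f"_array_{count}"
                count += 1
                if extra_out_dir is None:
                    # No external dir given: define the array inline at the top of the config.
                    lines.append(f"{var} = numpy.array({value.tolist()!r}, dtype={str(value.dtype)!r})")
                else:
                    file_name = f"{layer_name}.{key}.txt"  # assumed naming scheme
                    np.savetxt(os.path.join(extra_out_dir, file_name), value)
                    prefix = extra_out_dir if config_extra_out_dir_prefix is None else config_extra_out_dir_prefix
                    load_path = os.path.join(prefix, file_name)
                    lines.append(f"{var} = numpy.loadtxt({load_path!r}, dtype={str(value.dtype)!r})")
                items.append(f"{key!r}: {var}")
            else:
                items.append(f"{key!r}: {value!r}")
        layer_lines.append(f"    {layer_name!r}: {{{', '.join(items)}}},")
    return "\n".join(lines + ["", "network = {"] + layer_lines + ["}", ""])


net = {"features": {"class": "conv", "forward_weights_init": np.arange(6, dtype="float32").reshape(2, 3)}}

# Simple test case: no extra_out_dir, everything stays inside the config.
inline_config = get_returnn_config_sketch(net)

# Pipeline-style usage: arrays go to files in extra_out_dir.
with tempfile.TemporaryDirectory() as out_dir:
    external_config = get_returnn_config_sketch(net, extra_out_dir=out_dir)
    config_globals = {}
    exec(external_config, config_globals)  # must run while the files still exist
    restored = config_globals["network"]["features"]["forward_weights_init"]

print(external_config)
```

With `extra_out_dir=None` the config is self-contained, which is convenient for tests; a setup pipeline would pass a directory and, if the config should reference it relatively, a matching prefix.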

JackTemaki · 2022-05-09T10:02:43Z

> Should there be a reasonable default for extra_out_dir? Maybe allow None in which case this is not used? I think for many simple test cases, this might make it simpler. But for Sisyphus usage or any setup pipeline, you would set this.

Yes, why not this way. With Sisyphus we always know where the file should be, and for the tests it can be within the config.

albertz · 2022-05-09T10:32:30Z

So, where do we expect the config to be? So how should we generate relative paths to extra_out_dir? Should this be configurable? config_extra_out_dir_prefix or whatever?

albertz mentioned this issue May 6, 2022

Missing pieces for first release #32

Open

albertz added this to the first-release milestone May 6, 2022
