Advantage over Matlab HDF5 support? #7

Closed
ftadel opened this issue Apr 24, 2020 · 3 comments
Labels
question Further information is requested

Comments

ftadel commented Apr 24, 2020

Hello
Since this is working only with Matlab and not Octave, what is the advantage of this library over using the readily available HDF5 functions in Matlab?
https://www.mathworks.com/help/matlab/hdf5-files.html

I'm sorry if this is a silly question, I haven't looked at your code or examples at all...
I was just wondering about the use of this code in jsnirf, and questioning the necessity of adding many dependencies to Brainstorm if we want to reuse some of your code. Related to this PR: brainstorm-tools/brainstorm3#283

Thanks!

fangq (Member) commented Apr 24, 2020

@ftadel, I was about to start a conversation with your team on file formats, glad you mentioned this.

> what is the advantage of this library over using the readily available HDF5 functions in Matlab?

The main appeal of easyh5 is its simplicity and usability. It is super compact and extremely easy to use.

The counterparts of loadh5/saveh5 in matlab's high-level H5 interface are h5read/h5write, but the latter have a lot of limitations.

For starters, they can't directly save a struct, a struct array, or a cell array. They only accept numerical or string arrays as input. They are also unable to directly handle other advanced data structures, such as tables, graphs, containers.Map, etc., unless you use the low-level interfaces and serialize those manually.

In comparison, easyh5 accepts literally all matlab data structures, serializing them and storing them in a single hdf5 file. It also reads any hdf5 file and returns a convenient struct/containers.Map object holding the complex hierarchical data.

Secondly, even if one calls the low-level interface, matlab's functions do not save data fields in their creation order unless special settings are used. This can be quite annoying, as some data records require a specific order of appearance. This was fixed in easyh5: https://github.com/fangq/easyh5/issues/1

Last but not least, hdf5 is a general data storage format, like JSON and MessagePack, and it does not have internal data-structure vocabularies for serializing complex data structures. People have to serialize those by themselves, using non-standardized fields. For example, hdf5 does not have built-in complex numbers, so people have been storing those in composite fields named r/i, Re/Im, Real/Imag, etc. This makes the files difficult to share and parse.

My solution to the above issue is to adopt the JData specification I defined for JSON-based data storage, which aims to standardize the vocabularies for serializing complex data structures.
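To make the parsing problem concrete, here is a minimal, language-neutral sketch in plain Python (no HDF5 library involved). It assumes each file stores a complex value as a small record with one of the field-name pairs mentioned above; the `KNOWN_PAIRS` list and the `to_complex` helper are hypothetical names for illustration and are not part of easyh5 or JData.

```python
# Hypothetical list of field-name conventions a reader would have to know about.
KNOWN_PAIRS = [("r", "i"), ("Re", "Im"), ("Real", "Imag")]


def to_complex(record):
    """Convert a {real-name: x, imag-name: y} record into a Python complex."""
    for re_key, im_key in KNOWN_PAIRS:
        if re_key in record and im_key in record:
            return complex(record[re_key], record[im_key])
    # Any convention missing from KNOWN_PAIRS cannot be decoded.
    raise ValueError(f"unrecognized complex layout: {sorted(record)}")


# The same number, as three different writers might have stored it.
records = [
    {"r": 1.0, "i": 2.0},
    {"Re": 1.0, "Im": 2.0},
    {"Real": 1.0, "Imag": 2.0},
]
values = [to_complex(rec) for rec in records]
```

Every reader has to carry a list like `KNOWN_PAIRS` and extend it whenever a new convention appears, which is the burden a shared vocabulary is meant to remove.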

You can see my other thoughts related to hdf5 in this BIDS thread:

bids-standard/bids-specification#197 (comment)

ftadel (Author) commented Apr 27, 2020

This is all very clear, thanks!

Which types of data does the SNIRF data format need to store in HDF5 that the Matlab functions can't store easily?
Only cell-arrays, or more than this?

fangq (Member) commented Apr 27, 2020

The SNIRF data is basically a struct if mapped to a matlab data structure.

Unfortunately, matlab's h5read/h5write can only deal with numerical or string arrays. To support hierarchical data, you have to loop over the subfields and call the low-level functions, which in the end is exactly what easyh5's loadh5/saveh5 do.

  • In other words, yes, you can write manual code using matlab's low-level H5 APIs, but it will end up with the same structure as the loadh5/saveh5 code, only less generalizable.

Take a look at the loadh5/saveh5 code and you can see they are super compact (yet very general).
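As a rough illustration of the "loop over subfields" idea, the sketch below walks a nested record and emits one HDF5-style path per leaf. It is plain Python operating on dicts, not the actual loadh5/saveh5 code and not a real HDF5 writer; the `flatten` helper and the sample record are made up for this example.

```python
def flatten(node, path=""):
    """Yield (hdf5_style_path, leaf_value) pairs from a nested dict."""
    for name, value in node.items():
        child = f"{path}/{name}"
        if isinstance(value, dict):
            # A nested struct maps to a group: recurse into its fields.
            yield from flatten(value, child)
        else:
            # A leaf maps to a dataset that a writer would store at this path.
            yield child, value


# Illustrative record, loosely shaped like a SNIRF-style struct.
record = {
    "nirs": {
        "metaDataTags": {"SubjectID": "s01"},
        "data1": {"time": [0.0, 0.1]},
    }
}
datasets = list(flatten(record))
```

A real writer would replace the `yield child, value` step with a call that creates the dataset at that path; the recursion itself stays the same whatever the depth of the struct.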

ftadel closed this as completed Apr 28, 2020