I’m curious about approaches for cataloging large numbers of Intake-ESM datastores. The conventional approach of nesting Intake YAMLFileCatalogs (e.g. the Pangeo Intake catalog) works great when there are only a few Intake-ESM datastores with a clear hierarchy, but data search/discovery is pretty limited when there are many datastores. My experience is that, to some extent, users have to know what they’re looking for in order to be able to use the catalog effectively.
My simple attempt to improve the user experience was to write a new Intake plugin called intake-dataframe-catalog that provides a tabular catalog of Intake sources and associated metadata. The design and API are inspired by Intake-ESM, but the entries in an intake-dataframe-catalog are other Intake sources (e.g. Intake-ESM datastores). Similar to the way that users filter for datasets using Intake-ESM, users can filter on metadata in an intake-dataframe-catalog and eventually open the sources that are of interest to them.
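To give a rough feel for the idea, here is a plain-Python sketch of a tabular catalog whose rows point at sources and carry searchable metadata. This is not the intake-dataframe-catalog API: the `ROWS` table, the column names, the source names, and the `search` and `source_names` helpers are all made up for illustration.

```python
# Conceptual sketch only: hypothetical data and helpers, not the plugin's API.
# Each row describes one variable available in one source
# (for example, one Intake-ESM datastore).
ROWS = [
    {"name": "cmip6_store", "model": "ACCESS-ESM1-5", "realm": "atmos", "variable": "tas"},
    {"name": "cmip6_store", "model": "ACCESS-ESM1-5", "realm": "ocean", "variable": "tos"},
    {"name": "ocean_hindcast", "model": "ACCESS-OM2", "realm": "ocean", "variable": "tos"},
    {"name": "land_run", "model": "CABLE", "realm": "land", "variable": "gpp"},
]


def search(rows, **query):
    """Return the rows whose metadata match every key=value pair in the query."""
    return [row for row in rows if all(row.get(k) == v for k, v in query.items())]


def source_names(rows):
    """Return the unique source names in the rows, in first-seen order."""
    return list(dict.fromkeys(row["name"] for row in rows))


# Filter on metadata first, then see which sources are worth opening.
matches = search(ROWS, realm="ocean", variable="tos")
print(source_names(matches))
```

The point is that the search runs over metadata from every source at once, so a user does not need to know in advance which datastore holds what they want.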
Here’s the intake-dataframe-catalog documentation: https://intake-dataframe-catalog.readthedocs.io/en/latest/?badge=latest
And here’s an example of an intake-dataframe-catalog of many Intake-ESM datastores: https://access-nri-intake-catalog.readthedocs.io/en/latest/usage/quickstart.html
(unfortunately only those with access to Australia’s supercomputer Gadi can actually use this catalog)
This post is partly to make people aware of intake-dataframe-catalog and partly to ask: are there other approaches out there for solving this same issue?