Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed behaviour of source._open_dataset to: #681

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

charles-turner-1
Copy link

@charles-turner-1 charles-turner-1 commented Oct 4, 2024

Change Summary

Changes discovery of variables in source._open_dataset to additionally include coordinate variables, with the following behaviour:

  • Only data variables requested & coordinates they depend on are returned if only data variables are requested (no change from previous behaviour).

  • Entire dataset (all data & coordinate variables) are returned if no variables are requested (no change from previous behaviour).

  • Only requested coordinate variables are returned if only coordinate variables are requested (updated behaviour).

  • Data variables requested, coordinates they depend on, and additional requested coordinate variables are returned both data and coordinate variables are requested (updated behaviour)

  • Sync fork (intake_esm/cat.py)

Related issue number

Closes #660

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI

 - Search for data & coordinate variables from just data variables.
 - Don't check to remove unnecessary coordinates & variables from dataset as
this automatically removes all requested coordinate variables.
 - If no data variables are found, load the first dataset returned: this
avoids concatenation issues resulting from trying to concatenate along
nonexistent dimenions.

Added a 'test_request_coord_vars' test to test/test_source to
ensure the following behaviour:
- Only data variables requested & coordinates they depend on are
returned if only data variables are requested (no change from
previous behaviour).
- Entire dataset (all data & coordinate variables) are returned if
no variables are requested (no change from previous behaviour).
- Only requested coordinate variables are returned if only
coordinate variables are requested (updated behaviour).
- Data variables requested, coordinates they depend on, and
additional requested coordinate variables are returned both data
and coordinate variables are requested (updated behaviour).
@charles-turner-1
Copy link
Author

charles-turner-1 commented Oct 8, 2024

NB. I've reproduced the failing test on main (see attached image).

I've now checked multiple test runs & there seems to be some stochastic failure of all parametrisations of the tests/test_core::test_to_dataset_dict() function. I'm currently investigating further, however I don't presently believe that this change introduces the test failure.

Screenshot 2024-10-08 at 10 00 15 AM

```
datasets = [
    ds.set_coords(set(ds.variables) - set(ds.attrs[OPTIONS['vars_key']]))
    for ds in datasets
]
```
to _open_dataset - ought to fix failing test. Removal was based on wrong
assumption that this check always removes specified coordinate
variables, which is only true if they are not passed in the __init__ of
the ESMDataSource class.
@charles-turner-1 charles-turner-1 marked this pull request as draft October 11, 2024 05:51
… return first dataset if so - necessary to indexing static files
charles-turner-1 and others added 3 commits October 15, 2024 19:01
… return first dataset if so - necessary to indexing static files

Appears to be potential intermittent issue with Read the Docs build -
unable to reproduce reliably locally.
@andersy005
Copy link
Member

thank you so much for this addition, @charles-turner-1!

regarding the test failures, the recent test failures have highlighted the need to revamp our testing infrastructure. one area of focus is reducing our reliance on cloud-based datasets, which are prone to changes without prior notice. additionally, network connectivity glitches often lead to intermittent test failures.

@charles-turner-1 charles-turner-1 marked this pull request as ready for review October 16, 2024 23:11
@charles-turner-1
Copy link
Author

charles-turner-1 commented Oct 16, 2024

No worries @andersy005 - think this one is ready for review now.

Our infrastructure for interacting with model outputs at ACCESS-NRI builds pretty heavily/directly on intake-esm, so would be happy to assist with identifying/improving flaky tests.

@charles-turner-1
Copy link
Author

@mgrover1 If you're able to review that would be great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot open coordinate variables from Intake-ESM datastores
2 participants