`Component`s should be independent #121

dafyddstephenson · 2024-10-03T18:27:02Z

Conceptually, a Component should be able to run standalone, without being defined as part of a Case and without being coupled to any other Component.

The Component class shares many of its methods with Case: setup, build, pre_run, run, and post_run. When these methods are called on Case, Case loops over each Component and calls the methods on them in turn.

Component should be able to run all of these methods outside the context of a Case, but currently can't do so sensibly, because certain fundamental aspects of a simulation are held in the Case class, beyond the reach of the Component instance. Examples are the Case.start_date and Case.end_date attributes, which are used to pass an n_time_steps parameter to Component.run(), and the Case.caseroot attribute, which is used to pass an output_dir parameter to Component.run().
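To make the dependency concrete, here is a minimal sketch of the pattern described above. It is not the actual C-Star code: the `time_step` attribute, the integer division used to get `n_time_steps`, and the `calls` list are all assumptions made for illustration. The output path follows the current `caseroot/output/<component>` layout.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import Path


@dataclass
class Component:
    """Hypothetical stand-in for the real class."""

    name: str
    time_step: timedelta  # assumed attribute, for illustration only
    calls: list = field(default_factory=list)

    def run(self, n_time_steps: int, output_dir: Path) -> None:
        # A real component would launch the model here; this only records the call.
        self.calls.append((n_time_steps, output_dir))


@dataclass
class Case:
    """Hypothetical stand-in: owns the dates and the caseroot."""

    caseroot: Path
    start_date: datetime
    end_date: datetime
    components: list

    def run(self) -> None:
        duration = self.end_date - self.start_date
        for component in self.components:
            # Both arguments are derived from Case state, so the Component
            # cannot work them out when it is used on its own.
            n_time_steps = duration // component.time_step
            output_dir = self.caseroot / "output" / component.name
            component.run(n_time_steps=n_time_steps, output_dir=output_dir)
```

In this sketch, nothing on `Component` itself says when the simulation starts, when it ends, or where output goes, which is the gap the proposals below try to close.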

I propose that we:

1. Restructure the `caseroot` directory.

The caseroot currently resembles, e.g.:

.
└── caseroot
    ├── output
    │   └── ROMS
    ├── additional_source_code
    │   └── ROMS
    │       ├── bgc.opt
    │       ├── cppdefs.opt
    │       └── my_custom_module.F
    ├── input_datasets
    │   └── ROMS
    │       ├── input_dataset1.nc
    │       └── input_dataset2.nc
    └── namelists
        └── ROMS
            └── roms.in

This is, in my opinion, more navigable to a user exploring it on the filesystem, but jumbles the hierarchical order of classes in C-Star. It would be more compatible with C-Star's design to structure it as follows:

.
└── caseroot
    └── ROMS
        ├── output
        ├── additional_source_code
        │   ├── bgc.opt
        │   ├── cppdefs.opt
        │   └── my_custom_module.F
        ├── input_datasets
        │   ├── input_dataset1.nc
        │   └── input_dataset2.nc
        └── namelists
            └── roms.in

The advantage of this structure is that it reflects the design of C-Star. Perhaps more importantly, a component_root attribute follows naturally from it, allowing a user to run ROMS without any other Components, and without even defining a Case or caseroot. Following #115, this is likely to be the only context in which C-Star is run for the foreseeable future. If the user is running the Component as part of a Case, the Component.component_root (or whatever we call it) would just be defined as a subdirectory of Case.caseroot.
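A rough sketch of how this could look, assuming the attribute is called `component_root` (the name is still open) and that a Case simply assigns `caseroot/<component name>` to each of its Components. The `output_dir` property and the constructor signatures are hypothetical.

```python
from pathlib import Path
from typing import Optional


class Component:
    """Hypothetical sketch; names follow the proposal, not existing code."""

    def __init__(self, name: str, component_root: Optional[Path] = None):
        self.name = name
        self.component_root = component_root

    @property
    def output_dir(self) -> Path:
        # Derived from the component's own root, so no Case is needed.
        if self.component_root is None:
            raise ValueError(f"{self.name} has no component_root set")
        return self.component_root / "output"


class Case:
    """Hypothetical sketch: places each Component under the caseroot."""

    def __init__(self, caseroot, components):
        self.caseroot = Path(caseroot)
        self.components = list(components)
        for component in self.components:
            component.component_root = self.caseroot / component.name


# Standalone use: no Case and no caseroot.
standalone = Component("ROMS", component_root=Path("my_roms_run"))

# Use within a Case: the root is a subdirectory of the caseroot.
in_case = Component("ROMS")
Case("caseroot", [in_case])
```

With this layout, the standalone and in-Case paths go through the same `output_dir` logic; only the value of `component_root` differs.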

2. Add `start_date` and `end_date` as attributes on a Component.

In the event that the Component contributes to a Case, the Case could select the earliest/latest start/end dates from all the Components, or something like that. Suggestions 1 & 2 together would allow independence of the Component class.
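As a sketch of the "earliest start, latest end" option, assuming the dates are plain `datetime` attributes on each Component and that the Case exposes them as read-only properties. This is only one possible rule; the class and attribute layout here is hypothetical, and MARBL appears purely as a second illustrative component.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Component:
    """Hypothetical sketch: the Component owns its own dates."""

    name: str
    start_date: datetime
    end_date: datetime


class Case:
    """Hypothetical sketch: Case dates are derived, not stored."""

    def __init__(self, components):
        if not components:
            raise ValueError("a Case needs at least one Component")
        self.components = list(components)

    @property
    def start_date(self) -> datetime:
        # Earliest start among all components.
        return min(c.start_date for c in self.components)

    @property
    def end_date(self) -> datetime:
        # Latest end among all components.
        return max(c.end_date for c in self.components)


case = Case(
    [
        Component("ROMS", datetime(2012, 1, 1), datetime(2012, 2, 1)),
        Component("MARBL", datetime(2012, 1, 15), datetime(2012, 3, 1)),
    ]
)
```

Because the Case values are computed on access, they cannot drift out of sync with the Components they summarise.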

3. Consider eliminating the `Case` class altogether and renaming `Component`

Following #115, there is (well, will be) only a single Component supported by C-Star. Following the above suggestions, it will be able to run as a standalone object. It is certainly worthwhile to anticipate that this will change, but I don't think it makes sense to accommodate that change before thoroughly considering how it will look. Building an umbrella class that has no formal coupling infrastructure and just calls all its constituent components in a loop does not reflect expected usage of a multi-component system.

Without Case, however, the name Component is a bit redundant (component of what?) and should thus be changed. To what, I'm unsure. Maybe we could ask the public as an outreach effort.

@matt-long @NoraLoose @TomNicholas it would be great to hear your thoughts on this

matt-long · 2024-10-09T00:39:18Z

Thanks for the conversation today, @dafyddstephenson. I support this conceptual change.

As discussed, the term component can be thought of as a "component of the C-Star workflow." I am not opposed to the term component, but if we are to keep it, we need to be careful to disambiguate it from its use in coupled models, where "component" refers to a "component model."

Alternative terminology:

- element
- process
- subsystem

I kind of like subsystem — the flavor of "system" rings true with our ROMS-MARBL example and even, perhaps, a coupled ESM.

We were focused on geophysical codes today — but more broadly our workflows will require postprocessing analysis.

Do we consider the analysis sequences as another flavor of a subsystem?

From a workflow management perspective, what is the most generic description of a subsystem?

The methods:

- setup
  - ensure input data
- run
  - provision of compute resource
- finalize
  - curate/register output data
- persist (i.e., mint a blueprint)

The various flavors of subsystem may have differences in their APIs and additional "low level" methods — but the concept of a blueprint could flow through all. Could it?
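One way to picture this, as a sketch only: an abstract base class carrying the four methods listed above, with each flavor of subsystem filling them in. The class names, the `execute` helper, and the choice of a plain dict as the blueprint are all assumptions for illustration, not existing C-Star API.

```python
from abc import ABC, abstractmethod


class Subsystem(ABC):
    """Hypothetical generic interface built from the four methods above."""

    @abstractmethod
    def setup(self) -> None:
        """Ensure input data is available."""

    @abstractmethod
    def run(self) -> None:
        """Do the work on whatever compute resource is provisioned."""

    @abstractmethod
    def finalize(self) -> None:
        """Curate/register output data."""

    @abstractmethod
    def persist(self) -> dict:
        """Return a blueprint (a plain dict here, as an assumption)."""

    def execute(self) -> dict:
        # Generic driver: the same sequence for every flavor of subsystem.
        self.setup()
        self.run()
        self.finalize()
        return self.persist()


class AnalysisStep(Subsystem):
    """Hypothetical postprocessing flavor; it only logs what it would do."""

    def __init__(self, name: str):
        self.name = name
        self.log = []

    def setup(self) -> None:
        self.log.append("setup")

    def run(self) -> None:
        self.log.append("run")

    def finalize(self) -> None:
        self.log.append("finalize")

    def persist(self) -> dict:
        return {"kind": type(self).__name__, "name": self.name}


blueprint = AnalysisStep("surface_plots").execute()
```

Under these assumptions a geophysical model and an analysis sequence would differ in their lower-level methods but share the same lifecycle and the same notion of a blueprint.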

TomNicholas added the design top-level design question label Oct 8, 2024
