Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full circle: Nextflow #93

Merged
merged 16 commits into from
May 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/pre-commit.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11" ]
python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11", "3.12" ]
name: Pre-commit python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
Expand Down
2 changes: 1 addition & 1 deletion CITATION.cff
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,4 @@ message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/inab/WfExS-backend"
type: software
title: "WfExS-backend"
version: 0.99.1
version: 0.99.2
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -825,6 +825,10 @@ WfExS commands are:

* `stage`: This command is used to first validate workflow staging and security context configuration files, then fetch all the workflow preconditions and files, staging them for an execution. It honours `-L`, `-W`, `-Z` parameters and `WFEXS_CONFIG_FILE` environment variable, and once the staging is finished it prints the path to the parent execution environment.

* `re-stage`: This command is used to reuse an already staged workflow in a completely uncoupled working directory. The command allows replacing some of the parameters.

* `import`: This command is used to fetch and import a previously generated Workflow Run RO-Crate, for reproducibility. The command allows replacing some of the original parameters, for replicability.

* `staged-workdir`: This command is complementary to `stage`. It recognizes both `-L` parameter and `WFEXS_CONFIG_FILE` environment variable. This command has several subcommands which help on the workflow execution lifecycle (list available working directories and their statuses, remove some of them, execute either a shell or a custom command in a working directory context, execute, export prospective and retrospective provenance to RO-Crate, ...).

* `export`: This command is complementary to `stage`. It recognizes both `-L` parameter and `WFEXS_CONFIG_FILE` environment variable, and depends on `-J` parameter to locate the execution environment directory to be used, properly staged through `stage`. It also depends on both -E and -Z parameters, to declare the different export patterns and the needed credentials to complete the rules. This command has a couple of subcommands to list previously exported items and to do those exports.
Expand Down
2 changes: 1 addition & 1 deletion wfexs_backend/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
__license__ = "Apache 2.0"

# https://www.python.org/dev/peps/pep-0396/
__version__ = "0.99.1"
__version__ = "0.99.2"
__url__ = "https://github.com/inab/WfExS-backend"
__official_name__ = "WfExS-backend"

Expand Down
11 changes: 10 additions & 1 deletion wfexs_backend/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,8 @@ class ExpectedOutput(NamedTuple):
glob: When the workflow engine does not use symbolic
names to label the outputs, this is the filename pattern to capture the
local path, based on the output / working directory.
syntheticOutput: It is true for outputs which do not really exist
either as parameter or explicit outputs.
"""

name: "SymbolicOutputName"
Expand All @@ -424,9 +426,10 @@ class ExpectedOutput(NamedTuple):
cardinality: "Tuple[int, int]"
fillFrom: "Optional[SymbolicParamName]" = None
glob: "Optional[GlobPattern]" = None
syntheticOutput: "Optional[bool]" = None

def _marshall(self) -> "MutableMapping[str, Any]":
mD = {
mD: "MutableMapping[str, Any]" = {
"c-l-a-s-s": self.kind.name,
"cardinality": list(self.cardinality),
}
Expand All @@ -437,6 +440,8 @@ def _marshall(self) -> "MutableMapping[str, Any]":
mD["glob"] = self.glob
if self.fillFrom is not None:
mD["fillFrom"] = self.fillFrom
if self.syntheticOutput is not None:
mD["syntheticOutput"] = self.syntheticOutput

return mD

Expand All @@ -451,6 +456,7 @@ def _unmarshall(cls, **obj: "Any") -> "ExpectedOutput":
fillFrom=obj.get("fillFrom"),
glob=obj.get("glob"),
cardinality=cast("Tuple[int, int]", tuple(obj["cardinality"])),
syntheticOutput=cast("Optional[bool]", obj.get("syntheticOutput")),
)


Expand Down Expand Up @@ -525,6 +531,9 @@ class MaterializedOutput(NamedTuple):
kind: "ContentKind"
expectedCardinality: "Tuple[int, int]"
values: "Union[Sequence[bool], Sequence[str], Sequence[int], Sequence[float], Sequence[AbstractGeneratedContent]]"
syntheticOutput: "Optional[bool]" = None
filledFrom: "Optional[str]" = None
glob: "Optional[GlobPattern]" = None


class LocalWorkflow(NamedTuple):
Expand Down
107 changes: 76 additions & 31 deletions wfexs_backend/ro_crate.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,10 @@
ContainerType2AdditionalType,
ContainerTypeMetadata,
ContainerTypeMetadataDetails,
WFEXS_TERMS_CONTEXT,
WFEXS_TERMS_NAMESPACE,
WORKFLOW_RUN_CONTEXT,
WORKFLOW_RUN_NAMESPACE,
)

magic = lazy_import("magic")
Expand Down Expand Up @@ -929,21 +932,25 @@ def _init_empty_crate_and_ComputerLanguage(
RO_licences = self._process_licences(licences)

# Add extra terms
# self.crate.metadata.extra_terms.update(
# {
# "sha256": WORKFLOW_RUN_CONTEXT + "#sha256",
# # Next ones are experimental
# ContainerImageAdditionalType.Docker.value: WORKFLOW_RUN_CONTEXT + "#"
# + ContainerImageAdditionalType.Docker.value,
# ContainerImageAdditionalType.Singularity.value: WORKFLOW_RUN_CONTEXT + "#"
# + ContainerImageAdditionalType.Singularity.value,
# "containerImage": WORKFLOW_RUN_CONTEXT + "#containerImage",
# "ContainerImage": WORKFLOW_RUN_CONTEXT + "#ContainerImage",
# "registry": WORKFLOW_RUN_CONTEXT + "#registry",
# "tag": WORKFLOW_RUN_CONTEXT + "#tag",
# }
# )
self.crate.metadata.extra_terms.update(
{
# "sha256": WORKFLOW_RUN_NAMESPACE + "sha256",
# # Next ones are experimental
# ContainerImageAdditionalType.Docker.value: WORKFLOW_RUN_NAMESPACE
# + ContainerImageAdditionalType.Docker.value,
# ContainerImageAdditionalType.Singularity.value: WORKFLOW_RUN_NAMESPACE
# + ContainerImageAdditionalType.Singularity.value,
# "containerImage": WORKFLOW_RUN_NAMESPACE + "containerImage",
# "ContainerImage": WORKFLOW_RUN_NAMESPACE + "ContainerImage",
# "registry": WORKFLOW_RUN_NAMESPACE + "registry",
# "tag": WORKFLOW_RUN_NAMESPACE + "tag",
"syntheticOutput": WFEXS_TERMS_NAMESPACE + "syntheticOutput",
"globPattern": WFEXS_TERMS_NAMESPACE + "globPattern",
"filledFrom": WFEXS_TERMS_NAMESPACE + "filledFrom",
}
)
self.crate.metadata.extra_contexts.append(WORKFLOW_RUN_CONTEXT)
# self.crate.metadata.extra_contexts.append(WFEXS_TERMS_CONTEXT)

self.compLang = rocrate.model.computerlanguage.ComputerLanguage(
self.crate,
Expand Down Expand Up @@ -1337,6 +1344,11 @@ def addWorkflowInputs(

failed_licences: "MutableSequence[URIType]" = []
for in_item in inputs:
# Skip autoFilled inputs, as they should have their
# mirror parameters in outputs
if in_item.autoFilled:
continue

formal_parameter_id = (
f"{self.wf_file.id}#{input_sep}:"
+ urllib.parse.quote(in_item.name, safe="")
Expand Down Expand Up @@ -1756,7 +1768,7 @@ def _add_file_to_crate(
dest_path=the_name if do_attach else None,
clazz=SourceCodeFile if is_soft_source else FixedFile,
)
if do_attach and (the_uri is not None):
if the_uri is not None:
if the_uri.startswith("http") or the_uri.startswith("ftp"):
# See https://github.com/ResearchObject/ro-crate/pull/259
uri_key = "contentUrl"
Expand Down Expand Up @@ -1833,7 +1845,7 @@ def _add_directory_as_dataset(
validate_url=False,
# properties=file_properties,
)
if do_attach and (the_uri is not None):
if the_uri is not None:
if the_uri.startswith("http") or the_uri.startswith("ftp"):
# See https://github.com/ResearchObject/ro-crate/pull/259
uri_key = "contentUrl"
Expand Down Expand Up @@ -1995,6 +2007,8 @@ def _add_workflow_to_crate(
the_name: "Optional[str]" = None
rocrate_wf_folder: "str" = os.path.relpath(the_workflow.dir, self.work_dir)
the_alternate_name: "str"
assert self.staged_setup.workflow_dir is not None
the_alternate_name = os.path.relpath(the_path, self.staged_setup.workflow_dir)
if do_attach:
# if wf_entrypoint_url is not None:
# # This is needed to avoid future collisions with other workflows stored in the RO-Crate
Expand All @@ -2004,14 +2018,8 @@ def _add_workflow_to_crate(
# else:
# rocrate_wf_folder = str(uuid.uuid4())

the_alternate_name = os.path.relpath(the_path, the_workflow.dir)
the_name = rocrate_wf_folder + "/" + the_alternate_name
else:
the_alternate_name = cast(
"RelPath",
os.path.join(
rocrate_wf_folder, os.path.relpath(the_path, the_workflow.dir)
),
the_name = os.path.join(
rocrate_wf_folder, os.path.relpath(the_path, the_workflow.dir)
)

# When the id is none and ...
Expand Down Expand Up @@ -2084,7 +2092,7 @@ def _add_workflow_to_crate(
if added_operational_container not in existing_operational_containers:
existing_operational_containers.append(added_operational_container)

if do_attach and (the_uri is not None):
if the_uri is not None:
if the_uri.startswith("http") or the_uri.startswith("ftp"):
# See https://github.com/ResearchObject/ro-crate/pull/259
uri_key = "contentUrl"
Expand Down Expand Up @@ -2144,15 +2152,17 @@ def _add_workflow_to_crate(
self.crate.add(the_entity)
else:
rocrate_file_id = rocrate_file_id_base + "/" + rel_file
the_name = cast(
the_s_name = cast(
"RelPath", os.path.join(rocrate_wf_folder, rel_file)
)
the_alternate_name = os.path.relpath(
os.path.join(the_workflow.dir, rel_file),
self.staged_setup.workflow_dir,
)
the_entity = self._add_file_to_crate(
the_path=os.path.join(the_workflow.dir, rel_file),
the_name=the_name,
the_alternate_name=cast("RelPath", rel_file)
if do_attach
else the_name,
the_name=the_s_name,
the_alternate_name=cast("RelPath", the_alternate_name),
the_uri=cast("URIType", rocrate_file_id),
do_attach=do_attach,
is_soft_source=True,
Expand Down Expand Up @@ -2195,6 +2205,18 @@ def addWorkflowExpectedOutputs(
identifier=formal_parameter_id,
additional_type=additional_type,
)

# This one must be a real boolean, as of schema.org
if out_item.syntheticOutput is not None:
formal_parameter["valueRequired"] = not out_item.syntheticOutput
formal_parameter["syntheticOutput"] = out_item.syntheticOutput
if out_item.syntheticOutput:
if out_item.glob is not None:
formal_parameter["globPattern"] = out_item.glob
if out_item.fillFrom is not None:
# This is a bit dirty, but effective
formal_parameter["filledFrom"] = out_item.fillFrom

self.crate.add(formal_parameter)

# Add to the list only when it is needed
Expand Down Expand Up @@ -2506,6 +2528,18 @@ def _add_workflow_execution_outputs(
identifier=formal_parameter_id,
additional_type=additional_type,
)

# This one must be a real boolean, as of schema.org
if out_item.syntheticOutput is not None:
formal_parameter["valueRequired"] = not out_item.syntheticOutput
formal_parameter["syntheticOutput"] = out_item.syntheticOutput
if out_item.syntheticOutput:
if out_item.glob is not None:
formal_parameter["globPattern"] = out_item.glob
if out_item.filledFrom is not None:
# This is a bit dirty, but effective
formal_parameter["filledFrom"] = out_item.filledFrom

self.crate.add(formal_parameter)
self.wf_file.append_to("output", formal_parameter, compact=True)

Expand Down Expand Up @@ -2629,6 +2663,11 @@ def _add_GeneratedContent_to_crate(

if the_content.uri is not None and not the_content.uri.uri.startswith("nih:"):
the_content_uri = the_content.uri.uri
crate_file = cast(
"Optional[FixedFile]", self.crate.dereference(the_content_uri)
)
if crate_file is not None:
return crate_file
else:
the_content_uri = None

Expand Down Expand Up @@ -2687,6 +2726,9 @@ def _add_GeneratedDirectoryContent_as_dataset(
the_id = dest_path
else:
the_id = the_uri
crate_dataset = cast(
"Optional[FixedDataset]", self.crate.dereference(the_id)
)
# if the_uri is not None:
# an_uri = the_uri
# dest_path = None
Expand All @@ -2696,6 +2738,9 @@ def _add_GeneratedDirectoryContent_as_dataset(
# # digest, algo = extract_digest(the_content.signature)
# # dest_path = hexDigest(algo, digest)

if crate_dataset is not None:
return crate_dataset, the_files_crates

crate_dataset = self.crate.add_dataset_ext(
identifier=the_id,
source=the_content.local if do_attach else None,
Expand All @@ -2705,7 +2750,7 @@ def _add_GeneratedDirectoryContent_as_dataset(
# properties=file_properties,
)

if do_attach and (the_uri is not None):
if the_uri is not None:
if the_uri.startswith("http") or the_uri.startswith("ftp"):
# See https://github.com/ResearchObject/ro-crate/pull/259
uri_key = "contentUrl"
Expand Down
6 changes: 6 additions & 0 deletions wfexs_backend/schemas/stage-definition.json
Original file line number Diff line number Diff line change
Expand Up @@ -237,6 +237,7 @@
"minLength": 1
},
"security-context": {
"description": "Use an explicitly named security context",
"type": "string",
"minLength": 1
},
Expand All @@ -250,6 +251,7 @@
"default": false
},
"autoPrefix": {
"description": "When autoFill is true and this parameter is false, this directory is mapped to the parent output one for this execution. When both autoFill and this parameter are true, an output file or directory name is assigned, based on its complete param name",
"type": "boolean",
"default": false
}
Expand Down Expand Up @@ -733,6 +735,10 @@
"type": "string",
"minLength": 1
},
"syntheticOutput": {
"description": "Is this output a synthetic one? The default value when it is not defined depends on the type of workflow.",
"type": "boolean"
},
"glob": {
"description": "Glob pattern to get the files and directories to be assigned to this output, useful in workflow models where outputs are not explicitly declared (Nextflow, Snakemake)",
"type": "string",
Expand Down
Loading
Loading