Fix metadata logging even when actors crash #721

Open · wants to merge 1 commit into dev
Conversation

shivdeep-singh-ibm (Collaborator):

Why are these changes needed?

If the Ray actors crash, the key `processing_time` may never be written to `stats`; rounding the missing key then raises an exception that prevents the metadata from being written.
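A minimal sketch of the failure mode (the `stats` contents below are assumed for illustration, not taken from an actual run):

```python
# Hypothetical stats dict as it might look when the actors crashed before
# reporting timing: "processing_time" was never populated.
stats = {"source_files": 10, "result_files": 0}

failed = False
try:
    # Unguarded rounding raises KeyError when the key is absent,
    # which aborts metadata writing downstream.
    stats["processing_time"] = round(stats["processing_time"], 3)
except KeyError:
    failed = True

# Guarded version (the fix in this PR): round only when the key exists.
if "processing_time" in stats:
    stats["processing_time"] = round(stats["processing_time"], 3)
```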

Related issue number (if any).

Also seen in the crash reported in #719.


In case the ray actors crash, the key `processing_time` may not be
updated to stats, so while rounding off this key, we may get an
exception which stops metadata from being written.

Signed-off-by: Shivdeep Singh <[email protected]>
@@ -142,7 +142,8 @@ def orchestrate(
     # Compute execution statistics
     logger.debug("Computing execution stats")
     stats = runtime.compute_execution_stats(ray.get(statistics.get_execution_stats.remote()))
-    stats["processing_time"] = round(stats["processing_time"], 3)
+    if "processing_time" in stats:
+        stats["processing_time"] = round(stats["processing_time"], 3)
Member:
should we set this to 0 if not found in case downstream code depends on it?

Collaborator (author):

I think a value of 0 for `processing_time` could be misleading.

Member:

So a missing key downstream means what? I suspect this really means we're not properly handling the crashing of the actor and should indicate that in some other way, perhaps with a graceful shutdown.

Collaborator:

The only way `stats["processing_time"] = round(stats["processing_time"], 3)` can throw an exception is when either:

  1. the init method in the file processor fails due to an inability to create the transform, or
  2. all of the transform invocations failed and `processing_time` was never reported by the file processor.

In both cases there are really no statistics of interest to collect.
But if you really want to avoid this exception the right thing to do (in my opinion) is to do what Shivdeep is doing:

    if "processing_time" in stats:
        stats["processing_time"] = round(stats["processing_time"], 3)

This returns metadata with no processing-time value, which is correct in this case.

Note that if you want to go with this fix, please apply it in all 3 orchestrators (Python, Ray, and Spark), not just Ray.
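One way to keep the three orchestrators consistent would be a small shared helper; `round_stat` below is a hypothetical name, not part of the codebase:

```python
# Hypothetical helper: round a stats entry in place only when it is
# present, so a crashed-actor run simply omits the key.
def round_stat(stats: dict, key: str, ndigits: int = 3) -> None:
    if key in stats:
        stats[key] = round(stats[key], ndigits)

# Normal run: the key exists and gets rounded.
ok = {"processing_time": 1.23456}
round_stat(ok, "processing_time")

# Crashed-actor run: the key is absent and the call is a no-op.
crashed = {}
round_stat(crashed, "processing_time")
```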
