Enable per job class `max_job_runtime` #240

sambostock · 2022-06-28T02:18:47Z

This allows incremental adoption of the setting, without applying the setting globally.

JobIteration.max_job_runtime = nil

class ParentJob < ApplicationJob
  include JobIteration::Iteration
  self.max_job_runtime = 5.minutes
  # ...
end

Alternatively, it allows applications to set a conservative global setting, and a more aggressive setting per job.

class ChildJob < ParentJob
  self.max_job_runtime # 5 minutes, inherited
end
class CarefulChildJob < ParentJob
  self.max_job_runtime = 3.minutes # override
end

In order to prevent rogue jobs from causing trouble, the per-job override can only be set to a value less than the inherited value.

class RogueJob < ParentJob
  self.max_job_runtime = 10.minutes # 💥 ArgumentError
end
class UncleJob < ApplicationJob
  self.max_job_runtime = 10.minutes # ✅ No global default or inherited value, so okay
end
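
The inheritance and decrease-only rules described above can be sketched in plain Ruby. This is a minimal illustration, not the PR's implementation: the `RuntimeLimited` module and its `inherited_max_job_runtime` helper are hypothetical, the global `JobIteration.max_job_runtime` default is left out, and integer seconds replace `5.minutes` so the sketch runs without Active Support.

```ruby
# Hypothetical sketch of a decrease-only, inheritable class setting.
# Not the library's implementation; names here are illustrative only.
module RuntimeLimited
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Return this class's own limit, or fall back to the nearest ancestor's.
    def max_job_runtime
      return @max_job_runtime if instance_variable_defined?(:@max_job_runtime)

      inherited_max_job_runtime
    end

    # Reject any value that would loosen the inherited limit.
    def max_job_runtime=(seconds)
      inherited = inherited_max_job_runtime
      if inherited && (seconds.nil? || seconds > inherited)
        raise ArgumentError, "max_job_runtime may only be decreased (inherited #{inherited}s, got #{seconds.inspect})"
      end

      @max_job_runtime = seconds
    end

    private

    # The limit set by the closest ancestor, or nil when there is none.
    def inherited_max_job_runtime
      superclass.respond_to?(:max_job_runtime) ? superclass.max_job_runtime : nil
    end
  end
end

class ParentJob
  include RuntimeLimited
  self.max_job_runtime = 5 * 60
end

class ChildJob < ParentJob; end # inherits 300 seconds

class CarefulChildJob < ParentJob
  self.max_job_runtime = 3 * 60 # allowed: lower than the inherited 300
end
```

In this sketch, a subclass of `ParentJob` that assigns `10 * 60` raises `ArgumentError`, while a class with no limited ancestor can pick any value.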

⚠️ Before Merging

Clean up commits

sambostock · 2022-06-29T14:56:19Z

CI should be fixed by #241

sambostock · 2022-06-29T19:35:02Z

CI appears to only fail on Rails 5.2. I'm not sure why, and given #241 drops support, I probably won't bother investigating.

Mangara

I really like the idea! Should we namespace the setting on the job, to avoid conflicts? Something like self.job_iteration_max_runtime?

sambostock · 2022-06-29T20:53:56Z

> Should we namespace the setting on the job, to avoid conflicts?

This had crossed my mind, and I'm open to it. On the one hand, it makes sense to avoid conflicts. On the other hand, we don't namespace anything else (e.g. on_start, on_shutdown, & on_complete callback DSL methods). I guess it would be possible for Active Job, or some adapter to introduce a similar idea of a time limit... 🤔

Mangara · 2022-06-29T20:59:02Z

Hedwig already has its own max_job_runtime with subtly different semantics 🙈 It's only a global config for now, so no collision yet, but just like this one, it may make sense to have a job-level override.

adrianna-chang-shopify

I really like this as a feature! ❤️ A couple small comments, and I'll leave it to the Job Platform experts to make the call on naming, but this looks good from my perspective!

sambostock · 2022-07-04T15:15:13Z

guides/best-practices.md

+class MyJob < ApplicationJob
+  include JobIteration::Iteration
+
+  self.job_iteration_max_job_runtime = 3.minutes


I have renamed the class attribute 👍

adrianna-chang-shopify

❤️

pedropb

Implementation LGTM.

If you can share a bit more context on why this feature is being added now, it could help answer some of the questions I left on the PR.

pedropb · 2022-07-05T15:19:15Z

guides/best-practices.md

+  # ...
+```
+
+This setting will be inherited by any child classes, although it can be further overridden. Note that no class can **increase** the `max_job_runtime` it has inherited; it can only be **decreased**.


I'm curious about the reasoning behind this design. One counter-example I can think of:
"All jobs should run for a maximum of 30s per run, except this one job, which could run for up to a minute." This is not supported by design. Why?

The implementation would be simpler if the "only decrease" constraint weren't enforced at the lib level.

sambostock · 2022-07-05

Good question. I considered this scenario, and you're right that enforcing this constraint adds complexity (needing to prepend the module that verifies it).

The idea was that there should be friction when it comes to increasing the max_job_runtime, because it ideally wouldn't be done. An owner of a large application may tune the max_job_runtime, and another developer may think that it is safe for their job to run without limits, and end up impacting the stability of the job infrastructure if they're able to bypass the limit easily.

By requiring the developer to go through a change like this, it would be clearer that they're doing something exceptional:

-JobIteration.max_job_runtime = 5.minutes
 class ApplicationIterativeJob < ActiveJob::Base
   include JobIteration::Iteration
+  self.max_job_runtime = 5.minutes
 end
+
+class ExceptionallyLongRunningIterativeJob < ActiveJob::Base
+  include JobIteration::Iteration
+  self.max_job_runtime = 10.minutes
+end

-class MyLongRunningJob < ApplicationIterativeJob
+class MyLongRunningJob < ExceptionallyLongRunningIterativeJob
 end

To me, it feels like JobIteration.max_job_runtime should be the maximum globally, and feels weird that a single job could violate that invariant. Allowing decreases only means that the invariant continues to hold.

That said, if we have legitimate use cases, then we could certainly change this behaviour:

We could split the API into one method for decreasing the maximum, and another for increasing it, whose name makes it clear that it's exceptional. For example:

JobIteration.max_job_runtime = 5.minutes

class LowPriorityIterativeJob < ApplicationIterativeJob
  decrease_job_iteration_max_job_runtime(to: 3.minutes) # forbids increasing
end

class LongRunningIterativeJob < ApplicationIterativeJob
  increase_job_iteration_max_job_runtime!(to: 10.minutes) # forbids decreasing
end

We could remove the constraint altogether, and go with the out-of-the-box behaviour of class_attribute (overridable inheritance).
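
The first option (separate methods for decreasing and increasing) could look roughly like the sketch below. Everything here is hypothetical: neither method exists in job-iteration, the `SplitRuntimeApi` module and its `GLOBAL_MAX_JOB_RUNTIME` constant are stand-ins, and integer seconds are used so it runs without Active Support.

```ruby
# Hypothetical sketch of the proposed split API; neither method exists in job-iteration.
module SplitRuntimeApi
  GLOBAL_MAX_JOB_RUNTIME = 5 * 60 # stands in for JobIteration.max_job_runtime

  # Own value if set, else the nearest ancestor's, else the global stand-in.
  def job_iteration_max_job_runtime
    return @job_iteration_max_job_runtime if instance_variable_defined?(:@job_iteration_max_job_runtime)
    return superclass.job_iteration_max_job_runtime if superclass.respond_to?(:job_iteration_max_job_runtime)

    GLOBAL_MAX_JOB_RUNTIME
  end

  # Everyday path: may only tighten the current limit.
  def decrease_job_iteration_max_job_runtime(to:)
    current = job_iteration_max_job_runtime
    raise ArgumentError, "#{to}s is not a decrease from #{current}s" if to > current

    @job_iteration_max_job_runtime = to
  end

  # Exceptional path: the bang and the name make the loosening visible in review.
  def increase_job_iteration_max_job_runtime!(to:)
    current = job_iteration_max_job_runtime
    raise ArgumentError, "#{to}s is not an increase from #{current}s" if to < current

    @job_iteration_max_job_runtime = to
  end
end

class ApplicationIterativeJob
  extend SplitRuntimeApi
end

class LowPriorityIterativeJob < ApplicationIterativeJob
  decrease_job_iteration_max_job_runtime(to: 3 * 60)
end

class LongRunningIterativeJob < ApplicationIterativeJob
  increase_job_iteration_max_job_runtime!(to: 10 * 60)
end
```

The design trade-off is that an increase stays possible but has to be spelled out, so it is easy to spot in code review or with a text search.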

pedropb · 2022-07-05T15:27:45Z

lib/job-iteration/iteration.rb

@@ -262,7 +289,7 @@ def output_interrupt_summary
    end

    def job_should_exit?
-      if ::JobIteration.max_job_runtime && start_time && (Time.now.utc - start_time) > ::JobIteration.max_job_runtime
+      if job_iteration_max_job_runtime && start_time && (Time.now.utc - start_time) > job_iteration_max_job_runtime


As a user of the library, it would be helpful to have an easy way to get notified (as in ActiveSupport::Notifications.instrument) when a job is interrupted because it exceeded the max runtime setting. For example, a user could track how often this happens per some application-level metric and tune the timeout setting accordingly.

sambostock · 2022-07-05

We do instrument that a job has been interrupted, although we do not note the reason why.

I think that could be something worth considering, but I would consider it outside the scope of this PR. I've opened #247 to capture the discussion.
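
To make the request concrete, here is a small sketch of the elapsed-time check from the diff, extended to report why a job should stop. The `interruption_reason` helper, its `shutdown_requested` parameter, and the reason symbols are all hypothetical; the library's own check returns only a boolean, and nothing here reflects how #247 was or will be resolved.

```ruby
# Hypothetical helper: reports why a job should stop, or nil to keep going.
# Not part of job-iteration; the reason symbols are made up for illustration.
def interruption_reason(start_time:, max_runtime:, shutdown_requested: false, now: Time.now.utc)
  if max_runtime && start_time && (now - start_time) > max_runtime
    :max_job_runtime # same elapsed-time comparison as in the diff above
  elsif shutdown_requested
    :shutdown
  end
end

started = Time.utc(2022, 7, 5, 12, 0, 0)

interruption_reason(start_time: started, max_runtime: 300, now: started + 301) # => :max_job_runtime
interruption_reason(start_time: started, max_runtime: 300, now: started + 10)  # => nil
interruption_reason(start_time: started, max_runtime: nil, shutdown_requested: true, now: started + 10) # => :shutdown
```

A reason like this could then be attached to whatever payload the application already records when a job is interrupted.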

sambostock · 2022-07-05T17:07:35Z

> why this feature is being added now

I've wanted to push for adopting max_job_runtime in a number of apps owned by my team for some time now, but we already have a number of existing jobs. Being able to adopt the setting gradually would make that less risky, so I figured I'd implement it.

For example, some of our long jobs run on schedules that mean that they rarely end up interrupted by deploys (middle of the night). We had one such job that was exceptionally interrupted once, only to turn out to improperly deserialize its cursor and blow up every time it tried to resume (the motivation for #73 & #81).

simbasdad · 2023-11-09T21:44:06Z

lib/job-iteration/iteration.rb

@@ -262,7 +289,7 @@ def output_interrupt_summary
    end

    def job_should_exit?
-      if ::JobIteration.max_job_runtime && start_time && (Time.now.utc - start_time) > ::JobIteration.max_job_runtime
+      if job_iteration_max_job_runtime && start_time && (Time.now.utc - start_time) > job_iteration_max_job_runtime


Sorry to comment on such an old PR, please let me know if you'd rather I convert this to an issue, or move the conversation somewhere else.

This change actually creates a subtle behaviour change. Previously, we read the global configuration at job runtime. Now, we read the global configuration when JobIteration::Iteration is loaded, and subsequent changes are ignored.

Today, we noticed that maintenance tasks were never pausing because that gem was setting a default job runtime in an after_initialize block that ran after this module was included.

I'm not actually sure which option is better. Just wanted to point out the difference in behaviours. In the meantime, I've proposed Shopify/maintenance_tasks#918 to fix maintenance tasks under the assumption that the current behaviour of job-iteration is correct.

Mangara · 2023-11-11

Oof, nice catch! I'd consider that a bug: changes to the global value should be picked up by the jobs as they run. I'm not entirely sure what the best fix is, especially in a way that enforces the "individual jobs must not increase this value" rule.

sambostock · 2023-11-11

I can try to have a look on Monday, but I think the fix would be to override the method defined by class_attribute to delegate to the top level default, as opposed to setting it to the default value at the time the method was defined.

sambostock · 2023-11-12

#436 should fix this.
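
The difference between the two behaviours in this thread can be reproduced with a small plain-Ruby sketch. It is only an illustration of the load-time versus read-time distinction, not the code from this PR or from #436; `GlobalConfig`, `EagerDefault`, and `LazyDefault` are hypothetical names.

```ruby
# Stand-in for the library's global setting (hypothetical module name).
module GlobalConfig
  class << self
    attr_accessor :max_job_runtime
  end
end

# Eager: copies the global value once, at include time.
module EagerDefault
  def self.included(base)
    snapshot = GlobalConfig.max_job_runtime
    base.define_singleton_method(:max_job_runtime) { snapshot }
  end
end

# Lazy: asks the global setting every time the reader is called.
module LazyDefault
  def self.included(base)
    base.define_singleton_method(:max_job_runtime) { GlobalConfig.max_job_runtime }
  end
end

class EagerJob
  include EagerDefault
end

class LazyJob
  include LazyDefault
end

# The application configures the global value only after the classes have loaded,
# for example from an initializer that runs late.
GlobalConfig.max_job_runtime = 300

EagerJob.max_job_runtime # => nil (stale snapshot taken before the global was set)
LazyJob.max_job_runtime  # => 300
```

The eager variant matches the reported symptom: a job class that captured `nil` never sees the limit that was configured afterwards.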

sambostock requested review from adrianna-chang-shopify, MatthewRBruce and pedropb and removed request for adrianna-chang-shopify, MatthewRBruce and pedropb June 29, 2022 14:56

sambostock marked this pull request as ready for review June 29, 2022 15:02

sambostock requested a review from Mangara June 29, 2022 15:02

sambostock force-pushed the configurable-max-job-runtime branch 2 times, most recently from 7098006 to d5a9a3e Compare June 29, 2022 19:53

Mangara reviewed Jun 29, 2022

adrianna-chang-shopify approved these changes Jul 4, 2022

sambostock force-pushed the configurable-max-job-runtime branch from 7ad0c57 to 5e59a1c Compare July 4, 2022 15:14

sambostock requested review from adrianna-chang-shopify and Mangara July 4, 2022 15:14

sambostock commented Jul 4, 2022

sambostock force-pushed the configurable-max-job-runtime branch from 5e59a1c to e503b59 Compare July 4, 2022 15:18

sambostock mentioned this pull request Jul 4, 2022

Serialize Cursors #81

Open

2 tasks

adrianna-chang-shopify approved these changes Jul 4, 2022

sambostock force-pushed the configurable-max-job-runtime branch from e503b59 to 642cd89 Compare July 4, 2022 21:11

pedropb approved these changes Jul 5, 2022

sambostock mentioned this pull request Jul 5, 2022

Should we instrument the reason for interruption? #247

Open

sambostock force-pushed the configurable-max-job-runtime branch from 642cd89 to a84977f Compare July 5, 2022 17:22

Mangara approved these changes Jul 5, 2022

sambostock force-pushed the configurable-max-job-runtime branch from 03c5306 to 8236f63 Compare July 5, 2022 18:49

Test existing global max_job_runtime behavior

318fa3a

sambostock force-pushed the configurable-max-job-runtime branch from 8236f63 to a1ec6f7 Compare July 19, 2022 14:40

sambostock force-pushed the configurable-max-job-runtime branch from a1ec6f7 to 64655bd Compare July 19, 2022 14:43

sambostock merged commit dd93717 into main Jul 22, 2022

sambostock deleted the configurable-max-job-runtime branch July 22, 2022 13:07

shopify-shipit bot temporarily deployed to rubygems July 22, 2022 13:09 Inactive

lavoiesl mentioned this pull request Aug 15, 2023

New release #416

Closed

simbasdad mentioned this pull request Nov 9, 2023

Ensure that maintenance tasks have a default max runtime Shopify/maintenance_tasks#918

Closed

simbasdad reviewed Nov 9, 2023

sambostock mentioned this pull request Nov 12, 2023

Lookup global max_job_runtime at runtime #436

Closed
