Make rrule fast forwarding stable #15601

fosterseth · 2024-10-24T19:04:58Z

SUMMARY

By stable, we mean future occurrences of the rrule should be the same before and after the fast forward operation.

The problem before was that we were fast forwarding to exactly 7 days ago. For some rrules, this does not retain the old occurrences (for example, FREQ=HOURLY;INTERVAL=23). As a result, jobs would launch at unexpected times whenever the schedule was fast forwarded.

This change makes sure we fast forward in increments of the rrule INTERVAL (converted to seconds), so the new dtstart should be in the occurrence list of the old rrule.
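A minimal, stdlib-only sketch of the stability property described above. It assumes a rule with no BYxxx parts and naive datetimes, so the occurrences are simply `dtstart + k * interval`; `occurrences`, `future_occurrences` and the sample dates are illustrative and not AWX code. It compares the next few occurrences after "now" for the original dtstart, for a dtstart moved to exactly 7 days ago, and for a dtstart moved forward by a whole number of intervals.

```python
import datetime
import itertools


def occurrences(dtstart, interval):
    """Yield dtstart, dtstart + interval, dtstart + 2 * interval, ...

    This mirrors FREQ=HOURLY/MINUTELY with no BYxxx parts and naive datetimes.
    """
    for k in itertools.count():
        yield dtstart + k * interval


def future_occurrences(dtstart, interval, after, n=5):
    """Return the first n occurrences strictly after `after`."""
    upcoming = (o for o in occurrences(dtstart, interval) if o > after)
    return list(itertools.islice(upcoming, n))


old_dtstart = datetime.datetime(2019, 5, 17)
interval = datetime.timedelta(hours=23)  # FREQ=HOURLY;INTERVAL=23
now = datetime.datetime(2024, 10, 24, 19, 0)
seven_days_ago = now - datetime.timedelta(days=7)

# Old approach: jump straight to 7 days ago (generally off the 23-hour grid).
naive_dtstart = seven_days_ago
# New approach: advance by a whole number of intervals (timedelta // timedelta is an int).
aligned_dtstart = old_dtstart + ((seven_days_ago - old_dtstart) // interval) * interval

expected = future_occurrences(old_dtstart, interval, now)
```

With these sample dates, `future_occurrences(aligned_dtstart, interval, now)` equals `expected`, while `future_occurrences(naive_dtstart, interval, now)` does not, because the 7-day target is not a whole number of 23-hour steps away from the original dtstart.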

DETAIL

Fast forward won't work for really large intervals. For example, if you specify FREQ=HOURLY;INTERVAL=1200 (50 days' worth of time), we can't fast forward because a single interval doesn't fit in the period of time we are trying to fast forward (the window is 30 days). In this case, we fall back to the old style of just updating dtstart to 7 days ago. We log a warning, so hopefully the user will update their rrule to use FREQ=DAILY instead of FREQ=HOURLY.
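A rough sketch of that control flow, assuming the fast forward amount has already been computed and is passed in as a `timedelta` (how AWX derives it is not shown here). `_interval_aligned_start` and `fast_forward_start` are hypothetical names; the sketch raises `ValueError` as suggested later in the review (the PR raised `RuntimeError` at this point), and the warning includes the fast forward amount, as also requested in the discussion.

```python
import datetime
import logging

logger = logging.getLogger(__name__)


def _interval_aligned_start(dtstart, interval, fast_forward_amount):
    """Advance dtstart by the largest whole number of intervals that fits in fast_forward_amount."""
    if interval > fast_forward_amount:
        raise ValueError(
            f"Cannot fast forward rrule, interval ({interval}) is greater than "
            f"the fast forward amount ({fast_forward_amount})"
        )
    return dtstart + (fast_forward_amount // interval) * interval


def fast_forward_start(dtstart, interval, fast_forward_amount, now):
    """Prefer an interval-aligned dtstart; otherwise fall back to the old 'now - 7 days'."""
    try:
        return _interval_aligned_start(dtstart, interval, fast_forward_amount)
    except ValueError as val_err:
        # Old behaviour: occurrences after this point may not match the original rule.
        logger.warning(val_err)
        return now - datetime.timedelta(days=7)
```

With a 30-day amount, a 7-hour interval advances dtstart by 102 intervals (714 hours), while a 1200-hour interval takes the fallback path and logs the warning.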

example log:

2024-10-24 20:01:55,628 WARNING  [-] awx.main.models.schedule Cannot fast forward rrule DTSTART:20190517T000000
RRULE:FREQ=HOURLY;INTERVAL=700000, interval is greater than the fast forward amount

PERFORMANCE

This change doesn't seem to hurt performance much. Since our fast forward window is larger (30 days vs. 7), more occurrences are generated, so we do expect it to cost more time. I attempted to measure this empirically.

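```python
import cProfile

from awx.main.models.schedules import Schedule

# Create a bunch of old schedules (unified_job_template_id=7 must refer to an existing template)
for i in range(2000):
    Schedule.objects.create(name=f's{i}', rrule='DTSTART;TZID=America/New_York:20190517T000000 RRULE:FREQ=HOURLY;INTERVAL=7', unified_job_template_id=7)


# runit() calls update_computed_fields() for each schedule
def runit():
    for sch in Schedule.objects.all():
        sch.update_computed_fields()


# Benchmark it. cProfile.run evaluates the string in __main__ (e.g. an interactive
# shell session) and writes binary pstats data to bench.txt.
cProfile.run("runit()", "bench.txt")
```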
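`cProfile.run` writes its results to `bench.txt` in pstats' binary format rather than as readable text. A small sketch for turning such a file into a comparable before/after table; `summarize_profile` is an illustrative helper name, and sorting by cumulative time is a choice made here, not something the PR specifies.

```python
import io
import pstats


def summarize_profile(path, limit=15):
    """Return the `limit` most expensive entries of a cProfile dump, sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buf.getvalue()
```

Running `print(summarize_profile("bench.txt"))` once on each branch gives two tables that can be compared side by side.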

before: (profiler output screenshot not available)

after: (profiler output screenshot not available)

So yeah, a slight slowdown, but I think it is worth it for the more accurate generation of occurrences.

ISSUE TYPE

Bug, Docs Fix or other nominal change

COMPONENT NAME

API

awx/main/models/schedules.py

PabloHiro · 2024-10-28T09:13:15Z

> Fast forward won't work for really large intervals. [...] In this case, we'll revert back to the old style of just updating dtstart to 7 days ago.

Can this be an HTTP 400 when a Schedule is created with an "invalid" rrule? Basically, prevent rrules of this type from being created in the first place.

Also, I would include the value of the fast forward amount in the log message, so the user does not need to find it by trial and error for existing "invalid" rrules.

pb82 · 2024-10-28T09:59:18Z

Ok, I think I understood what's going on:

To avoid computing a large number of events for an rrule with a dtstart way back in the past, we only compute from one week ago.
To avoid messing up the interval, we can't just subtract 7 days; we have to go back an amount of time that is divisible by the interval (that's what (fast_forward_seconds // interval) * interval does).

Did I get that right?
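A worked version of that arithmetic. Writing $d_0$ for the original DTSTART, $I$ for the interval and $F$ for the fast forward amount (all in seconds), the occurrences of an HOURLY/MINUTELY rule without BYxxx parts are $d_0 + kI$ for integers $k \ge 0$, so the new start has to be $d_0$ plus a whole number of intervals:

$$
d_0' = d_0 + \left\lfloor \frac{F}{I} \right\rfloor I, \qquad 0 \le (d_0 + F) - d_0' < I
$$

For instance, with $I = 23\,\text{h}$ and an illustrative $F = 168\,\text{h}$ (one week):

$$
\left\lfloor \frac{168}{23} \right\rfloor \cdot 23 = 7 \cdot 23 = 161\,\text{h}, \qquad 168 - 161 = 7\,\text{h}
$$

Shifting by exactly one week would land 7 hours off the rule's grid, while shifting by 161 hours lands on an occurrence. In other words, it is the shift measured from the original dtstart that must be a multiple of the interval.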

pb82 · 2024-10-28T10:01:32Z

awx/main/models/schedules.py

+                                logger.warning(e)
+                                # fallback to setting dtstart to 7 days ago, but this has the consequence of
+                                # occurrences not matching the old occurrences.
+                                new_start = now() - datetime.timedelta(days=7)


As @PabloHiro mentioned, this would lead to the exact problem that is being fixed. Can we just reject the rrule here?

I wonder how we should handle the cases where a user has these "invalid" rules already. We can prevent further invalid rules, but this code has a solution to handle the existing ones.

Signed-off-by: Seth Foster

sonarcloud · 2024-10-29T20:28:34Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

pb82 · 2024-11-12T13:06:19Z

@fosterseth can this be merged? The failures seem to be about code coverage.

webknjaz

got a few observations below

webknjaz · 2024-11-12T16:04:42Z

awx/main/models/schedules.py

+    Returns a datetime object
+    '''
+    if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY):
+        raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY")


This seems to be semantically closer to a ValueError than to a RuntimeError as it is function input validation, not a runtime check. Perhaps, it's worth changing?

Suggested change

-        raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY")
+        raise ValueError(f"Cannot fast forward rrule, frequency must be HOURLY or MINUTELY, but got {rrule._freq !r}")

webknjaz · 2024-11-12T16:09:28Z

awx/main/models/schedules.py

+    Fast forwards the rrule to 7 days ago
+    Returns a datetime object
+    '''
+    if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY):


not x in y is a common mistake. Linters would usually require it to be an x not in y instead, as it's clearer what the semantics of the check is. I wouldn't even be sure whether it's (not x) in y or not (x in y) off the top of my head. Let's improve this a bit:

Suggested change

-    if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY):
+    if rrule._freq not in {dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY}:
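For reference, `not` binds more loosely than `in`, so `not x in y` already means `not (x in y)`; the spelling is confusing rather than wrong, and pycodestyle reports it as E713. A tiny stdlib check, redefining the frequency constants with the integer values `dateutil.rrule` uses (YEARLY=0, HOURLY=4, MINUTELY=5) so it runs without dateutil:

```python
# Integer values matching dateutil.rrule's constants (redefined to stay stdlib-only).
YEARLY, HOURLY, MINUTELY = 0, 4, 5
allowed = {HOURLY, MINUTELY}
freq = YEARLY

as_written = not freq in allowed  # parsed as: not (freq in allowed)
misread = (not freq) in allowed   # (not 0) is True, and True == 1 is not in {4, 5}
idiomatic = freq not in allowed   # the spelling linters ask for
```

Here `as_written` and `idiomatic` are both `True`, while the misreading would give `False`.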

webknjaz · 2024-11-12T16:10:59Z

awx/main/models/schedules.py

+        interval *= 60
+
+    if type(interval) == float and not interval.is_integer():
+        raise RuntimeError("Cannot fast forward rule, interval is a fraction of a second")


Same concern here: should this be a ValueError?

webknjaz · 2024-11-12T16:23:33Z

awx/main/models/schedules.py

+    elif rrule._freq == dateutil.rrule.MINUTELY:
+        interval *= 60
+
+    if type(interval) == float and not interval.is_integer():


Is it intentional that isinstance(interval, float) is not used here? Do we want to disallow subclasses?
Also, wouldn't it be more correct to do an identity check, as in type(interval) is float?

Looking at what's checked, I think that it should be possible to use rrule._interval instead and move it to the beginning of the function, having a chance to bail before the computations. This is probably not a huge performance benefit, but it would improve the structure by grouping all the “guard expression”-style checks together.

Additionally, once these checks are close together, they could be moved into a helper called something like _assert_rrule_is_fast_forwardable(), which might improve the structure even further.

webknjaz · 2024-11-12T16:31:13Z

awx/main/models/schedules.py

+    # it is important to fast forward by a number that is divisible by
+    # interval. For example, if interval is 7 hours, we fast forward by 7, 14, 21, etc. hours.
+    # Otherwise, the occurrences after the fast forward might not match the ones before.
+    # x // y is integer division, lopping off any remainder, so that we get the outcome we want.
+    new_start = rrule._dtstart + datetime.timedelta(seconds=(fast_forward_seconds // interval) * interval)
+    return new_start


Seeing this code comment makes me think that this line is hard to follow and might not belong on this abstraction layer. I believe such things can be labeled with descriptive variable names (or function names, for that matter), which improves readability.
Wouldn't it be easier to follow if this were to read

return _compute_interval_aligned_datetime(rrule._dtstart, fast_forward_seconds, interval)

or

return rrule._dtstart + _compute_interval_aligned_time_shift(fast_forward_seconds, interval)

?

webknjaz · 2024-11-12T16:35:43Z

awx/main/models/schedules.py

-                            new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start), rrule)
+                            try:
+                                new_start = fast_forward_date(rule)
+                            except RuntimeError as e:


Since there's nothing wrong with the runtime, this would also need to align with the above suggestions to represent the case with a different exception:

Suggested change

-                            except RuntimeError as e:
+                            except ValueError as val_err:

webknjaz · 2024-11-12T16:43:17Z

awx/main/models/schedules.py

+                                new_start = now() - datetime.timedelta(days=7)
+                            new_start_fmt = new_start.strftime('%Y%m%d')
+                            # Now we want to replace the DTSTART:<value>T with the new date (which includes the T)
+                            new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule)


try this

Suggested change

-                            new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule)
+                            new_rrule = re.sub('(?P<dstart>DTSTART[^:]*):[^T]+T', fr'\g<dstart>:{new_start_fmt !s}T', rrule)
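A quick, self-contained check of the suggested named-group substitution on the benchmark's rrule string; `replace_dtstart_date` is a hypothetical wrapper used only for this demonstration. The pattern swaps only the date digits between `:` and `T`; the time of day after the `T` is carried over from the original rule unchanged.

```python
import datetime
import re


def replace_dtstart_date(rrule, new_start):
    """Replace the date in DTSTART[;params]:YYYYMMDDT... with new_start's date."""
    new_start_fmt = new_start.strftime('%Y%m%d')
    # \g<dstart> re-inserts "DTSTART" plus any ";TZID=..." parameters captured by the named group.
    return re.sub('(?P<dstart>DTSTART[^:]*):[^T]+T', fr'\g<dstart>:{new_start_fmt!s}T', rrule)
```

For example, `replace_dtstart_date('DTSTART;TZID=America/New_York:20190517T000000 RRULE:FREQ=HOURLY;INTERVAL=7', datetime.datetime(2024, 10, 17, 14, 0))` keeps `T000000` and only changes the date to `20241017`.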

webknjaz · 2024-11-12T16:45:25Z

awx/main/tests/unit/utils/test_schedule_fast_forward.py

+    found_matching_date = False
+    for occurrence in gen:
+        if occurrence == new_datetime:
+            found_matching_date = True
+            break
+
+    assert found_matching_date


Any reason it's not

Suggested change

-    found_matching_date = False
-    for occurrence in gen:
-        if occurrence == new_datetime:
-            found_matching_date = True
-            break
-
-    assert found_matching_date
+    assert new_datetime in gen

?
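Both spellings behave the same way: `x in gen` on a plain generator consumes it item by item and stops at the first match, just like the loop with `break`. For an ascending but unbounded generator, though, both would keep consuming forever if the date were never produced. A sketch of a bounded variant that uses `itertools.takewhile` to stop once the occurrences pass the target; `occurs_in` is an illustrative name, and it assumes the generator yields datetimes in ascending order.

```python
import datetime
import itertools


def occurs_in(occurrences, target):
    """Return True if `target` is among the ascending `occurrences`.

    takewhile stops pulling items once they pass `target`, so this terminates
    even for an unbounded generator that never yields `target`.
    """
    return target in itertools.takewhile(lambda occurrence: occurrence <= target, occurrences)


start = datetime.datetime(2024, 1, 1)
step = datetime.timedelta(hours=23)
every_23_hours = (start + k * step for k in itertools.count())
found = occurs_in(every_23_hours, start + datetime.timedelta(hours=47))  # 47 h is off the 23 h grid
```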

webknjaz · 2024-11-12T16:49:17Z

awx/main/tests/unit/utils/test_schedule_fast_forward.py

+    'freq, interval',
+    [
+        (MINUTELY, 15),
+        (MINUTELY, 120),
+        (MINUTELY, 60 * 24 * 3),
+        (HOURLY, 7),
+        (HOURLY, 24 * 3),
+    ],


(cosmetic changes)

Suggested change

-    'freq, interval',
-    [
-        (MINUTELY, 15),
-        (MINUTELY, 120),
-        (MINUTELY, 60 * 24 * 3),
-        (HOURLY, 7),
-        (HOURLY, 24 * 3),
-    ],
+    ('freq', 'interval'),
+    (
+        pytest.param(MINUTELY, 15, id='every-15-minutes-minutely'),
+        pytest.param(MINUTELY, 120, id='every-2-hours-minutely'),
+        pytest.param(MINUTELY, 60 * 24 * 3, id='every-3-days-minutely'),
+        pytest.param(HOURLY, 7, id='every-7-hours-hourly'),
+        pytest.param(HOURLY, 24 * 3, id='every-3-days-hourly'),
+    ),

webknjaz · 2024-11-12T16:59:08Z

awx/main/tests/unit/utils/test_schedule_fast_forward.py

+@pytest.mark.parametrize(
+    'freq, interval, error',
+    [
+        (MINUTELY, 15.5555, "interval is a fraction of a second"),
+        (MONTHLY, 1, "frequency must be HOURLY or MINUTELY"),
+        (HOURLY, 24 * 30, "interval is greater than the fast forward amount"),
+    ],
+)
+def test_error_fast_forward_date(freq, interval, error):
+    dtstart = now() - datetime.timedelta(days=30)
+    rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
+    if error:
+        with pytest.raises(Exception) as e_info:
+            fast_forward_date(rule)
+
+        assert error in e_info.value.args[0]


It's best to avoid convoluted logic in tests. The error parameter is always truthy, so the conditional is not really needed. Additionally, the right way of checking the raised exceptions is to narrow the expectation. Uses of pytest.raises() (and of pytest.warns() or pytest.deprecated_call(), for that matter) should always have a match argument to be as accurate as possible. If there are different expected exceptions, it's possible to bake them into parametrize. But let's start with the following:

Suggested change

-@pytest.mark.parametrize(
-    'freq, interval, error',
-    [
-        (MINUTELY, 15.5555, "interval is a fraction of a second"),
-        (MONTHLY, 1, "frequency must be HOURLY or MINUTELY"),
-        (HOURLY, 24 * 30, "interval is greater than the fast forward amount"),
-    ],
-)
-def test_error_fast_forward_date(freq, interval, error):
-    dtstart = now() - datetime.timedelta(days=30)
-    rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
-    if error:
-        with pytest.raises(Exception) as e_info:
-            fast_forward_date(rule)
-
-        assert error in e_info.value.args[0]
+@pytest.mark.parametrize(
+    ('freq', 'interval', 'error'),
+    (
+        pytest.param(MINUTELY, 15.5555, r"^interval is a fraction of a second$", id='fraction-of-sec'),
+        pytest.param(MONTHLY, 1, r"^frequency must be HOURLY or MINUTELY$", id='monthly'),
+        pytest.param(HOURLY, 24 * 30, r"^interval is greater than the fast forward amount$", id='over-a-month'),
+    ),
+)
+def test_error_fast_forward_date(freq, interval, error):
+    dtstart = now() - datetime.timedelta(days=30)
+    rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
+    with pytest.raises(ValueError, match=error):
+        fast_forward_date(rule)
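One thing worth double-checking before adopting the anchored patterns: `pytest.raises(..., match=...)` applies `re.search` to the string representation of the exception, so `^` and `$` must line up with the start and end of the whole message. The messages in the diff above begin with "Cannot fast forward ...", and the suggested `ValueError` text for the frequency case would end with ", but got ...", so the anchored patterns may not match as written. A stdlib illustration using the message texts quoted in this review (`1` is dateutil's integer value for MONTHLY):

```python
import re

# Message texts as they appear in the diff and suggestion under review.
fraction_msg = "Cannot fast forward rule, interval is a fraction of a second"
frequency_msg = "Cannot fast forward rrule, frequency must be HOURLY or MINUTELY, but got 1"

# The anchored pattern requires the message to *start* with "interval ...".
anchored = re.search(r"^interval is a fraction of a second$", fraction_msg)
# Dropping the leading ^ (or spelling out the full message) does match.
unanchored = re.search(r"interval is a fraction of a second$", fraction_msg)
frequency = re.search(re.escape("frequency must be HOURLY or MINUTELY"), frequency_msg)
```

Here `anchored` is `None`, while the other two searches find a match.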

I'm also concerned by the use of now() as it may contribute to the flakiness of the tests. It's typical to freeze time in such cases.
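One dependency-free way to do that is to make the clock injectable and pass a fixed datetime in tests; libraries such as freezegun provide a `freeze_time` decorator for the same purpose. The helper below is hypothetical and only illustrates the pattern.

```python
import datetime


def _utcnow():
    return datetime.datetime.now(datetime.timezone.utc)


def days_ago(days, clock=_utcnow):
    """Return the moment `days` before the current time; `clock` can be replaced in tests."""
    return clock() - datetime.timedelta(days=days)


# In a test, pin "now" so the computed dtstart is identical on every run.
FROZEN_NOW = datetime.datetime(2024, 11, 12, 16, 59, tzinfo=datetime.timezone.utc)
dtstart = days_ago(30, clock=lambda: FROZEN_NOW)
```

With the clock pinned, `dtstart` is always 2024-10-13 16:59 UTC, so the parametrized cases no longer depend on when the suite runs.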

github-actions bot added the component:api label Oct 24, 2024

fosterseth force-pushed the fix_rrule_fast_forward branch from aef1c4a to 4057916 October 24, 2024 19:05

fosterseth requested a review from AlanCoding October 24, 2024 19:06

fosterseth force-pushed the fix_rrule_fast_forward branch 2 times, most recently from f2584db to ea714de October 24, 2024 19:29

AlanCoding reviewed Oct 25, 2024

awx/main/models/schedules.py (outdated)

fosterseth force-pushed the fix_rrule_fast_forward branch 2 times, most recently from fd620a3 to 6f169cc October 25, 2024 16:59

pb82 reviewed Oct 28, 2024

djyasin approved these changes Oct 29, 2024

fosterseth force-pushed the fix_rrule_fast_forward branch from 6f169cc to d2e1cfa October 29, 2024 20:26

AlanCoding approved these changes Oct 30, 2024

webknjaz reviewed Nov 12, 2024
