-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make rrule fast forwarding stable #15601
base: devel
Are you sure you want to change the base?
Conversation
aef1c4a
to
4057916
Compare
f2584db
to
ea714de
Compare
fd620a3
to
6f169cc
Compare
Can this be an HTTP 400 when a Schedule is created with an "invalid" rrule? Basically preventing this type of rrules to be created in the first place. Also, I would specify the value of the |
Ok, I think I understood what's going on:
Did I get that right? |
logger.warning(e) | ||
# fallback to setting dtstart to 7 days ago, but this has the consequence of | ||
# occurrences not matching the old occurrences. | ||
new_start = now() - datetime.timedelta(days=7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as @PabloHiro mentioned, this would lead to the exact problem that is being fixed. Can we just reject the rrule here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how we should handle the cases where a user has these "invalid" rules already. We can prevent further invalid rules, but this code has a solution to handle the existing ones.
By stable, we mean future occurrences of the rrule should be the same before and after the fast forward operation. The problem before was that we were fast forwarding to 7 days ago. For some rrules, this does not retain the old occurrences. Thus, jobs would launch at unexpected times. This change makes sure we fast forward in increments of the rrule INTERVAL (converted to seconds), thus the new dtstart should be in the occurrence list of the old rrule. Signed-off-by: Seth Foster <[email protected]>
6f169cc
to
d2e1cfa
Compare
Quality Gate passedIssues Measures |
@fosterseth can this be merged? The failures seem to be about code coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got a few observations below
Returns a datetime object | ||
''' | ||
if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY): | ||
raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be semantically closer to a ValueError
than to a RuntimeError
as it is function input validation, not a runtime check. Perhaps, it's worth changing?
raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY") | |
raise ValueError(f"Cannot fast forward rrule, frequency must be HOURLY or MINUTELY, but got {rrule._freq !r}") |
Fast forwards the rrule to 7 days ago | ||
Returns a datetime object | ||
''' | ||
if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not x in y
is a common mistake. Linters would usually require it to be an x not in y
instead, as it's clearer what the semantics of the check is. I wouldn't even be sure whether it's (not x) in y
or not (x in y)
off the top of my head. Let's improve this a bit:
if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY): | |
if rrule._freq not in {dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY}: |
interval *= 60 | ||
|
||
if type(interval) == float and not interval.is_integer(): | ||
raise RuntimeError("Cannot fast forward rule, interval is a fraction of a second") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here: should this be a ValueError
?
elif rrule._freq == dateutil.rrule.MINUTELY: | ||
interval *= 60 | ||
|
||
if type(interval) == float and not interval.is_integer(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that isinstance(interval, float)
is not used here? Do we want to disallow subclasses?
Also, wouldn't it be more correct to do an identity check, as in type(interval) is float
?
Looking at what's checked, I think that it should be possible to use rrule._interval
instead and move it to the beginning of the function, having a chance to bail before the computations. This is probably not a huge performance benefit, but it would improve the structure by grouping all the “guard expression”-style checks together.
Additionally, once these checks are close, they could be moved into a helper called smth like _assert_rrule_is_fast-forwardable()
, which might improve the structure even further.
# it is important to fast forward by a number that is divisible by | ||
# interval. For example, if interval is 7 hours, we fast forward by 7, 14, 21, etc. hours. | ||
# Otherwise, the occurrences after the fast forward might not match the ones before. | ||
# x // y is integer division, lopping off any remainder, so that we get the outcome we want. | ||
new_start = rrule._dtstart + datetime.timedelta(seconds=(fast_forward_seconds // interval) * interval) | ||
return new_start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this code comment, makes me think that this line is hard to follow, and it might not belong on this abstraction layer. I believe that such things can be labeled with descriptive variable names (or function names for that matter), improving the perception.
Wouldn't it be easier to follow if this were to read
return _compute_interval_aligned_datetime(rrule._dtstart, fast_forward_seconds, interval)
or
return rrule._dtstart + _compute_interval_aligned_time_shift(fast_forward_seconds, interval)
?
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start), rrule) | ||
try: | ||
new_start = fast_forward_date(rule) | ||
except RuntimeError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's nothing wrong with the runtime, this would also need to align with the above suggestions to represent the case with a different exception:
except RuntimeError as e: | |
except ValueError as val_err: |
new_start = now() - datetime.timedelta(days=7) | ||
new_start_fmt = new_start.strftime('%Y%m%d') | ||
# Now we want to replace the DTSTART:<value>T with the new date (which includes the T) | ||
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try this
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule) | |
new_rrule = re.sub('(?P<dstart>DTSTART[^:]*):[^T]+T', fr'\g<dstart>:{new_start_fmt !s}T', rrule) |
found_matching_date = False | ||
for occurrence in gen: | ||
if occurrence == new_datetime: | ||
found_matching_date = True | ||
break | ||
|
||
assert found_matching_date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason it's not
found_matching_date = False | |
for occurrence in gen: | |
if occurrence == new_datetime: | |
found_matching_date = True | |
break | |
assert found_matching_date | |
assert new_datetime in gen |
?
'freq, interval', | ||
[ | ||
(MINUTELY, 15), | ||
(MINUTELY, 120), | ||
(MINUTELY, 60 * 24 * 3), | ||
(HOURLY, 7), | ||
(HOURLY, 24 * 3), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(cosmetic changes)
'freq, interval', | |
[ | |
(MINUTELY, 15), | |
(MINUTELY, 120), | |
(MINUTELY, 60 * 24 * 3), | |
(HOURLY, 7), | |
(HOURLY, 24 * 3), | |
], | |
('freq', 'interval'), | |
( | |
pytest.param(MINUTELY, 15, id='every-15-minutes-minutely'), | |
pytest.param(MINUTELY, 120, id='every-2-hours-minutely'), | |
pytest.param(MINUTELY, 60 * 24 * 3, id='every-3-days-minutely'), | |
pytest.param(HOURLY, 7, id='every-7-hours-hourly'), | |
pytest.param(HOURLY, 24 * 3, id='every-3-days-hourly'), | |
), |
@pytest.mark.parametrize( | ||
'freq, interval, error', | ||
[ | ||
(MINUTELY, 15.5555, "interval is a fraction of a second"), | ||
(MONTHLY, 1, "frequency must be HOURLY or MINUTELY"), | ||
(HOURLY, 24 * 30, "interval is greater than the fast forward amount"), | ||
], | ||
) | ||
def test_error_fast_forward_date(freq, interval, error): | ||
dtstart = now() - datetime.timedelta(days=30) | ||
rule = rrule(freq=freq, interval=interval, dtstart=dtstart) | ||
if error: | ||
with pytest.raises(Exception) as e_info: | ||
fast_forward_date(rule) | ||
|
||
assert error in e_info.value.args[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's best to avoid convoluted logic in tests. Error is always truthy, so the conditional is not really needed. Additionally, the right way of checking the exceptions raised is to narrow the expectation. pytest.raises()
uses (and pytest.warns()
or pytest.deprecated()
for that matter) should always have a match
argument to be as accurate as possible. If there's different expected exceptions, it's possible to bake them into parametrize. But let's start with the following:
@pytest.mark.parametrize( | |
'freq, interval, error', | |
[ | |
(MINUTELY, 15.5555, "interval is a fraction of a second"), | |
(MONTHLY, 1, "frequency must be HOURLY or MINUTELY"), | |
(HOURLY, 24 * 30, "interval is greater than the fast forward amount"), | |
], | |
) | |
def test_error_fast_forward_date(freq, interval, error): | |
dtstart = now() - datetime.timedelta(days=30) | |
rule = rrule(freq=freq, interval=interval, dtstart=dtstart) | |
if error: | |
with pytest.raises(Exception) as e_info: | |
fast_forward_date(rule) | |
assert error in e_info.value.args[0] | |
@pytest.mark.parametrize( | |
('freq', 'interval', 'error'), | |
( | |
pytest.param(MINUTELY, 15.5555, r"^interval is a fraction of a second$", id='fraction-of-sec'), | |
pytest.param(MONTHLY, 1, r"^frequency must be HOURLY or MINUTELY$", id='monthly'), | |
pytest.param(HOURLY, 24 * 30, r"^interval is greater than the fast forward amount$", id='over-a-month'), | |
), | |
) | |
def test_error_fast_forward_date(freq, interval, error): | |
dtstart = now() - datetime.timedelta(days=30) | |
rule = rrule(freq=freq, interval=interval, dtstart=dtstart) | |
with pytest.raises(ValueError, match=error): | |
fast_forward_date(rule) |
I'm also concerned by the use of now()
as it may contribute to the flakiness of the tests. It's typical to freeze time in such cases.
SUMMARY
By stable, we mean future occurrences of the rrule should be the same before and after the fast forward operation.
The problem before was that we were fast forwarding to exactly 7 days ago. For some rrules, this does not retain the old occurrences (for example, freq=HOURLY and INTERVAL=23). Thus, jobs would launch at unexpected times any time the schedule would fast forward.
This change makes sure we fast forward in increments of the rrule INTERVAL (converted to seconds), thus the new dtstart should be in the occurrence list of the old rrule.
DETAIL
Fast forward won't work for really large intervals. For example if you specify HOURLY and INTERVAL=1200 (that is 50 days worth of time), then we can't fast forward because one CHUNK of the interval doesn't fit in the period of time we are trying to fast forward (window is 30 days). In this case, we'll revert back to the old style of just updating dtstart to 7 days ago. We log a warning, so hopefully the user will update their rrule to be frequency DAYS instead of HOURS.
example log:
PERFORMANCE
This change doesn't seem to hurt performance much. Since our fast forward window is larger (30 days vs 7), there will be more rrule objects generated, so we do expect it to cost more time. I attempted to capture it empirically.
create a bunch of old schedules
runit will update_computed_fields for each schedule
benchmark it
before:
after:
So yeah, a slight slowdown, but I think it is worth it for the more accurate generation of occurrences.
ISSUE TYPE
COMPONENT NAME