Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rrule fast forwarding stable #15601

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 55 additions & 6 deletions awx/main/models/schedules.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,45 @@
UTC_TIMEZONES = {x: tzutc() for x in dateutil.parser.parserinfo().UTCZONE}


SECONDS_IN_WEEK = 7 * 24 * 60 * 60


def fast_forward_date(rrule):
'''
Utility to fast forward an rrule, maintaining consistency in the resulting
occurrences
Fast forwards the rrule to 7 days ago
Returns a datetime object
'''
if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not x in y is a common mistake. Linters would usually require it to be an x not in y instead, as it's clearer what the semantics of the check is. I wouldn't even be sure whether it's (not x) in y or not (x in y) off the top of my head. Let's improve this a bit:

Suggested change
if not rrule._freq in (dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY):
if rrule._freq not in {dateutil.rrule.HOURLY, dateutil.rrule.MINUTELY}:

raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be semantically closer to a ValueError than to a RuntimeError as it is function input validation, not a runtime check. Perhaps, it's worth changing?

Suggested change
raise RuntimeError("Cannot fast forward rrule, frequency must be HOURLY or MINUTELY")
raise ValueError(f"Cannot fast forward rrule, frequency must be HOURLY or MINUTELY, but got {rrule._freq !r}")


interval = rrule._interval if rrule._interval else 1
if rrule._freq == dateutil.rrule.HOURLY:
interval *= 60 * 60
elif rrule._freq == dateutil.rrule.MINUTELY:
interval *= 60

if type(interval) == float and not interval.is_integer():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that isinstance(interval, float) is not used here? Do we want to disallow subclasses?
Also, wouldn't it be more correct to do an identity check, as in type(interval) is float?

Looking at what's checked, I think that it should be possible to use rrule._interval instead and move it to the beginning of the function, having a chance to bail before the computations. This is probably not a huge performance benefit, but it would improve the structure by grouping all the “guard expression”-style checks together.

Additionally, once these checks are close, they could be moved into a helper called smth like _assert_rrule_is_fast-forwardable(), which might improve the structure even further.

raise RuntimeError("Cannot fast forward rule, interval is a fraction of a second")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern here: should this be a ValueError?


seconds_since_dtstart = (now() - rrule._dtstart).total_seconds()

# fast forward to 7 days ago
fast_forward_seconds = seconds_since_dtstart - SECONDS_IN_WEEK

# make sure at least one chunk of interval can fit inside the time period we are trying to fast forward
if interval > fast_forward_seconds:
raise RuntimeError(f"Cannot fast forward rrule {rrule}, interval is greater than the fast forward amount of {fast_forward_seconds} seconds")

# it is important to fast forward by a number that is divisible by
# interval. For example, if interval is 7 hours, we fast forward by 7, 14, 21, etc. hours.
# Otherwise, the occurrences after the fast forward might not match the ones before.
# x // y is integer division, lopping off any remainder, so that we get the outcome we want.
new_start = rrule._dtstart + datetime.timedelta(seconds=(fast_forward_seconds // interval) * interval)
return new_start
Comment on lines +69 to +74
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this code comment, makes me think that this line is hard to follow, and it might not belong on this abstraction layer. I believe that such things can be labeled with descriptive variable names (or function names for that matter), improving the perception.
Wouldn't it be easier to follow if this were to read

return _compute_interval_aligned_datetime(rrule._dtstart, fast_forward_seconds, interval)

or

return rrule._dtstart + _compute_interval_aligned_time_shift(fast_forward_seconds, interval)

?



class ScheduleFilterMethods(object):
def enabled(self, enabled=True):
return self.filter(enabled=enabled)
Expand Down Expand Up @@ -207,23 +246,33 @@
raise ValueError('A valid TZID must be provided (e.g., America/New_York)')

# Fast forward is a way for us to limit the number of events in the rruleset
# If we are fastforwading and we don't have a count limited rule that is minutely or hourley
# If we are fast forwarding and we don't have a count limited rule that is minutely or hourly
# We will modify the start date of the rule to last week to prevent a large number of entries
if fast_forward:
try:
# All rules in a ruleset will have the same dtstart value
# so lets compare the first event to now to see if its > 7 days old
# so lets compare the first event to now to see if its > 30 days old
# we choose > 30 days because if rrule has FREQ=HOURLY and INTERVAL=23, it will take
# at least 23 days before we can fast forward reliably and retain stable occurrences.
# since we are fast forwarding to 7 days ago, we check fo rrrules older than (23+7) days
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/rrr/rr/

first_event = x[0]
if (now() - first_event).days > 7:
if (now() - first_event).days > 30:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I suggest moving the arbitrary constant of 30 into a variable? Maybe, a class/module-level constant. This piece of data needs a name.

for rule in x._rrule:
# If any rule has a minutely or hourly rule without a count...
if rule._freq in [dateutil.rrule.MINUTELY, dateutil.rrule.HOURLY] and not rule._count:
# hourly/minutely rrules with far-past DTSTART values
# are *really* slow to precompute
# start *from* one week ago to speed things up drastically
new_start = (now() - datetime.timedelta(days=7)).strftime('%Y%m%d')
# Now we want to repalce the DTSTART:<value>T with the new date (which includes the T)
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start), rrule)
try:
new_start = fast_forward_date(rule)
except RuntimeError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's nothing wrong with the runtime, this would also need to align with the above suggestions to represent the case with a different exception:

Suggested change
except RuntimeError as e:
except ValueError as val_err:

logger.warning(e)

Check warning on line 269 in awx/main/models/schedules.py

View check run for this annotation

Codecov / codecov/patch

awx/main/models/schedules.py#L268-L269

Added lines #L268 - L269 were not covered by tests
# fallback to setting dtstart to 7 days ago, but this has the consequence of
# occurrences not matching the old occurrences.
new_start = now() - datetime.timedelta(days=7)

Check warning on line 272 in awx/main/models/schedules.py

View check run for this annotation

Codecov / codecov/patch

awx/main/models/schedules.py#L272

Added line #L272 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @PabloHiro mentioned, this would lead to the exact problem that is being fixed. Can we just reject the rrule here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how we should handle the cases where a user has these "invalid" rules already. We can prevent further invalid rules, but this code has a solution to handle the existing ones.

new_start_fmt = new_start.strftime('%Y%m%d')
# Now we want to replace the DTSTART:<value>T with the new date (which includes the T)
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try this

Suggested change
new_rrule = re.sub('(DTSTART[^:]*):[^T]+T', r'\1:{0}T'.format(new_start_fmt), rrule)
new_rrule = re.sub('(?P<dstart>DTSTART[^:]*):[^T]+T', fr'\g<dstart>:{new_start_fmt !s}T', rrule)

return Schedule.rrulestr(new_rrule, fast_forward=False)
except IndexError:
pass
Expand Down
92 changes: 92 additions & 0 deletions awx/main/tests/unit/utils/test_schedule_fast_forward.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
import pytest
import datetime

from django.utils.timezone import now

from awx.main.models.schedules import fast_forward_date
from dateutil.rrule import rrule, HOURLY, MINUTELY, MONTHLY


def test_fast_forward_date_all_hours():
'''
Assert that the resulting fast forwarded date is included in the original rrule
occurrence list
'''
for interval in range(1, 24):
dtstart = now() - datetime.timedelta(days=30)
rule = rrule(freq=HOURLY, interval=interval, dtstart=dtstart)
new_datetime = fast_forward_date(rule)

# get occurrences of the rrule
gen = rule.xafter(new_datetime - datetime.timedelta(days=1), count=100)
found_matching_date = False
for occurrence in gen:
if occurrence == new_datetime:
found_matching_date = True
break

assert found_matching_date


@pytest.mark.parametrize(
'freq, interval',
[
(MINUTELY, 15),
(MINUTELY, 120),
(MINUTELY, 60 * 24 * 3),
(HOURLY, 7),
(HOURLY, 24 * 3),
],
Comment on lines +32 to +39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cosmetic changes)

Suggested change
'freq, interval',
[
(MINUTELY, 15),
(MINUTELY, 120),
(MINUTELY, 60 * 24 * 3),
(HOURLY, 7),
(HOURLY, 24 * 3),
],
('freq', 'interval'),
(
pytest.param(MINUTELY, 15, id='every-15-minutes-minutely'),
pytest.param(MINUTELY, 120, id='every-2-hours-minutely'),
pytest.param(MINUTELY, 60 * 24 * 3, id='every-3-days-minutely'),
pytest.param(HOURLY, 7, id='every-7-hours-hourly'),
pytest.param(HOURLY, 24 * 3, id='every-3-days-hourly'),
),

)
def test_fast_forward_date(freq, interval):
'''
Assert that the resulting fast forwarded date is included in the original rrule
occurrence list
'''
dtstart = now() - datetime.timedelta(days=30)
rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
new_datetime = fast_forward_date(rule)

# get occurrences of the rrule
gen = rule.xafter(new_datetime - datetime.timedelta(days=1), count=100)
found_matching_date = False
for occurrence in gen:
if occurrence == new_datetime:
found_matching_date = True
break

assert found_matching_date
Comment on lines +52 to +58
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason it's not

Suggested change
found_matching_date = False
for occurrence in gen:
if occurrence == new_datetime:
found_matching_date = True
break
assert found_matching_date
assert new_datetime in gen

?



@pytest.mark.parametrize(
'freq, interval, error',
[
(MINUTELY, 15.5555, "interval is a fraction of a second"),
(MONTHLY, 1, "frequency must be HOURLY or MINUTELY"),
(HOURLY, 24 * 30, "interval is greater than the fast forward amount"),
],
)
def test_error_fast_forward_date(freq, interval, error):
dtstart = now() - datetime.timedelta(days=30)
rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
if error:
with pytest.raises(Exception) as e_info:
fast_forward_date(rule)

assert error in e_info.value.args[0]
Comment on lines +61 to +76
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best to avoid convoluted logic in tests. Error is always truthy, so the conditional is not really needed. Additionally, the right way of checking the exceptions raised is to narrow the expectation. pytest.raises() uses (and pytest.warns() or pytest.deprecated() for that matter) should always have a match argument to be as accurate as possible. If there's different expected exceptions, it's possible to bake them into parametrize. But let's start with the following:

Suggested change
@pytest.mark.parametrize(
'freq, interval, error',
[
(MINUTELY, 15.5555, "interval is a fraction of a second"),
(MONTHLY, 1, "frequency must be HOURLY or MINUTELY"),
(HOURLY, 24 * 30, "interval is greater than the fast forward amount"),
],
)
def test_error_fast_forward_date(freq, interval, error):
dtstart = now() - datetime.timedelta(days=30)
rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
if error:
with pytest.raises(Exception) as e_info:
fast_forward_date(rule)
assert error in e_info.value.args[0]
@pytest.mark.parametrize(
('freq', 'interval', 'error'),
(
pytest.param(MINUTELY, 15.5555, r"^interval is a fraction of a second$", id='fraction-of-sec'),
pytest.param(MONTHLY, 1, r"^frequency must be HOURLY or MINUTELY$", id='monthly'),
pytest.param(HOURLY, 24 * 30, r"^interval is greater than the fast forward amount$", id='over-a-month'),
),
)
def test_error_fast_forward_date(freq, interval, error):
dtstart = now() - datetime.timedelta(days=30)
rule = rrule(freq=freq, interval=interval, dtstart=dtstart)
with pytest.raises(ValueError, match=error):
fast_forward_date(rule)

I'm also concerned by the use of now() as it may contribute to the flakiness of the tests. It's typical to freeze time in such cases.



def test_very_old_date():
dtstart = now() - datetime.timedelta(days=365 * 10)
rule = rrule(freq=HOURLY, interval=11, dtstart=dtstart)
new_datetime = fast_forward_date(rule)

# get occurrences of the rrule
gen = rule.xafter(new_datetime - datetime.timedelta(days=1), count=100)
found_matching_date = False
for occurrence in gen:
if occurrence == new_datetime:
found_matching_date = True
break

assert found_matching_date
Loading