BUG: Inconsistent date_range output when using CustomBusinessDay as freq #57456

Closed
2 of 3 tasks
blueharen opened this issue Feb 16, 2024 · 5 comments · Fixed by #59519
Labels: Bug, Frequency (DateOffsets)


@blueharen

blueharen commented Feb 16, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

offset = CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu')
start = datetime(2024, 2, 8, 23)  # Thursday: included in the weekmask
end = datetime(2024, 2, 16, 14)  # Friday: excluded from the weekmask
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-08 23:00:00', '2024-02-11 23:00:00',
#               '2024-02-12 23:00:00', '2024-02-13 23:00:00',
#               '2024-02-14 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

start = datetime(2024, 2, 9, 23)  # Friday: excluded from the weekmask
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00', '2024-02-14 23:00:00',
#               '2024-02-15 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

Issue Description

The last date generated by date_range with start = datetime(2024, 2, 8, 23) differs from the last date generated with start = datetime(2024, 2, 9, 23), with all other arguments unchanged.

When the start date falls on a day included in the CustomBusinessDay weekmask, the date_range output does not include 2024-02-15 (Thursday). However, when the start date falls on a day not included in the weekmask (Friday or Saturday), date_range does include 2024-02-15.

Expected Behavior

With a freq of CustomBusinessDay(weekmask='Sun Mon Tue Wed Thu') and an end of 2024-02-16, I'd expect date_range to always include 2024-02-15, regardless of start.
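One way to make the expected result concrete, without relying on CustomBusinessDay itself, is to build a plain daily range from start and keep only the days whose weekday is in the weekmask. This is a minimal reference sketch, not how pandas implements CustomBusinessDay; the helper name expected_range and the constant WEEKMASK_DAYS are hypothetical, and the weekday numbers follow pandas' dayofweek convention (Monday=0, Sunday=6).

```python
from datetime import datetime

import pandas as pd

# Hypothetical constant: "Sun Mon Tue Wed Thu" as pandas dayofweek numbers (Monday=0 ... Sunday=6)
WEEKMASK_DAYS = {6, 0, 1, 2, 3}


def expected_range(start: datetime, end: datetime) -> pd.DatetimeIndex:
    """Hypothetical reference: step one day at a time from start, keep only weekmask days."""
    daily = pd.date_range(start, end, freq="D")  # every point keeps start's time of day (23:00)
    return daily[daily.dayofweek.isin(list(WEEKMASK_DAYS))]


end = datetime(2024, 2, 16, 14)
for start in (datetime(2024, 2, 8, 23), datetime(2024, 2, 9, 23)):
    # Both starts should end on 2024-02-15 23:00:00 under this reference
    print(start, "->", expected_range(start, end)[-1])
```

Under this reference, start = 2024-02-08 23:00 yields 02-08, 02-11, 02-12, 02-13, 02-14 and 02-15 (all at 23:00), and start = 2024-02-09 23:00 yields the same dates minus 02-08, so the last element agrees in both cases.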

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.10.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.3
numpy : 1.24.3
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 24.0
Cython : 0.29.32
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.0
snappy : None
sqlalchemy : 2.0.12
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2023.3

@blueharen added the Bug and Needs Triage labels Feb 16, 2024
@blueharen changed the title from "BUG: Inconsistent date_range output when using CustomBusinessDay as offset" to "BUG: Inconsistent date_range output when using CustomBusinessDay as freq" Feb 16, 2024
@santhoshbethi (Contributor)

My guess is that this comes from Python's range excluding its end value. Could that be related to this issue?
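A quick way to test that idea: Python's range stops before its end value, whereas pd.date_range includes end when end falls exactly on the generated sequence (the default inclusive="both" in recent pandas versions). In the issue, end is 2024-02-16 14:00, so a 23:00 point on 2024-02-15 lies strictly before end and end-exclusivity alone would not explain it being dropped. A minimal sketch:

```python
import pandas as pd

# Python's range stops *before* its end value
days = list(range(11, 14))
print(days)  # [11, 12, 13]

# pd.date_range includes `end` when it lands exactly on the generated sequence
idx = pd.date_range("2024-02-11", "2024-02-14", freq="D")
print(idx.day.tolist())  # [11, 12, 13, 14]

# The missing point from the issue lies strictly before the requested end
print(pd.Timestamp(2024, 2, 15, 23) < pd.Timestamp(2024, 2, 16, 14))  # True
```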

@NicoACloutier

I can reproduce the issue (installed versions are given at the bottom). I experimented with the inputs a bit and found some potentially interesting things.

First, the problem does not appear without a weekmask. The following code, which uses date_range's default daily frequency instead of a CustomBusinessDay, works as expected:

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

start = datetime(2024, 2, 8, 23)
end = datetime(2024, 2, 16, 14)
d_range = pd.date_range(start, end)
print(d_range)
#DatetimeIndex(['2024-02-08 23:00:00', '2024-02-09 23:00:00',
#               '2024-02-10 23:00:00', '2024-02-11 23:00:00',
#               '2024-02-12 23:00:00', '2024-02-13 23:00:00',
#               '2024-02-14 23:00:00', '2024-02-15 23:00:00'],
#              dtype='datetime64[ns]', freq='D')

start = datetime(2024, 2, 9, 23)
d_range = pd.date_range(start, end)
print(d_range)
#DatetimeIndex(['2024-02-09 23:00:00', '2024-02-10 23:00:00',
#               '2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00', '2024-02-14 23:00:00',
#               '2024-02-15 23:00:00'],
#              dtype='datetime64[ns]', freq='D')

I thought it might be related to the end date falling on a day excluded by the weekmask, so I tried the following code (shifting the end date back by one day and also excluding Thursdays from the weekmask), but it also worked as expected:

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

start = datetime(2024, 2, 8, 23)
end = datetime(2024, 2, 15, 14)
offset = CustomBusinessDay(weekmask='Sun Mon Tue Wed')
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00', '2024-02-14 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

start = datetime(2024, 2, 9, 23)
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00', '2024-02-14 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

Next, I wondered whether it mattered that both the start and end days fell on excluded days, so I shifted the start dates back by one day (the first start, Wednesday 2024-02-07, is now a weekmask day):

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

start = datetime(2024, 2, 7, 23)
end = datetime(2024, 2, 15, 14)
offset = CustomBusinessDay(weekmask='Sun Mon Tue Wed')
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-07 23:00:00', '2024-02-11 23:00:00',
#               '2024-02-12 23:00:00', '2024-02-13 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

start = datetime(2024, 2, 8, 23)
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00', '2024-02-14 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

So the problem does appear in this case: with start on 2024-02-07 the output stops at 2024-02-13 and omits 2024-02-14, the same pattern as in the original example. Next, I kept everything the same but shifted the end back by one more day, so that it no longer falls on an excluded day (2024-02-14 is a Wednesday). It works as expected in that case:

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

start = datetime(2024, 2, 7, 23)
end = datetime(2024, 2, 14, 14)
offset = CustomBusinessDay(weekmask='Sun Mon Tue Wed')
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-07 23:00:00', '2024-02-11 23:00:00',
#               '2024-02-12 23:00:00', '2024-02-13 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

start = datetime(2024, 2, 8, 23)
d_range = pd.date_range(start, end, freq=offset)
print(d_range)
#DatetimeIndex(['2024-02-11 23:00:00', '2024-02-12 23:00:00',
#               '2024-02-13 23:00:00'],
#              dtype='datetime64[ns]', freq='C')

So, overall, the bug seems to appear when the start date falls on a day included in the weekmask and the end date falls on a day excluded by it.
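To make the pattern across these runs easier to see, the sketch below labels each start and end date by whether its weekday is in the weekmask and compares that with whether the reported output was wrong. The cases list is transcribed by hand from the outputs posted in this thread (not recomputed with pandas), and the helper in_mask is hypothetical. This only summarizes seven observed runs; it is a hypothesis, not a confirmed root cause.

```python
from datetime import date

# Weekday names indexed by date.weekday() (Monday=0 ... Sunday=6); avoids locale-dependent strftime
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

SUN_THU = {"Sun", "Mon", "Tue", "Wed", "Thu"}
SUN_WED = {"Sun", "Mon", "Tue", "Wed"}

# (weekmask, start date, end date, reported output was wrong), transcribed from this thread
cases = [
    (SUN_THU, date(2024, 2, 8), date(2024, 2, 16), True),   # 2024-02-15 missing
    (SUN_THU, date(2024, 2, 9), date(2024, 2, 16), False),
    (SUN_WED, date(2024, 2, 8), date(2024, 2, 15), False),
    (SUN_WED, date(2024, 2, 9), date(2024, 2, 15), False),
    (SUN_WED, date(2024, 2, 7), date(2024, 2, 15), True),   # 2024-02-14 missing
    (SUN_WED, date(2024, 2, 7), date(2024, 2, 14), False),
    (SUN_WED, date(2024, 2, 8), date(2024, 2, 14), False),
]


def in_mask(d: date, mask: set) -> bool:
    """Hypothetical helper: True if d's weekday name is in the weekmask."""
    return DAY_NAMES[d.weekday()] in mask


for mask, start, end, wrong in cases:
    # Hypothesis: wrong output iff start is a weekmask day and end is not
    predicted = in_mask(start, mask) and not in_mask(end, mask)
    print(start, in_mask(start, mask), end, in_mask(end, mask), "wrong:", wrong, "matches:", predicted == wrong)
```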

Version details
INSTALLED VERSIONS
------------------
commit : a671b5a
python : 3.10.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.1.4
numpy : 1.26.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : 0.29.28
pytest : 7.4.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader : None
bs4 : 4.11.1
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2022.7.1
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
sqlalchemy : None
tables : None
tabulate : None
xarray : 2023.2.0
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : 2.3.0
pyqt5 : None

@rhshadrach added the Frequency (DateOffsets) label and removed the Needs Triage label Feb 25, 2024
@sofiasimass

take

@sofiasimass sofiasimass removed their assignment Mar 31, 2024
@matsidzi (Contributor)

matsidzi commented Aug 9, 2024

take

@matsidzi (Contributor)

This bug (#57456) no longer reproduces on the main branch of pandas; it was fixed by PR #56831.
Added a test for the discussed case in PR #59519.
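To check whether a particular pandas build is affected, one option is to rerun the original reproducer and compare the last timestamp for both start dates against 2024-02-15 23:00, which is the expectation stated in the report. The function gh57456_is_fixed below is a hypothetical helper, not the test added in PR #59519. Based on the outputs reported in this thread it returns False on pandas 1.5.3 and 2.1.4, and it should return True on builds that include the fix from PR #56831.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay


def gh57456_is_fixed() -> bool:
    """Hypothetical check: True if both starts from the issue end on 2024-02-15 23:00."""
    offset = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
    end = datetime(2024, 2, 16, 14)
    expected_last = pd.Timestamp(2024, 2, 15, 23)
    # Thursday start (in the weekmask) and Friday start (excluded) from the original report
    for start in (datetime(2024, 2, 8, 23), datetime(2024, 2, 9, 23)):
        result = pd.date_range(start, end, freq=offset)
        if result[-1] != expected_last:
            return False
    return True


print(pd.__version__, gh57456_is_fixed())
```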

mroeschke added a commit that referenced this issue Aug 15, 2024
TST: Added test for date_range for bug #57456

Co-authored-by: Matthew Roeschke
6 participants