PHEP 3: PyHC Python & Upstream Package Support Policy #29

Open: wants to merge 20 commits into `main`

374 additions and 0 deletions in `pheps/phep-0003.md`

```
PHEP: 3
Title: PyHC Python & Upstream Package Support Policy
Author: Shawn Polson <[email protected]> <https://orcid.org/0000-0003-0619-5745>
Discussions-To: https://github.com/heliophysicsPy/standards/pull/29
Revision: 1
Status: Draft
Type: Standards Track
Content-Type: text/markdown; charset=UTF-8; variant=CommonMark
Created: 06-Jun-2024
Post-History: 06-Jun-2024, 11-Jun-2024, 02-Jul-2024, 17-Jul-2024, 23-Jul-2024
```

# Abstract
<a name="abstract"></a>
This PHEP recommends that all projects across the PyHC ecosystem adopt a common time-based policy for support of dependencies, inspired by [SPEC 0](https://scientific-python.org/specs/spec-0000/).

**Review comment (Contributor):**

Is "recommend" the right word? This gets into the question of applicability of PHEPs which I think we're still feeling out as a community, but I'd suggest stronger wording:
"This PHEP establishes a common time-based policy for support..."

**Review comment (Contributor Author):**

This is something Julie and I should discuss but I suspect we'll end up leaving the "recommend" wording. This PHEP was originally written as more of a "must" but the first major pushback I got in the earlier comments was people requesting I soften the policy; hence why "We decided this policy has to be a 'should' not a 'must.'" is the second bullet under the "Resolved questions and comments" section of this PR's description. It also more closely follows SPEC 0's wording, which uses "recommend."

**Review comment (Contributor Author):**

I'll add that even if I started this first sentence with "This PHEP establishes a common time-based policy for support...", the word "recommends" occurs immediately in the next line. So I'd have to change that wording and all subsequently-related statements to match, which again goes against what we'd previously decided.

**Review comment (Contributor):**

Not a hill I intend to die on. But maybe worth putting explicitly in the rejected ideas? E.g. "it was considered making this a requirement rather than a recommendation but the community argued against because xyz"?

**Review comment (Contributor Author):**

Done.

Specifically, for Python versions and the upstream Scientific Python packages covered by SPEC 0, it recommends that projects:
1. Support Python versions for at least **36 months** (3 years) after their initial release.
2. Support upstream Scientific Python packages for at least **24 months** (2 years) after their initial release.
3. Adopt support for new versions of these dependencies within **6 months** of their release.

The upstream Scientific Python packages are: `numpy, scipy, matplotlib, pandas, scikit-image, networkx, scikit-learn, xarray, ipython, zarr`.

**Review comment (Contributor):**

Do we want to list these, or simply reference SPEC0 and leave it at that? Implicitly then as SPEC0 updates the goal of this policy would update.

I also do like wording like "other dependencies which follow similar versioning schemes should be similarly supported."

**Review comment (Contributor Author):**

I believe there's merit in being clear about which packages this policy applies to by explicitly listing them. And they do say that their list of core projects will not change rapidly. But I do like the idea of this list implicitly updating if the core Scientific Python packages ever do change, so here's what I'll do:

  1. Say "upstream core Scientific Python packages"
  2. In the Abstract, change it to: At the time of writing, the upstream [core Scientific Python packages](https://scientific-python.org/specs/core-projects/) are: numpy, scipy, matplotlib, pandas, scikit-image, networkx, scikit-learn, xarray, ipython, zarr.
  3. In the Specification, clarify that by saying: At the time of writing, the upstream [core Scientific Python packages](https://scientific-python.org/specs/core-projects/) are: numpy, scipy, matplotlib, pandas, scikit-image, networkx, scikit-learn, xarray, ipython, zarr. If their core packages are updated, this policy applies to the updated list instead.

Additionally, while your wording about other dependencies sounds good, I don't want to be vague and introduce uncertainty about which packages this policy applies to. We're simply adopting SPEC 0 here, and SPEC 0 only applies to their core packages, so our policy should do the same.


This policy will replace the current standard [#11](https://github.com/heliophysicsPy/standards/blob/main/standards.md#standards), which simply mandates Python 3 support.
sapols marked this conversation as resolved.

# Motivation
<a name="motivation"></a>
The current PyHC standard [#11](https://github.com/heliophysicsPy/standards/blob/main/standards.md#standards), which mandates compatibility with Python 3, is outdated.
Python 3 support is virtually universal now, so it would be more beneficial to replace this standard with a policy for how to support new minor Python versions and key upstream dependencies.
[SPEC 0](https://scientific-python.org/specs/spec-0000/) provides a structured support timeline that balances stability and progress, a balance that is essential for software in the heliophysics community.
Adopting a similar policy ensures consistent and predictable support timelines across PyHC packages.
Additionally, limiting the scope of supported versions is an effective way for packages to limit maintenance burden while promoting interoperability.

# Rationale
<a name="rationale"></a>
Following [SPEC 0](https://scientific-python.org/specs/spec-0000/)'s 24/36-month support timeline keeps PyHC in better sync with the broader Scientific Python community, maintaining compatibility with newer Python features and key upstream dependencies, while providing adequate time for package maintainers to adapt.
Allowing 6 months to adopt new versions ensures packages stay current with development cycles while providing a reasonable timeframe for testing and integration.

# Specification
<a name="specification"></a>
This PHEP refers to feature releases of dependencies (e.g., Python 3.12.0, NumPy 2.0.0; not Python 3.12.1, NumPy 2.0.1).
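
To make the feature-release distinction concrete, here is a simplified, stdlib-only sketch; the helper name `is_feature_release` is hypothetical. It recognises only plain `MAJOR.MINOR[.MICRO]` strings and treats any suffix (pre-, dev-, or post-release) as disqualifying, so it is not a full PEP 440 parser; the reference script in this PHEP uses `packaging.version.Version` for that job.

```python
import re

# Simplified pattern: MAJOR.MINOR[.MICRO] plus an optional trailing suffix
# such as "rc1", ".dev0" or ".post1". This is NOT a full PEP 440 parser.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(.*)$")


def is_feature_release(version: str) -> bool:
    """Return True for feature releases such as "3.12.0" or "2.0.0"."""
    match = VERSION_RE.match(version.strip())
    if match is None:
        return False
    micro, suffix = match.group(3), match.group(4)
    # Patch releases (non-zero micro) and anything carrying a suffix
    # (pre-, dev-, or post-releases) are not treated as feature releases.
    return (micro is None or int(micro) == 0) and suffix == ""


for v in ["3.12.0", "3.12.1", "2.0.0", "2.0.1", "2.0.0rc1"]:
    print(v, is_feature_release(v))
```

With this sketch, `3.12.0` and `2.0.0` count as feature releases, while `3.12.1`, `2.0.1`, and `2.0.0rc1` do not.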

This PHEP adopts Scientific Python's [SPEC 0](https://scientific-python.org/specs/spec-0000/) and specifies that all PyHC packages should:

**Review comment (Contributor):**

Rather than "specifies that all PyHC packages should" I'd say "specifies that packages must". I.e. a package that is compliant with this PHEP must support that much. The question of the consequences of noncompliance feels more appropriately dealt with elsewhere (e.g. PHEP4 which I haven't looked at yet :) )

I also think involvement in PyHC is sort of implied in the context.

**Review comment (Contributor Author):**

This goes back to my reply above about how we previously decided this policy has to be a "should" not a "must." With regard to PHEP 4, rather than thinking about it as "consequences of noncompliance" I'm thinking about it like "if you follow PHEP 3's recommendation then you are compliant for PHEP 4's sake."

Point taken about PyHC involvement being implied, but saying "PyHC" only adds one word and I like the clarity :)

1. Support Python versions for at least **36 months** (3 years) after their initial release.
2. Support upstream Scientific Python packages for at least **24 months** (2 years) after their initial release.
3. Adopt support for new versions of these dependencies within **6 months** of their release.

The upstream Scientific Python packages are: `numpy, scipy, matplotlib, pandas, scikit-image, networkx, scikit-learn, xarray, ipython, zarr`.
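
The three rules above can be turned into concrete dates for any single release. The following is a minimal sketch, assuming the same day-count approximations as the reference script in this PHEP (36 months as 3 * 365 days, 24 months as 2 * 365 days, 6 months as 182 days); the function name `support_window` and its dictionary keys are hypothetical.

```python
from datetime import date, timedelta

# Day-count approximations mirroring the reference script in this PHEP.
PYTHON_SUPPORT = timedelta(days=365 * 3)      # ~36 months (rule 1)
PACKAGE_SUPPORT = timedelta(days=365 * 2)     # ~24 months (rule 2)
ADOPT_WITHIN = timedelta(days=int(365 * 0.5))  # ~6 months (rule 3), 182 days


def support_window(release: date, is_python: bool) -> dict:
    """Return the key policy dates for one feature release.

    support_by: date by which support for the release should be adopted.
    may_drop_after: earliest date support for the release may be dropped.
    """
    support_period = PYTHON_SUPPORT if is_python else PACKAGE_SUPPORT
    return {
        "release": release,
        "support_by": release + ADOPT_WITHIN,
        "may_drop_after": release + support_period,
    }


# Python 3.12.0 release date, as listed in the reference script.
print(support_window(date(2023, 10, 2), is_python=True))
```

Because a 3 * 365-day span that crosses a leap day is one day shorter than three calendar years, this approximation puts the earliest drop date for Python 3.12 on Oct 1, 2026 rather than Oct 2. Projects that prefer exact calendar months could swap in calendar-month arithmetic instead.
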

**Review comment:**

Why this list of packages?

I know that they are the SP core packages, but why is that relevant to PyHC?

**Review comment (Contributor Author):**

Simply because they're the packages controlled by SPEC 0. I wouldn't call it a comprehensive list of upstream packages PyHC cares about, but it's many of them and a great start. I was avoiding extending beyond the bounds of SPEC 0.

**Review comment (Contributor):**

I vote for dropping the list (see comment above).


Since new minor Python versions are released every October ([PEP 602](https://peps.python.org/pep-0602/)), this effectively means that PyHC packages should support about three minor Python versions at any given time.
Upstream packages have more varied release schedules, but several recent versions should typically be supported concurrently.
Providing ongoing support for older versions beyond the specified support periods is optional.
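
The "about three versions" estimate can be checked with a short sketch. It reuses the Python release dates from the reference script in this PHEP and the same 3 * 365-day approximation of 36 months; the function name `python_versions_to_support` is hypothetical, and only the versions listed in its dictionary are considered.

```python
from datetime import date, timedelta

# Python feature-release dates, copied from the reference script in this PHEP.
PY_RELEASES = {
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
    "3.12": date(2023, 10, 2),
}
SUPPORT_PERIOD = timedelta(days=365 * 3)  # ~36 months, as in the reference script


def python_versions_to_support(today: date) -> list[str]:
    """Return versions already released whose 36-month window is still open."""
    supported = [
        version
        for version, released in PY_RELEASES.items()
        if released <= today < released + SUPPORT_PERIOD
    ]
    # Sort numerically so that "3.10" comes after "3.9", unlike string sorting.
    return sorted(supported, key=lambda v: tuple(int(part) for part in v.split(".")))


print(python_versions_to_support(date(2024, 7, 1)))
```

For example, `python_versions_to_support(date(2024, 7, 1))` returns `['3.10', '3.11', '3.12']`. The first and last entries are natural minimum and maximum targets for CI testing, and the first could seed a bound such as `requires-python = ">=3.10"` in `pyproject.toml`.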

![Dependency Support Window](phep-0003/dependency-support-window.svg)
sapols marked this conversation as resolved.

PyHC packages should clearly document their dependency version policy (as [PlasmaPy](https://docs.plasmapy.org/en/stable/contributing/coding_guide.html#python-and-dependency-version-support) and [SpacePy](https://spacepy.github.io/dep_versions.html) do) and be tested against the minimum and maximum supported versions.
Testing against release candidates in CI is also encouraged, as a way to stay ahead of future releases.
Packages that use semantic versioning should consider using their version number to indicate versions that drop support for older dependencies.
There is no expectation that a package "deprecate" an older dependency before dropping support for it.
However, there is an expectation that maximum or exact requirements (e.g., `numpy<2` or `matplotlib==3.5.3`) be set only when absolutely necessary (and that issues be immediately created to remove such requirements), and packages must not require versions of any dependency older than 24 months.
Additionally, if a package has been supporting specific OS versions and CPU architectures (e.g., releasing binary [wheels](https://packaging.python.org/en/latest/discussions/package-formats/#what-is-a-wheel)), this support should continue for new OS versions and architectures to maintain the same level of support as before.
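
The expectation that maximum or exact requirements be rare lends itself to a simple automated check. The following is a rough, stdlib-only sketch with a hypothetical helper `find_capped_requirements`; it uses a simplified regular expression rather than a full PEP 508 parser, and a real project would more robustly use `packaging.requirements.Requirement`.

```python
import re

# Specifier operators that put a ceiling on a dependency's version.
# "~=1.4" means ">=1.4, ==1.*", so it caps the version as well.
CAPPING_OPERATORS = {"<", "<=", "==", "===", "~="}

# Longest operators first so that "<=" is not read as "<" followed by "=".
SPECIFIER_RE = re.compile(r"(===|~=|==|!=|<=|>=|<|>)\s*([^,;\s]+)")


def find_capped_requirements(requirements: list[str]) -> list[str]:
    """Return the requirement strings that set a maximum or exact version."""
    capped = []
    for requirement in requirements:
        # Ignore environment markers such as '; python_version < "3.11"'.
        specifiers = requirement.split(";", 1)[0]
        operators = {op for op, _ in SPECIFIER_RE.findall(specifiers)}
        if operators & CAPPING_OPERATORS:
            capped.append(requirement)
    return capped


deps = ["numpy<2", "matplotlib==3.5.3", "scipy>=1.10", "pandas>=2.0,!=2.1.0"]
print(find_capped_requirements(deps))
```

Run on the list above, the sketch flags `numpy<2` and `matplotlib==3.5.3`. A `!=` exclusion is not treated as a cap, because newer versions remain installable.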

This new policy will replace the current standard [#11](https://github.com/heliophysicsPy/standards/blob/main/standards.md#standards) in the PyHC standards document with the following new text:

> **11. Python and Upstream Package Support:** All packages should support minor Python versions released within the last 36 months (3 years) and upstream core Scientific Python packages released within the last 24 months (2 years).
Additionally, packages should support new versions within 6 months of their release (see [PHEP 3](https://github.com/heliophysicsPy/standards/pull/29)).

Lastly, if Python 4 is released or dependencies otherwise change significantly, this policy will need to be reviewed in light of the community's and projects' best interests.

# Backwards Compatibility
<a name="backwards-compatibility"></a>
This policy may introduce backwards incompatibilities: its new support timeline may encourage some packages to drop support for older dependency versions sooner than planned.

# Security Implications
<a name="security-implications"></a>
There are no direct security implications of this policy.
However, ensuring packages are updated to newer dependency versions may improve security by incorporating fixes and improvements from newer releases.

# How to Teach This
<a name="how-to-teach-this"></a>
- A new web page on the PyHC website will detail the support policy and include a graphical timeline of the schedule (similar to the Gantt chart above).
- Automated email reminders will be sent via the PyHC mailing list quarterly and near important drop/support dates to remind package maintainers of the schedule.

**Review comment (Contributor):**

"will detail" and "will be sent" are somewhat passive. Suggest this be an explicit burden on the tech lead..."the PyHC tech lead will regularly update a graphical timeline and set up automated email reminders....". I do like this!

I apologize for missing that standard 11 is mentioned all over the place :) It might be worth mentioning here that since this completely supersedes standard 11, people should probably start from scratch in making their plans, rather than using standard 11 compliance as a basis.

**Review comment (Contributor Author):**

It's definitely gonna be my burden so I agree it'd be good to clarify this is the Tech Lead's job! I'll reword.


# Reference Implementation
<a name="reference-implementation"></a>
Multiple PyHC packages already follow this version support policy.
One notable example is PlasmaPy, which currently [documents its SPEC 0-based policy](https://docs.plasmapy.org/en/stable/contributing/coding_guide.html#python-and-dependency-version-support) and even mentions it in comments inside its [pyproject.toml](https://github.com/PlasmaPy/PlasmaPy/blob/main/pyproject.toml) file.

## Code to generate support and drop schedules:
Cadair marked this conversation as resolved.

**Review comment (Contributor):**

Can it be explicitly noted that this code is the source of the included Gantt chart? Does it make more sense to put it elsewhere (in this repository or otherwise) and link it from here, as something that is live updated as necessary without having to update the PHEP?

**Review comment (Contributor Author):**

You got it. I'll add "The following code can be used to generate support and drop schedules, including the Gantt chart above." between the header and the code.

I considered putting the code elsewhere, like in a separate file and linking to it, but decided not to for two main reasons: (1) SPEC 0 included their code in-line inside the SPEC and I liked that. (2) I honestly do not intend to formally maintain this code. Sure the dates etc will eventually become obsolete after enough time passes, but ain't nobody got time to remember to update dates in an obscure script! It's valid now and a good enough starting point for anyone who wants to use it in the future (likely only me tbh).

```python
import collections
from datetime import datetime, timedelta

import pandas as pd
import requests
from packaging.version import Version


py_releases = {
    "3.9": "Oct 5, 2020",
    "3.10": "Oct 4, 2021",
    "3.11": "Oct 24, 2022",
    "3.12": "Oct 2, 2023",
}
core_packages = [
    "numpy",
    "scipy",
    "matplotlib",
    "pandas",
    "scikit-image",
    "networkx",
    "scikit-learn",
    "xarray",
    "ipython",
    "zarr",
]
plus36 = timedelta(days=int(365 * 3))
plus24 = timedelta(days=int(365 * 2))
plus6 = timedelta(days=int(365 * 0.5))

# Release data

# Put the cutoff 3 quarters before the start of the current quarter. We do not
# simply subtract 9 months from today, so that the contents of a quarter do not
# change depending on when during the current quarter this file is generated.

current_date = pd.Timestamp.now()
current_quarter_start = pd.Timestamp(
    current_date.year, (current_date.quarter - 1) * 3 + 1, 1
)
cutoff = current_quarter_start - pd.DateOffset(months=9)


def get_release_dates(package, support_time=plus24):
    releases = {}

    print(f"Querying pypi.org for {package} versions...", end="", flush=True)
    response = requests.get(
        f"https://pypi.org/simple/{package}",
        headers={"Accept": "application/vnd.pypi.simple.v1+json"},
        timeout=30,
    )
    response.raise_for_status()
    files = response.json()["files"]
    print("OK")

    file_date = collections.defaultdict(list)
    for f in files:
        ver = f["filename"].split("-")[1]
        try:
            version = Version(ver)
        except ValueError:  # packaging's InvalidVersion subclasses ValueError
            continue

        # Only feature releases count: skip pre-releases and patch releases.
        if version.is_prerelease or version.micro != 0:
            continue

        upload_time = f.get("upload-time")
        if not upload_time:
            continue

        release_date = None
        for fmt in ["%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"]:
            try:
                release_date = datetime.strptime(upload_time, fmt)
                break
            except ValueError:
                pass

        if not release_date:
            continue

        file_date[version].append(release_date)

    # Treat the earliest upload of any file for a version as its release date.
    first_upload = {v: min(dates) for v, dates in file_date.items()}

    for ver, release_date in sorted(first_upload.items()):
        drop_date = release_date + support_time
        if drop_date >= cutoff:
            releases[ver] = {
                "release_date": release_date,
                "drop_date": drop_date,
                "support_by_date": release_date + plus6,
            }

    return releases


package_releases = {
    "python": {
        # Wrap in Version so that "3.10" sorts after "3.9" (plain strings do not).
        Version(version): {
            "release_date": datetime.strptime(release_date, "%b %d, %Y"),
            "drop_date": datetime.strptime(release_date, "%b %d, %Y") + plus36,
            "support_by_date": datetime.strptime(release_date, "%b %d, %Y") + plus6,
        }
        for version, release_date in py_releases.items()
    }
}

package_releases |= {package: get_release_dates(package) for package in core_packages}

# Remove all versions whose drop date falls before the cutoff.
package_releases = {
    package: {
        version: dates
        for version, dates in releases.items()
        if dates["drop_date"] > cutoff
    }
    for package, releases in package_releases.items()
}


# Save Gantt chart
# You can paste the contents into https://mermaid.live/ to generate the chart image.

print("Saving Mermaid chart to chart.md (render at https://mermaid.live/)")
with open("chart.md", "w") as fh:
    fh.write(
        """gantt
dateFormat YYYY-MM-DD
axisFormat %m / %Y
title Support Window"""
    )

    for name, releases in package_releases.items():
        fh.write(f"\n\nsection {name}")
        for version, dates in releases.items():
            fh.write(
                f"\n{version} : {dates['release_date'].strftime('%Y-%m-%d')},{dates['drop_date'].strftime('%Y-%m-%d')}"
            )
    fh.write("\n")

# Print drop schedule

data = []
for k, versions in package_releases.items():
    for v, dates in versions.items():
        data.append(
            (
                k,
                v,
                pd.to_datetime(dates["release_date"]),
                pd.to_datetime(dates["drop_date"]),
                pd.to_datetime(dates["support_by_date"]),
            )
        )

df = pd.DataFrame(data, columns=["package", "version", "release", "drop", "support_by"])

df["quarter_drop"] = df["drop"].dt.to_period("Q")
df["quarter_support_by"] = df["support_by"].dt.to_period("Q")

dq_drop = df.set_index(["quarter_drop", "package"]).sort_index()
dq_support_by = df.set_index(["quarter_support_by", "package"]).sort_index()


print("Saving support schedule to schedule.md")


def pad_table(table):
    rows = [[el.strip() for el in row.split("|")] for row in table]
    col_widths = [max(map(len, column)) for column in zip(*rows)]
    rows[1] = [
        el if el != "----" else "-" * col_widths[i] for i, el in enumerate(rows[1])
    ]
    padded_table = []
    for row in rows:
        line = ""
        for entry, width in zip(row, col_widths):
            if not width:
                continue
            line += f"| {entry.ljust(width)} "
        line += "|"
        padded_table.append(line)

    return padded_table


def make_table(sub):
    table = []
    table.append("| | | |")
    table.append("|----|----|----|")
    for package in sorted(set(sub.index.get_level_values(0))):
        vers = sub.loc[[package]]["version"]
        minv, maxv = min(vers), max(vers)
        rels = sub.loc[[package]]["release"]
        rel_min, rel_max = min(rels), max(rels)
        version_range = str(minv) if minv == maxv else f"{minv} to {maxv}"
        rel_range = (
            str(rel_min.strftime("%b %Y"))
            if rel_min == rel_max
            else f"{rel_min.strftime('%b %Y')} and {rel_max.strftime('%b %Y')}"
        )
        table.append(f"|{package:<15}|{version_range:<19}|released {rel_range}|")

    return pad_table(table)


def make_adopt_table(sub):
    table = []
    table.append("| | | |")
    table.append("|----|----|----|")
    for package in sorted(set(sub.index.get_level_values(0))):
        vers = sub.loc[[package]]["version"]
        minv, maxv = min(vers), max(vers)
        support_bys = sub.loc[[package]]["support_by"]
        support_by_min, support_by_max = min(support_bys), max(support_bys)
        version_range = str(minv) if minv == maxv else f"{minv} to {maxv}"
        support_by_range = (
            str(support_by_min.strftime("%b %Y"))
            if support_by_min == support_by_max
            else f"{support_by_min.strftime('%b %Y')} and {support_by_max.strftime('%b %Y')}"
        )
        table.append(f"|{package:<15}|{version_range:<19}|support by {support_by_range}|")

    return pad_table(table)


def make_quarter(quarter, dq_drop, dq_support_by):
    table = ["#### " + str(quarter).replace("Q", " - Quarter ") + ":\n"]

    # Add the adoption schedule for new versions, if there is one this quarter
    if quarter in dq_support_by.index.get_level_values(0):
        table.append("###### Adopt support for:\n")
        adopt_sub = dq_support_by.loc[quarter]
        adopt_table = make_adopt_table(adopt_sub)
        table.extend(adopt_table)

    table.append("\n###### Can drop support for:\n")
    sub = dq_drop.loc[quarter]
    table.extend(make_table(sub))

    return "\n".join(table)


with open("schedule.md", "w") as fh:
    # Releases are collected back to the cutoff (3 quarters before the current
    # one); skip the first quarter, since some of its packages may already have
    # been filtered out depending on when the script was run.
    tb = []
    for quarter in list(sorted(set(dq_drop.index.get_level_values(0))))[1:]:
        tb.append(make_quarter(quarter, dq_drop, dq_support_by))

    fh.write("\n\n".join(tb))
    fh.write("\n")
```

# Rejected Ideas
<a name="rejected-ideas"></a>
- [NEP 29](https://numpy.org/neps/nep-0029-deprecation_policy.html)'s more lenient 42-month support timeline was originally considered instead of [SPEC 0](https://scientific-python.org/specs/spec-0000/)'s 36 months, but it was ultimately decided to follow SPEC 0 because it supersedes NEP 29.
- The scope of this PHEP was originally limited to Python version support.
However, it was decided that including the upstream package support policy from SPEC 0 would better promote PyHC package interoperability and avoid the need for a future separate PHEP.

# Open Issues
<a name="open-issues"></a>
1. What should go in the "How to Teach This" section? Should we expand on the ideas already there or take it in a different direction?
Or leave it as is if it is already sufficient?

**Review comment (Contributor Author):**

Note that in the absence of feedback I'd now consider "leave it" the default option here.

**Review comment (Contributor):**

Other than my one suggestion, I think that's sufficient.


# Footnotes
<a name="footnotes"></a>
1. SPEC 0: https://scientific-python.org/specs/spec-0000/
2. NEP 29: https://numpy.org/neps/nep-0029-deprecation_policy.html

# Revisions
<a name="revisions"></a>
Revision 1 (pending): Initial draft.

# Copyright
<a name="copyright"></a>
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. It should be cited as:

```
@techreport(phep3,
author = {Shawn Polson},
title = {PyHC Python \& Upstream Package Support Policy},
year = {2024},
type = {PHEP},
number = {3},
doi = {10.5281/zenodo.xxxxxxx}
)
```