diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index b11034f..9dc4fdd 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -2,7 +2,7 @@ name: CI on: push: branches: - - "master" + - "*" pull_request: branches: - "*" @@ -25,7 +25,7 @@ jobs: fail-fast: false matrix: os: ["ubuntu-latest"] - python-version: ["3.7", "3.10"] + python-version: ["3.9", "3.10", "3.11"] steps: - uses: actions/checkout@v3 with: @@ -56,7 +56,7 @@ jobs: - name: Set up conda environment shell: bash -l {0} run: | - python -m pip install -e .[tests] + python -m pip install -e .[dev] conda list - name: Run Tests diff --git a/.github/workflows/pypi-release.yaml b/.github/workflows/pypi-release.yaml index 5171071..45b5830 100644 --- a/.github/workflows/pypi-release.yaml +++ b/.github/workflows/pypi-release.yaml @@ -28,7 +28,7 @@ jobs: run: | git clean -xdf git restore -SW . - python setup.py sdist + python -m build - name: Check built artifacts run: | python -m twine check --strict dist/* diff --git a/.gitignore b/.gitignore index 24a6c44..926bfcb 100644 --- a/.gitignore +++ b/.gitignore @@ -38,7 +38,9 @@ nosetests.xml .cache/ __pycache__/ .eggs/ +.hypothesis/ *~ - *.ini +# Dynamic versioning +numpy_groupies/_version.py diff --git a/MANIFEST.in b/MANIFEST.in deleted file mode 100644 index 2b892fd..0000000 --- a/MANIFEST.in +++ /dev/null @@ -1,6 +0,0 @@ -include README.md -include LICENSE.txt -graft numpy_groupies -recursive-exclude * *.py[co] -include versioneer.py -include numpy_groupies/_versioneer.py diff --git a/README.md b/README.md index b4140a7..fa71074 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ This package consists of a small library of optimised tools for doing things that can roughly be considered "group-indexing operations". The most prominent tool is `aggregate`, which is -descibed in detail further down the page. +described in detail further down the page. 
## Installation @@ -57,7 +57,7 @@ in more detail: * `group_idx` - array of non-negative integers to be used as the "labels" with which to group the values in `a`. * `a` - array of values to be aggregated. -* `func='sum'` - the function to use for aggregation. See the section below for nore details. +* `func='sum'` - the function to use for aggregation. See the section below for more details. * `size=None` - the shape of the output array. If `None`, the maximum value in `group_idx` will set the size of the output. * `fill_value=0` - value to use for output groups that do not appear anywhere in the `group_idx` input array. * `order='C'` - for multidimensional output, this controls the layout in memory, can be `'F'` for fortran-style. @@ -70,7 +70,7 @@ in more detail: * Form 1 is the simplest, taking `group_idx` and `a` of matching 1D lengths, and producing a 1D output. * Form 2 is similar to Form 1, but takes a scalar `a`, which is broadcast out to the length of `group_idx`. Note that this is generally not that useful. * Form 3 is more complicated. `group_idx` is the same length as the `a.shape[axis]`. The groups are broadcast out along the other axis/axes of `a`, thus the output is of shape `n_groups x a.shape[0] x ... x a.shape[axis-1] x a.shape[axis+1] x ... a.shape[-1]`, i.e. the output has two or more dimensions. -* Form 4 also produces output with two or more dimensions, but for very different reasons to Form 3. Here `a` is 1D and `group_idx` is exactly `2D`, whereas in Form 3 `a` is `ND`, `group_idx` is `1D`, and we provide a value for `axis`. The length of `a` must match `group_idx.shape[1]`, the value of `group_idx.shape[0]` determines the number of dimensions in the ouput, i.e. `group_idx[:,99]` gives the `(x,y,z)` group indices for the `a[99]`. +* Form 4 also produces output with two or more dimensions, but for very different reasons to Form 3. 
Here `a` is 1D and `group_idx` is exactly `2D`, whereas in Form 3 `a` is `ND`, `group_idx` is `1D`, and we provide a value for `axis`. The length of `a` must match `group_idx.shape[1]`, the value of `group_idx.shape[0]` determines the number of dimensions in the output, i.e. `group_idx[:,99]` gives the `(x,y,z)` group indices for the `a[99]`. * Form 5 is the same as Form 4 but with scalar `a`. As with Form 2, this is rarely that helpful. **Note on performance.** The `order` of the output is unlikely to affect performance of `aggregate` (although it may affect your downstream usage of that output); however, the order of multidimensional `a` or `group_idx` can affect performance: in Form 4 it is best if columns are contiguous in memory within `group_idx`, i.e. `group_idx[:, 99]` corresponds to a contiguous chunk of memory; in Form 3 it's best if all the data in `a` for `group_idx[i]` is contiguous, e.g. if `axis=1` then we want `a[:, 55]` to be contiguous. @@ -94,19 +94,19 @@ the following optimized functions. Note that not all functions might be provided * `'argmin'` - the index in `a` of the minimum value in each group. The above functions also have a `nan`-form, which skips the `nan` values instead of propagating them to the result of the calculation: -* `'nansum'`, `'nanprod'`, `'nanmean'`, `'nanvar'`, `'nanstd'`, `'nanmin'`, `'nanmax'`, `'nanfirst'`, `'nanlast'`, ``nanargmax``, ``nanargmin`` +* `'nansum'`, `'nanprod'`, `'nanmean'`, `'nanvar'`, `'nanstd'`, `'nanmin'`, `'nanmax'`, `'nanfirst'`, `'nanlast'`, `'nanargmax'`, `'nanargmin'` The following functions are slightly different in that they always return boolean values. Their treatment of nans is also different from above: * `'all'` - `True` if all items within a group are truthy. Note that `np.all(nan)` is `True`, i.e. `nan` is actually truthy. * `'any'` - `True` if any items within a group are truthy. * `'allnan'` - `True` if all items within a group are `nan`.
-* `'anynan'` - `True` if any items within a gorup are `nan`. +* `'anynan'` - `True` if any items within a group are `nan`. The following functions don't reduce the data, but instead produce an output matching the size of the input: -* `cumsum` - cumulative sum of items within each group. -* `cumprod` - cumulative product of items within each group. (numba only) -* `cummin` - cumulative minimum of items within each group. (numba only) -* `cummax` - cumulative maximum of items within each group. (numba only) +* `'cumsum'` - cumulative sum of items within each group. +* `'cumprod'` - cumulative product of items within each group. (numba only) +* `'cummin'` - cumulative minimum of items within each group. (numba only) +* `'cummax'` - cumulative maximum of items within each group. (numba only) * `'sort'` - sort the items within each group in ascending order, use reverse=True to invert the order. Finally, there are three functions which don't reduce each group to a single value, instead they return the full @@ -178,10 +178,9 @@ like `from numpy_groupies import aggregate_nb as aggregate` or by importing aggr Currently the following implementations exist: * **numpy** - This is the default implementation. It uses plain `numpy`, mainly relying on `np.bincount` and basic indexing magic. It comes without other dependencies except `numpy` and shows reasonable performance for the occasional usage. -* **numba** - This is the most performant implementation in average, based on jit compilation provided by numba and LLVM. -* **weave** - `weave` compiles C-code on demand at runtime, producing binaries that get executed from within python. The performance of this implementation is comparable to the numba implementation. +* **numba** - This is the most performant implementation, based on jit compilation provided by numba and LLVM. * **pure python** - This implementation has no dependencies and uses only the standard library. 
It's horribly slow and should only be used if numpy is not available. -* **numpy ufunc** - *Only for benchmarking.* This impelmentation uses the `.at` method of numpy's `ufunc`s (e.g. `add.at`), which would appear to be designed for perfoming excactly the same calculation that `aggregate` executes, however the numpy implementation is rather incomplete and slow (as of `v1.14.0`). A [numpy issue](https://github.com/numpy/numpy/issues/5922) has been created to address this issue. +* **numpy ufunc** - *Only for benchmarking.* This implementation uses the `.at` method of numpy's `ufunc`s (e.g. `add.at`), which would appear to be designed for performing exactly the same calculation that `aggregate` executes; however, the numpy implementation is rather incomplete. * **pandas** - *Only for reference.* Pandas' `groupby` concept is the same as the task performed by `aggregate`. However, `pandas` is not actually faster than the default `numpy` implementation. Also, note that there may be room for improvement in the way that `pandas` is utilized here. Most notably, when computing multiple aggregations of the same data (e.g. `'min'` and `'max'`) pandas could potentially be used more efficiently. All implementations have the same calling syntax and produce the same outputs, to within some floating-point error. @@ -197,99 +196,53 @@ the interval `[0,1)`, with anything less than `0.2` then set to 0 (in order to s For `nan-` operations another 20% of the values are set to nan, leaving the remainder on the interval `[0.2,0.8)`.
The benchmarking results are given in ms for an i7-7560U running at 2.40GHz: -```text -function ufunc numpy numba pandas ------------------------------------------------------------------ -sum 36.582 1.708 0.859 12.002 -prod 37.559 37.864 0.857 11.507 -amin 34.394 34.254 0.865 11.711 -amax 34.120 33.964 0.899 12.005 -len 31.899 1.382 0.733 11.092 -all 37.062 3.863 1.048 12.519 -any 36.260 5.601 1.048 12.713 -anynan 32.514 2.735 0.936 141.092 -allnan 34.558 5.611 0.932 151.953 -mean ---- 2.603 1.069 12.227 -std ---- 4.373 1.126 11.963 -var ---- 4.331 1.129 122.625 -first ---- 1.946 1.032 11.850 -last ---- 1.532 0.742 11.736 -argmax ---- 35.397 1.172 346.742 -argmin ---- 39.942 1.407 347.679 -nansum ---- 5.716 1.942 13.248 -nanprod ---- 36.224 1.967 12.585 -nanmin ---- 33.229 1.916 13.067 -nanmax ---- 32.935 1.965 13.258 -nanlen ---- 5.277 1.740 14.426 -nanall ---- 7.703 2.201 16.221 -nanany ---- 8.984 2.215 15.968 -nanmean ---- 6.221 2.024 13.243 -nanvar ---- 7.866 1.929 126.689 -nanstd ---- 7.945 1.933 13.062 -nanfirst ---- 6.015 2.284 15.547 -nanlast ---- 5.561 1.675 15.318 -nanargmin ---- 42.110 2.357 ---- -nanargmax ---- 38.085 2.314 ---- -cumsum ---- 106.524 1.313 8.000 -cumprod ---- ---- 1.319 11.149 -cummax ---- ---- 1.288 11.954 -cummin ---- ---- 1.271 11.631 -arbitrary ---- 206.623 50.381 131.928 -sort ---- 171.702 ---- ---- -Linux(x86_64), Python 3.10.4, Numpy 1.22.4, Numba 0.55.2, Pandas 1.4.3 -``` -```text -function ufunc numpy numba weave ------------------------------------------------------------------ -sum 30.985 1.684 1.116 1.350 -prod 32.553 32.269 0.996 1.172 -amin 34.954 34.837 0.989 2.068 -amax 34.580 34.695 1.023 2.132 -len 30.611 1.342 0.805 1.003 -all 36.724 4.355 1.366 1.407 -any 34.570 7.181 1.373 1.410 -anynan 30.840 2.611 0.986 2.557 -allnan 32.463 6.636 0.982 2.562 -mean ---- 2.248 0.985 1.191 -std ---- 6.532 1.084 1.378 -var ---- 6.590 1.086 1.380 -first ---- 2.126 1.033 1.132 -last ---- 1.592 0.957 1.002 -argmax ---- 34.903 1.018 ---- 
-argmin ---- 38.538 0.996 ---- -nansum ---- 5.148 1.785 1.335 -nanprod ---- 29.445 1.760 1.334 -nanmin ---- 31.752 1.992 2.747 -nanmax ---- 32.247 2.021 2.802 -nanlen ---- 5.099 1.909 1.267 -nanall ---- 9.637 1.826 1.375 -nanany ---- 10.520 1.830 1.384 -nanmean ---- 5.775 2.018 1.430 -nanvar ---- 10.171 2.145 1.640 -nanstd ---- 10.155 2.163 1.637 -nanfirst ---- 5.640 2.201 1.156 -nanlast ---- 5.218 1.734 1.137 -nanargmin ---- 43.795 1.987 ---- -nanargmax ---- 40.354 2.029 ---- -cumsum ---- 138.660 1.270 ---- -cumprod ---- ---- 1.292 ---- -cummax ---- ---- 1.216 ---- -cummin ---- ---- 1.205 ---- -arbitrary ---- 224.213 80.039 ---- -sort ---- 268.514 ---- ---- -Linux(x86_64), Python 2.7.18, Numpy 1.16.6, Numba 0.46.0, Weave 0.17.0 -``` +| function | ufunc | numpy | numba | pandas | +|-----------|---------|---------|---------|---------| +| sum | 1.950 | 1.728 | 0.708 | 11.832 | +| prod | 2.279 | 2.349 | 0.709 | 11.649 | +| min | 2.472 | 2.489 | 0.716 | 11.686 | +| max | 2.457 | 2.480 | 0.745 | 11.598 | +| len | 1.481 | 1.270 | 0.635 | 10.932 | +| all | 37.186 | 3.054 | 0.892 | 12.587 | +| any | 35.278 | 5.157 | 0.890 | 12.845 | +| anynan | 5.783 | 2.126 | 0.762 | 144.740 | +| allnan | 7.971 | 4.367 | 0.774 | 144.507 | +| mean | ---- | 2.500 | 0.825 | 13.284 | +| std | ---- | 4.528 | 0.965 | 12.193 | +| var | ---- | 4.269 | 0.969 | 12.657 | +| first | ---- | 1.847 | 0.811 | 11.584 | +| last | ---- | 1.309 | 0.581 | 11.842 | +| argmax | ---- | 3.504 | 1.411 | 293.640 | +| argmin | ---- | 6.996 | 1.347 | 290.977 | +| nansum | ---- | 5.388 | 1.569 | 15.239 | +| nanprod | ---- | 5.707 | 1.546 | 15.004 | +| nanmin | ---- | 5.831 | 1.700 | 14.292 | +| nanmax | ---- | 5.847 | 1.731 | 14.927 | +| nanlen | ---- | 3.170 | 1.529 | 14.529 | +| nanall | ---- | 6.499 | 1.640 | 15.931 | +| nanany | ---- | 8.041 | 1.656 | 15.839 | +| nanmean | ---- | 5.636 | 1.583 | 15.185 | +| nanvar | ---- | 7.514 | 1.682 | 15.643 | +| nanstd | ---- | 7.292 | 1.666 | 15.104 | +| nanfirst | ---- | 
5.318 | 2.096 | 14.432 | +| nanlast | ---- | 4.943 | 1.473 | 14.637 | +| nanargmin | ---- | 7.977 | 1.779 | 298.911 | +| nanargmax | ---- | 5.869 | 1.802 | 301.022 | +| cumsum | ---- | 71.713 | 1.119 | 8.864 | +| cumprod | ---- | ---- | 1.123 | 12.100 | +| cummax | ---- | ---- | 1.062 | 12.133 | +| cummin | ---- | ---- | 0.973 | 11.908 | +| arbitrary | ---- | 147.853 | 46.690 | 129.779 | +| sort | ---- | 167.699 | ---- | ---- | + +_Linux(x86_64), Python 3.10.12, Numpy 1.25.2, Numba 0.58.0, Pandas 2.0.2_ + ## Development This project was started by @ml31415 and the `numba` and `weave` implementations are by him. The pure python and `numpy` implementations were written by @d1manson. The authors hope that `numpy`'s `ufunc.at` methods or some other implementation of `aggregate` within -`numpy` or `scipy` will eventually be fast enough, to make this package redundant. - - -### python2 -So far `numpy_grpupies` can still be run on `python2`, mainly because `weave` was never ported to `python3`. -Ditching `python2` support would mean to ditch the `weave` implementation, which is so far the best competitor in -terms of speed. In order not to lose this benchmarking option, `python2` compatibility is likely to stay -for now. +`numpy` or `scipy` will eventually be fast enough to make this package redundant. Numpy 1.25 actually +contained major [improvements on ufunc speed](https://numpy.org/doc/stable/release/1.25.0-notes.html), +which substantially reduced the speed gap between the numpy and numba implementations.
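(Note for reviewers, not part of the patch: the README changes above describe `aggregate`'s simplest calling form, a 1D `group_idx` paired with a 1D `a`. Since the README states that the default backend relies mainly on `np.bincount`, a grouped sum can be sketched in plain `numpy`; the sample arrays below are made up for illustration.)

```python
import numpy as np

# Hypothetical sample data: five values falling into three groups.
group_idx = np.array([0, 0, 1, 1, 2])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# np.bincount sums `weights` per index value, which is exactly a
# Form 1 grouped sum: aggregate(group_idx, a, func='sum').
group_sums = np.bincount(group_idx, weights=a)
print(group_sums)  # [3. 7. 5.]
```

Group 0 collects the values 1 and 2, group 1 collects 3 and 4, and group 2 only 5, so the output has one entry per group label.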
diff --git a/conftest.py b/conftest.py index dd87b08..480d61d 100644 --- a/conftest.py +++ b/conftest.py @@ -5,9 +5,7 @@ def pytest_configure(config): - config.addinivalue_line( - "markers", "deselect_if(func): function to deselect tests from parametrization" - ) + config.addinivalue_line("markers", "deselect_if(func): function to deselect tests from parametrization") def pytest_collection_modifyitems(config, items): diff --git a/numpy_groupies/__init__.py b/numpy_groupies/__init__.py index 27ea6f4..60d81bc 100644 --- a/numpy_groupies/__init__.py +++ b/numpy_groupies/__init__.py @@ -1,11 +1,9 @@ -from ._version import get_versions from .aggregate_purepy import aggregate as aggregate_py def dummy_no_impl(*args, **kwargs): raise NotImplementedError( - "You may need to install another package (numpy, " - "weave, or numba) to access a working implementation." + "You may need to install another package (numpy or numba) to access a working implementation." ) @@ -21,7 +19,7 @@ def dummy_no_impl(*args, **kwargs): aggregate_np = aggregate from .aggregate_numpy_ufunc import aggregate as aggregate_ufunc - from .utils_numpy import ( + from .utils import ( label_contiguous_1d, multi_arange, relabel_groups_masked, @@ -30,25 +28,13 @@ def dummy_no_impl(*args, **kwargs): ) -try: - try: - import weave - except ImportError: - from scipy import weave -except ImportError: - aggregate_wv = None -else: - from .aggregate_weave import aggregate as aggregate_wv, step_count, step_indices - - aggregate = aggregate_wv - - try: import numba except ImportError: aggregate_nb = None else: - from .aggregate_numba import aggregate as aggregate_nb, step_count, step_indices + from .aggregate_numba import aggregate as aggregate_nb + from .aggregate_numba import step_count, step_indices aggregate = aggregate_nb @@ -57,5 +43,14 @@ def uaggregate(group_idx, a, **kwargs): return unpack(group_idx, aggregate(group_idx, a, **kwargs)) -__version__ = get_versions()["version"] -del get_versions +try: + # 
Version is added only when packaged + from ._version import __version__ +except ImportError: + try: + from setuptools_scm import get_version + except ImportError: + __version__ = "0.0.0" + else: + __version__ = get_version(root="..", relative_to=__file__) + del get_version diff --git a/numpy_groupies/_version.py b/numpy_groupies/_version.py deleted file mode 100644 index 78054f0..0000000 --- a/numpy_groupies/_version.py +++ /dev/null @@ -1,556 +0,0 @@ -# This file helps to compute a version number in source trees obtained from -# git-archive tarball (such as those provided by githubs download-from-tag -# feature). Distribution tarballs (built by setup.py sdist) and build -# directories (produced by setup.py build) will contain a much shorter file -# that just contains the computed version number. - -# This file is released into the public domain. Generated by -# versioneer-0.18 (https://github.com/warner/python-versioneer) - -"""Git implementation of _version.py.""" - -import errno -import os -import re -import subprocess -import sys - - -def get_keywords(): - """Get the keywords needed to look up the version information.""" - # these strings will be replaced by git during git-archive. - # setup.py/versioneer.py will grep for the variable names, so they must - # each be defined on a line of their own. _version.py will just call - # get_keywords(). 
- git_refnames = "$Format:%d$" - git_full = "$Format:%H$" - git_date = "$Format:%ci$" - keywords = {"refnames": git_refnames, "full": git_full, "date": git_date} - return keywords - - -class VersioneerConfig: - """Container for Versioneer configuration parameters.""" - - -def get_config(): - """Create, populate and return the VersioneerConfig() object.""" - # these strings are filled in when 'setup.py versioneer' creates - # _version.py - cfg = VersioneerConfig() - cfg.VCS = "git" - cfg.style = "pep440" - cfg.tag_prefix = "v" - cfg.parentdir_prefix = "numpy_groupies-" - cfg.versionfile_source = "numpy_groupies/_version.py" - cfg.verbose = False - return cfg - - -class NotThisMethod(Exception): - """Exception raised if a method is not valid for the current scenario.""" - - -LONG_VERSION_PY = {} -HANDLERS = {} - - -def register_vcs_handler(vcs, method): # decorator - """Decorator to mark a method as the handler for a particular VCS.""" - - def decorate(f): - """Store f in HANDLERS[vcs][method].""" - if vcs not in HANDLERS: - HANDLERS[vcs] = {} - HANDLERS[vcs][method] = f - return f - - return decorate - - -def run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, env=None): - """Call the given command(s).""" - assert isinstance(commands, list) - p = None - for c in commands: - try: - dispcmd = str([c] + args) - # remember shell=False, so use git.cmd on windows, not just git - p = subprocess.Popen( - [c] + args, - cwd=cwd, - env=env, - stdout=subprocess.PIPE, - stderr=(subprocess.PIPE if hide_stderr else None), - ) - break - except EnvironmentError: - e = sys.exc_info()[1] - if e.errno == errno.ENOENT: - continue - if verbose: - print("unable to run %s" % dispcmd) - print(e) - return None, None - else: - if verbose: - print("unable to find command, tried %s" % (commands,)) - return None, None - stdout = p.communicate()[0].strip() - if sys.version_info[0] >= 3: - stdout = stdout.decode() - if p.returncode != 0: - if verbose: - print("unable to run %s 
(error)" % dispcmd) - print("stdout was %s" % stdout) - return None, p.returncode - return stdout, p.returncode - - -def versions_from_parentdir(parentdir_prefix, root, verbose): - """Try to determine the version from the parent directory name. - - Source tarballs conventionally unpack into a directory that includes both - the project name and a version string. We will also support searching up - two directory levels for an appropriately named parent directory - """ - rootdirs = [] - - for i in range(3): - dirname = os.path.basename(root) - if dirname.startswith(parentdir_prefix): - return { - "version": dirname[len(parentdir_prefix) :], - "full-revisionid": None, - "dirty": False, - "error": None, - "date": None, - } - else: - rootdirs.append(root) - root = os.path.dirname(root) # up a level - - if verbose: - print( - "Tried directories %s but none started with prefix %s" - % (str(rootdirs), parentdir_prefix) - ) - raise NotThisMethod("rootdir doesn't start with parentdir_prefix") - - -@register_vcs_handler("git", "get_keywords") -def git_get_keywords(versionfile_abs): - """Extract version information from the given file.""" - # the code embedded in _version.py can just fetch the value of these - # keywords. When used from setup.py, we don't want to import _version.py, - # so we do it with a regexp instead. This function is not used from - # _version.py. 
- keywords = {} - try: - f = open(versionfile_abs, "r") - for line in f.readlines(): - if line.strip().startswith("git_refnames ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["refnames"] = mo.group(1) - if line.strip().startswith("git_full ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["full"] = mo.group(1) - if line.strip().startswith("git_date ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["date"] = mo.group(1) - f.close() - except EnvironmentError: - pass - return keywords - - -@register_vcs_handler("git", "keywords") -def git_versions_from_keywords(keywords, tag_prefix, verbose): - """Get version information from git keywords.""" - if not keywords: - raise NotThisMethod("no keywords at all, weird") - date = keywords.get("date") - if date is not None: - # git-2.2.0 added "%cI", which expands to an ISO-8601 -compliant - # datestamp. However we prefer "%ci" (which expands to an "ISO-8601 - # -like" string, which we must then edit to make compliant), because - # it's been around since git-1.5.3, and it's too difficult to - # discover which version we're using, or to work around using an - # older one. - date = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - refnames = keywords["refnames"].strip() - if refnames.startswith("$Format"): - if verbose: - print("keywords are unexpanded, not using") - raise NotThisMethod("unexpanded keywords, not a git-archive tarball") - refs = set([r.strip() for r in refnames.strip("()").split(",")]) - # starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of - # just "foo-1.0". If we see a "tag: " prefix, prefer those. - TAG = "tag: " - tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)]) - if not tags: - # Either we're using git < 1.8.3, or there really are no tags. We use - # a heuristic: assume all version tags have a digit. 
The old git %d - # expansion behaves like git log --decorate=short and strips out the - # refs/heads/ and refs/tags/ prefixes that would let us distinguish - # between branches and tags. By ignoring refnames without digits, we - # filter out many common branch names like "release" and - # "stabilization", as well as "HEAD" and "master". - tags = set([r for r in refs if re.search(r"\d", r)]) - if verbose: - print("discarding '%s', no digits" % ",".join(refs - tags)) - if verbose: - print("likely tags: %s" % ",".join(sorted(tags))) - for ref in sorted(tags): - # sorting will prefer e.g. "2.0" over "2.0rc1" - if ref.startswith(tag_prefix): - r = ref[len(tag_prefix) :] - if verbose: - print("picking %s" % r) - return { - "version": r, - "full-revisionid": keywords["full"].strip(), - "dirty": False, - "error": None, - "date": date, - } - # no suitable tags, so version is "0+unknown", but full hex is still there - if verbose: - print("no suitable tags, using unknown + full revision id") - return { - "version": "0+unknown", - "full-revisionid": keywords["full"].strip(), - "dirty": False, - "error": "no suitable tags", - "date": None, - } - - -@register_vcs_handler("git", "pieces_from_vcs") -def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command): - """Get version from 'git describe' in the root of the source tree. - - This only gets called if the git-archive 'subst' keywords were *not* - expanded, and _version.py hasn't already been rewritten with a short - version string, meaning we're inside a checked out source tree. 
- """ - GITS = ["git"] - if sys.platform == "win32": - GITS = ["git.cmd", "git.exe"] - - out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root, hide_stderr=True) - if rc != 0: - if verbose: - print("Directory %s not under git control" % root) - raise NotThisMethod("'git rev-parse --git-dir' returned error") - - # if there is a tag matching tag_prefix, this yields TAG-NUM-gHEX[-dirty] - # if there isn't one, this yields HEX[-dirty] (no NUM) - describe_out, rc = run_command( - GITS, - [ - "describe", - "--tags", - "--dirty", - "--always", - "--long", - "--match", - "%s*" % tag_prefix, - ], - cwd=root, - ) - # --long was added in git-1.5.5 - if describe_out is None: - raise NotThisMethod("'git describe' failed") - describe_out = describe_out.strip() - full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root) - if full_out is None: - raise NotThisMethod("'git rev-parse' failed") - full_out = full_out.strip() - - pieces = {} - pieces["long"] = full_out - pieces["short"] = full_out[:7] # maybe improved later - pieces["error"] = None - - # parse describe_out. It will be like TAG-NUM-gHEX[-dirty] or HEX[-dirty] - # TAG might have hyphens. - git_describe = describe_out - - # look for -dirty suffix - dirty = git_describe.endswith("-dirty") - pieces["dirty"] = dirty - if dirty: - git_describe = git_describe[: git_describe.rindex("-dirty")] - - # now we have TAG-NUM-gHEX or HEX - - if "-" in git_describe: - # TAG-NUM-gHEX - mo = re.search(r"^(.+)-(\d+)-g([0-9a-f]+)$", git_describe) - if not mo: - # unparseable. Maybe git-describe is misbehaving? 
- pieces["error"] = "unable to parse git-describe output: '%s'" % describe_out - return pieces - - # tag - full_tag = mo.group(1) - if not full_tag.startswith(tag_prefix): - if verbose: - fmt = "tag '%s' doesn't start with prefix '%s'" - print(fmt % (full_tag, tag_prefix)) - pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % ( - full_tag, - tag_prefix, - ) - return pieces - pieces["closest-tag"] = full_tag[len(tag_prefix) :] - - # distance: number of commits since tag - pieces["distance"] = int(mo.group(2)) - - # commit: short hex revision ID - pieces["short"] = mo.group(3) - - else: - # HEX: no tags - pieces["closest-tag"] = None - count_out, rc = run_command(GITS, ["rev-list", "HEAD", "--count"], cwd=root) - pieces["distance"] = int(count_out) # total number of commits - - # commit date: see ISO-8601 comment in git_versions_from_keywords() - date = run_command(GITS, ["show", "-s", "--format=%ci", "HEAD"], cwd=root)[ - 0 - ].strip() - pieces["date"] = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - - return pieces - - -def plus_or_dot(pieces): - """Return a + if we don't already have one, else return a .""" - if "+" in pieces.get("closest-tag", ""): - return "." - return "+" - - -def render_pep440(pieces): - """Build up version string, with post-release "local version identifier". - - Our goal: TAG[+DISTANCE.gHEX[.dirty]] . Note that if you - get a tagged build and then dirty it, you'll get TAG+0.gHEX.dirty - - Exceptions: - 1: no tags. git_describe was just HEX. 
0+untagged.DISTANCE.gHEX[.dirty] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += plus_or_dot(pieces) - rendered += "%d.g%s" % (pieces["distance"], pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - else: - # exception #1 - rendered = "0+untagged.%d.g%s" % (pieces["distance"], pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - return rendered - - -def render_pep440_pre(pieces): - """TAG[.post.devDISTANCE] -- No -dirty. - - Exceptions: - 1: no tags. 0.post.devDISTANCE - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += ".post.dev%d" % pieces["distance"] - else: - # exception #1 - rendered = "0.post.dev%d" % pieces["distance"] - return rendered - - -def render_pep440_post(pieces): - """TAG[.postDISTANCE[.dev0]+gHEX] . - - The ".dev0" means dirty. Note that .dev0 sorts backwards - (a dirty tree will appear "older" than the corresponding clean one), - but you shouldn't be releasing software with -dirty anyways. - - Exceptions: - 1: no tags. 0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += plus_or_dot(pieces) - rendered += "g%s" % pieces["short"] - else: - # exception #1 - rendered = "0.post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += "+g%s" % pieces["short"] - return rendered - - -def render_pep440_old(pieces): - """TAG[.postDISTANCE[.dev0]] . - - The ".dev0" means dirty. - - Eexceptions: - 1: no tags. 
0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - else: - # exception #1 - rendered = "0.post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - return rendered - - -def render_git_describe(pieces): - """TAG[-DISTANCE-gHEX][-dirty]. - - Like 'git describe --tags --dirty --always'. - - Exceptions: - 1: no tags. HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += "-%d-g%s" % (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render_git_describe_long(pieces): - """TAG-DISTANCE-gHEX[-dirty]. - - Like 'git describe --tags --dirty --always -long'. - The distance/hash is unconditional. - - Exceptions: - 1: no tags. 
HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - rendered += "-%d-g%s" % (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render(pieces, style): - """Render the given version pieces into the requested style.""" - if pieces["error"]: - return { - "version": "unknown", - "full-revisionid": pieces.get("long"), - "dirty": None, - "error": pieces["error"], - "date": None, - } - - if not style or style == "default": - style = "pep440" # the default - - if style == "pep440": - rendered = render_pep440(pieces) - elif style == "pep440-pre": - rendered = render_pep440_pre(pieces) - elif style == "pep440-post": - rendered = render_pep440_post(pieces) - elif style == "pep440-old": - rendered = render_pep440_old(pieces) - elif style == "git-describe": - rendered = render_git_describe(pieces) - elif style == "git-describe-long": - rendered = render_git_describe_long(pieces) - else: - raise ValueError("unknown style '%s'" % style) - - return { - "version": rendered, - "full-revisionid": pieces["long"], - "dirty": pieces["dirty"], - "error": None, - "date": pieces.get("date"), - } - - -def get_versions(): - """Get version information or return default if unable to do so.""" - # I am in _version.py, which lives at ROOT/VERSIONFILE_SOURCE. If we have - # __file__, we can work backwards from there to the root. Some - # py2exe/bbfreeze/non-CPython implementations don't do __file__, in which - # case we can only use expanded keywords. - - cfg = get_config() - verbose = cfg.verbose - - try: - return git_versions_from_keywords(get_keywords(), cfg.tag_prefix, verbose) - except NotThisMethod: - pass - - try: - root = os.path.realpath(__file__) - # versionfile_source is the relative path from the top of the source - # tree (where the .git directory might live) to this file. Invert - # this to find the root from __file__. 
- for i in cfg.versionfile_source.split("/"): - root = os.path.dirname(root) - except NameError: - return { - "version": "0+unknown", - "full-revisionid": None, - "dirty": None, - "error": "unable to find root of source tree", - "date": None, - } - - try: - pieces = git_pieces_from_vcs(cfg.tag_prefix, root, verbose) - return render(pieces, cfg.style) - except NotThisMethod: - pass - - try: - if cfg.parentdir_prefix: - return versions_from_parentdir(cfg.parentdir_prefix, root, verbose) - except NotThisMethod: - pass - - return { - "version": "0+unknown", - "full-revisionid": None, - "dirty": None, - "error": "unable to compute version", - "date": None, - } diff --git a/numpy_groupies/aggregate_numba.py b/numpy_groupies/aggregate_numba.py index 1727064..914a45e 100644 --- a/numpy_groupies/aggregate_numba.py +++ b/numpy_groupies/aggregate_numba.py @@ -1,10 +1,15 @@ -from __future__ import division - import numba as nb import numpy as np -from .utils import aggregate_common_doc, funcs_no_separate_nan, get_func, isstr -from .utils_numpy import aliasing, check_dtype, check_fill_value, input_validation +from .utils import ( + aggregate_common_doc, + aliasing, + check_dtype, + check_fill_value, + funcs_no_separate_nan, + get_func, + input_validation, +) class AggregateOp(object): @@ -63,9 +68,7 @@ def __call__( dtype = check_dtype(dtype, self.func, a, len(group_idx)) check_fill_value(fill_value, dtype, func=self.func) input_dtype = type(a) if np.isscalar(a) else a.dtype - ret, counter, mean, outer = self._initialize( - flat_size, fill_value, dtype, input_dtype, group_idx.size - ) + ret, counter, mean, outer = self._initialize(flat_size, fill_value, dtype, input_dtype, group_idx.size) group_idx = np.ascontiguousarray(group_idx) if not np.isscalar(a): @@ -138,9 +141,7 @@ def inner(ri, val, ret, counter, mean, fill_value): def loop(group_idx, a, ret, counter, mean, outer, fill_value, ddof): # ddof needs to be present for being exchangeable with loop_2pass size = len(ret) - 
rng = ( - range(len(group_idx) - 1, -1, -1) if reverse else range(len(group_idx)) - ) + rng = range(len(group_idx) - 1, -1, -1) if reverse else range(len(group_idx)) for i in rng: ri = group_idx[i] if ri < 0: @@ -181,9 +182,7 @@ class Aggregate2pass(AggregateOp): def callable(cls, nans=False, reverse=False, scalar=False): # Careful, cls needs to be passed, so that the overwritten methods remain available in # AggregateOp.callable - loop_1st = super(Aggregate2pass, cls).callable( - nans=nans, reverse=reverse, scalar=scalar - ) + loop_1st = super().callable(nans=nans, reverse=reverse, scalar=scalar) _2pass_inner = nb.njit(cls._2pass_inner) @@ -243,18 +242,14 @@ def __call__( axis=None, ddof=0, ): - iv = input_validation( - group_idx, a, size=size, order=order, axis=axis, check_bounds=False - ) + iv = input_validation(group_idx, a, size=size, order=order, axis=axis, check_bounds=False) group_idx, a, flat_size, ndim_idx, size, _ = iv # TODO: The typecheck should be done by the class itself, not by check_dtype dtype = check_dtype(dtype, self.func, a, len(group_idx)) check_fill_value(fill_value, dtype, func=self.func) input_dtype = type(a) if np.isscalar(a) else a.dtype - ret, _, _, _ = self._initialize( - flat_size, fill_value, dtype, input_dtype, group_idx.size - ) + ret, _, _, _ = self._initialize(flat_size, fill_value, dtype, input_dtype, group_idx.size) group_idx = np.ascontiguousarray(group_idx) sortidx = np.argsort(group_idx, kind="mergesort") @@ -535,19 +530,10 @@ def get_funcs(): def aggregate( - group_idx, - a, - func="sum", - size=None, - fill_value=0, - order="C", - dtype=None, - axis=None, - cache=True, - **kwargs + group_idx, a, func="sum", size=None, fill_value=0, order="C", dtype=None, axis=None, cache=True, **kwargs ): func = get_func(func, aliasing, _impl_dict) - if not isstr(func): + if not isinstance(func, str): if cache in (None, False): # Keep None and False in order to accept empty dictionaries aggregate_op = AggregateGeneric(func) @@ -555,9 +541,7 
@@ def aggregate( if cache is True: cache = _default_cache aggregate_op = cache.setdefault(func, AggregateGeneric(func)) - return aggregate_op( - group_idx, a, size, fill_value, order, dtype, axis, **kwargs - ) + return aggregate_op(group_idx, a, size, fill_value, order, dtype, axis, **kwargs) else: func = _impl_dict[func] return func(group_idx, a, size, fill_value, order, dtype, axis, **kwargs) diff --git a/numpy_groupies/aggregate_numpy.py b/numpy_groupies/aggregate_numpy.py index dd77fbb..0bfc9a0 100644 --- a/numpy_groupies/aggregate_numpy.py +++ b/numpy_groupies/aggregate_numpy.py @@ -2,21 +2,18 @@ from .utils import ( aggregate_common_doc, - check_boolean, - funcs_no_separate_nan, - get_func, - isstr, -) -from .utils_numpy import ( aliasing, + check_boolean, check_dtype, check_fill_value, + funcs_no_separate_nan, + get_func, input_validation, iscomplexobj, + maxval, minimum_dtype, minimum_dtype_scalar, minval, - maxval, ) @@ -33,9 +30,7 @@ def _sum(group_idx, a, size, fill_value, dtype=None): ret.real = np.bincount(group_idx, weights=a.real, minlength=size) ret.imag = np.bincount(group_idx, weights=a.imag, minlength=size) else: - ret = np.bincount(group_idx, weights=a, minlength=size).astype( - dtype, copy=False - ) + ret = np.bincount(group_idx, weights=a, minlength=size).astype(dtype, copy=False) if fill_value != 0: _fill_untouched(group_idx, ret, fill_value) @@ -118,9 +113,7 @@ def _argmax(group_idx, a, size, fill_value, dtype=int, _nansqueeze=False): ret = np.full(size, fill_value, dtype=dtype) group_idx_max = group_idx[is_max] (argmax,) = is_max.nonzero() - ret[group_idx_max[::-1]] = argmax[ - ::-1 - ] # reverse to ensure first value for each group wins + ret[group_idx_max[::-1]] = argmax[::-1] # reverse to ensure first value for each group wins return ret @@ -132,9 +125,7 @@ def _argmin(group_idx, a, size, fill_value, dtype=int, _nansqueeze=False): ret = np.full(size, fill_value, dtype=dtype) group_idx_min = group_idx[is_min] (argmin,) = is_min.nonzero() 
- ret[group_idx_min[::-1]] = argmin[ - ::-1 - ] # reverse to ensure first value for each group wins + ret[group_idx_min[::-1]] = argmin[::-1] # reverse to ensure first value for each group wins return ret @@ -165,9 +156,7 @@ def _sum_of_squres(group_idx, a, size, fill_value, dtype=np.dtype(np.float64)): return ret -def _var( - group_idx, a, size, fill_value, dtype=np.dtype(np.float64), sqrt=False, ddof=0 -): +def _var(group_idx, a, size, fill_value, dtype=np.dtype(np.float64), sqrt=False, ddof=0): if np.ndim(a) == 0: raise ValueError("cannot take variance with scalar a") counts = np.bincount(group_idx, minlength=size) @@ -175,9 +164,7 @@ def _var( with np.errstate(divide="ignore", invalid="ignore"): means = sums.astype(dtype, copy=False) / counts counts = np.where(counts > ddof, counts - ddof, 0) - ret = ( - np.bincount(group_idx, (a - means[group_idx]) ** 2, minlength=size) / counts - ) + ret = np.bincount(group_idx, (a - means[group_idx]) ** 2, minlength=size) / counts if sqrt: ret = np.sqrt(ret) # this is now std not var if not np.isnan(fill_value): @@ -207,7 +194,7 @@ def _sort(group_idx, a, size=None, fill_value=None, dtype=None, reverse=False): def _array(group_idx, a, size, fill_value, dtype=None): """groups a into separate arrays, keeping the order intact.""" if fill_value is not None and not (np.isscalar(fill_value) or len(fill_value) == 0): - raise ValueError("fill_value must be None, a scalar or an empty " "sequence") + raise ValueError("fill_value must be None, a scalar or an empty sequence") order_group_idx = np.argsort(group_idx, kind="mergesort") counts = np.bincount(group_idx, minlength=size) ret = np.split(a[order_group_idx], np.cumsum(counts)[:-1]) @@ -217,9 +204,7 @@ def _array(group_idx, a, size, fill_value, dtype=None): return ret -def _generic_callable( - group_idx, a, size, fill_value, dtype=None, func=lambda g: g, **kwargs -): +def _generic_callable(group_idx, a, size, fill_value, dtype=None, func=lambda g: g, **kwargs): """groups a by inds, 
and then applies foo to each group in turn, placing the results in an array.""" groups = _array(group_idx, a, size, ()) @@ -255,9 +240,7 @@ def _cumsum(group_idx, a, size, fill_value=None, dtype=None): def _nancumsum(group_idx, a, size, fill_value=None, dtype=None): a_nonans = np.where(np.isnan(a), 0, a) - group_idx_nonans = np.where( - np.isnan(group_idx), np.nanmax(group_idx) + 1, group_idx - ) + group_idx_nonans = np.where(np.isnan(group_idx), np.nanmax(group_idx) + 1, group_idx) return _cumsum(group_idx_nonans, a_nonans, size, fill_value=fill_value, dtype=dtype) @@ -284,11 +267,7 @@ def _nancumsum(group_idx, a, size, fill_value=None, dtype=None): sumofsquares=_sum_of_squres, generic=_generic_callable, ) -_impl_dict.update( - ("nan" + k, v) - for k, v in list(_impl_dict.items()) - if k not in funcs_no_separate_nan -) +_impl_dict.update(("nan" + k, v) for k, v in list(_impl_dict.items()) if k not in funcs_no_separate_nan) def _aggregate_base( @@ -302,7 +281,7 @@ def _aggregate_base( axis=None, _impl_dict=_impl_dict, is_pandas=False, - **kwargs + **kwargs, ): iv = input_validation(group_idx, a, size=size, order=order, axis=axis, func=func) group_idx, a, flat_size, ndim_idx, size, unravel_shape = iv @@ -312,7 +291,7 @@ def _aggregate_base( group_idx = group_idx.astype(int) func = get_func(func, aliasing, _impl_dict) - if not isstr(func): + if not isinstance(func, str): # do simple grouping and execute function in loop ret = _impl_dict.get("generic", _generic_callable)( group_idx, a, flat_size, fill_value, func=func, dtype=dtype, **kwargs @@ -335,9 +314,7 @@ def _aggregate_base( dtype = check_dtype(dtype, func, a, flat_size) check_fill_value(fill_value, dtype, func=func) func = _impl_dict[func] - ret = func( - group_idx, a, flat_size, fill_value=fill_value, dtype=dtype, **kwargs - ) + ret = func(group_idx, a, flat_size, fill_value=fill_value, dtype=dtype, **kwargs) # deal with ndimensional indexing if ndim_idx > 1: @@ -351,17 +328,7 @@ def _aggregate_base( return 
ret -def aggregate( - group_idx, - a, - func="sum", - size=None, - fill_value=0, - order="C", - dtype=None, - axis=None, - **kwargs -): +def aggregate(group_idx, a, func="sum", size=None, fill_value=0, order="C", dtype=None, axis=None, **kwargs): return _aggregate_base( group_idx, a, @@ -372,7 +339,7 @@ def aggregate( func=func, axis=axis, _impl_dict=_impl_dict, - **kwargs + **kwargs, ) diff --git a/numpy_groupies/aggregate_numpy_ufunc.py b/numpy_groupies/aggregate_numpy_ufunc.py index bcf3793..784999f 100644 --- a/numpy_groupies/aggregate_numpy_ufunc.py +++ b/numpy_groupies/aggregate_numpy_ufunc.py @@ -1,8 +1,16 @@ import numpy as np from .aggregate_numpy import _aggregate_base -from .utils import aggregate_common_doc, check_boolean, get_func, isstr -from .utils_numpy import aliasing, minimum_dtype, minimum_dtype_scalar, minval, maxval +from .utils import ( + aggregate_common_doc, + aliasing, + check_boolean, + get_func, + maxval, + minimum_dtype, + minimum_dtype_scalar, + minval, +) def _anynan(group_idx, a, size, fill_value, dtype=None): @@ -89,19 +97,9 @@ def _max(group_idx, a, size, fill_value, dtype=None): ) -def aggregate( - group_idx, - a, - func="sum", - size=None, - fill_value=0, - order="C", - dtype=None, - axis=None, - **kwargs -): +def aggregate(group_idx, a, func="sum", size=None, fill_value=0, order="C", dtype=None, axis=None, **kwargs): func = get_func(func, aliasing, _impl_dict) - if not isstr(func): + if not isinstance(func, str): raise NotImplementedError("No such ufunc available") return _aggregate_base( group_idx, @@ -113,7 +111,7 @@ def aggregate( func=func, axis=axis, _impl_dict=_impl_dict, - **kwargs + **kwargs, ) diff --git a/numpy_groupies/aggregate_pandas.py b/numpy_groupies/aggregate_pandas.py index 4770fc0..02cb625 100644 --- a/numpy_groupies/aggregate_pandas.py +++ b/numpy_groupies/aggregate_pandas.py @@ -4,8 +4,13 @@ import pandas as pd from .aggregate_numpy import _aggregate_base -from .utils import aggregate_common_doc, 
funcs_no_separate_nan, isstr -from .utils_numpy import allnan, anynan, check_dtype +from .utils import ( + aggregate_common_doc, + allnan, + anynan, + check_dtype, + funcs_no_separate_nan, +) def _wrapper(group_idx, a, size, fill_value, func="sum", dtype=None, ddof=0, **kwargs): @@ -31,9 +36,7 @@ def _wrapper(group_idx, a, size, fill_value, func="sum", dtype=None, ddof=0, **k _supported_funcs = "sum prod all any min max mean var std first last cumsum cumprod cummax cummin".split() _impl_dict = {fn: partial(_wrapper, func=fn) for fn in _supported_funcs} _impl_dict.update( - ("nan" + fn, partial(_wrapper, func=fn)) - for fn in _supported_funcs - if fn not in funcs_no_separate_nan + ("nan" + fn, partial(_wrapper, func=fn)) for fn in _supported_funcs if fn not in funcs_no_separate_nan ) _impl_dict.update( allnan=partial(_wrapper, func=allnan), @@ -48,17 +51,7 @@ def _wrapper(group_idx, a, size, fill_value, func="sum", dtype=None, ddof=0, **k ) -def aggregate( - group_idx, - a, - func="sum", - size=None, - fill_value=0, - order="C", - dtype=None, - axis=None, - **kwargs -): +def aggregate(group_idx, a, func="sum", size=None, fill_value=0, order="C", dtype=None, axis=None, **kwargs): return _aggregate_base( group_idx, a, @@ -70,7 +63,7 @@ def aggregate( axis=axis, _impl_dict=_impl_dict, is_pandas=True, - **kwargs + **kwargs, ) diff --git a/numpy_groupies/aggregate_purepy.py b/numpy_groupies/aggregate_purepy.py index da1d3a3..42af100 100644 --- a/numpy_groupies/aggregate_purepy.py +++ b/numpy_groupies/aggregate_purepy.py @@ -1,22 +1,12 @@ -from __future__ import division - import itertools import math import operator -from .utils import ( - aggregate_common_doc, - aliasing, - funcs_no_separate_nan, - get_func, - isstr, -) +from .utils import aggregate_common_doc +from .utils import aliasing_py as aliasing +from .utils import funcs_no_separate_nan, get_func -# min - builtin -# max - builtin -# sum - builtin -# all - builtin -# any - builtin +# min, max, sum, all, any - 
builtin def _last(x): @@ -77,9 +67,7 @@ def _sort(group_idx, a, reverse=False): def _argsort(unordered): return sorted(range(len(unordered)), key=lambda k: unordered[k]) - sortidx = _argsort( - list((gi, aj) for gi, aj in zip(group_idx, -a if reverse else a)) - ) + sortidx = _argsort(list((gi, aj) for gi, aj in zip(group_idx, -a if reverse else a))) revidx = _argsort(_argsort(group_idx)) a_srt = [a[si] for si in sortidx] return [a_srt[ri] for ri in revidx] @@ -105,24 +93,10 @@ def _argsort(unordered): argmin=_argmin, len=len, ) -_impl_dict.update( - ("nan" + k, v) - for k, v in list(_impl_dict.items()) - if k not in funcs_no_separate_nan -) +_impl_dict.update(("nan" + k, v) for k, v in list(_impl_dict.items()) if k not in funcs_no_separate_nan) -def aggregate( - group_idx, - a, - func="sum", - size=None, - fill_value=0, - order=None, - dtype=None, - axis=None, - **kwargs -): +def aggregate(group_idx, a, func="sum", size=None, fill_value=0, order=None, dtype=None, axis=None, **kwargs): if axis is not None: raise NotImplementedError("axis arg not supported in purepy implementation.") @@ -131,28 +105,21 @@ def aggregate( try: size = 1 + int(max(group_idx)) except (TypeError, ValueError): - raise NotImplementedError( - "pure python implementation doesn't" " accept ndim idx input." - ) + raise NotImplementedError("pure python implementation doesn't accept ndim idx input.") for i in group_idx: try: i = int(i) except (TypeError, ValueError): if isinstance(i, (list, tuple)): - raise NotImplementedError( - "pure python implementation doesn't" " accept ndim idx input." - ) + raise NotImplementedError("pure python implementation doesn't accept ndim idx input.") else: try: len(i) except TypeError: - raise ValueError("invalid value found in group_idx: %s" % i) + raise ValueError(f"invalid value found in group_idx: {i}") else: - raise NotImplementedError( - "pure python implementation doesn't " - "accept ndim indexed input." 
- ) + raise NotImplementedError("pure python implementation doesn't accept ndim indexed input.") else: if i < 0: raise ValueError("group_idx contains negative value") @@ -160,20 +127,16 @@ def aggregate( func = get_func(func, aliasing, _impl_dict) if isinstance(a, (int, float)): if func not in ("sum", "prod", "len"): - raise ValueError( - "scalar inputs are supported only for 'sum', " "'prod' and 'len'" - ) + raise ValueError("scalar inputs are supported only for 'sum', 'prod' and 'len'") a = [a] * len(group_idx) elif len(group_idx) != len(a): raise ValueError("group_idx and a must be of the same length") - if isstr(func): + if isinstance(func, str): if func.startswith("nan"): func = func[3:] # remove nans - group_idx, a = zip( - *((ix, val) for ix, val in zip(group_idx, a) if not math.isnan(val)) - ) + group_idx, a = zip(*((ix, val) for ix, val in zip(group_idx, a) if not math.isnan(val))) func = _impl_dict[func] if func is _sort: diff --git a/numpy_groupies/aggregate_weave.py b/numpy_groupies/aggregate_weave.py deleted file mode 100644 index e6b8e26..0000000 --- a/numpy_groupies/aggregate_weave.py +++ /dev/null @@ -1,448 +0,0 @@ -import numpy as np - -try: - from weave import inline -except ImportError: - from scipy.weave import inline - -from .utils import ( - aggregate_common_doc, - check_boolean, - funcs_no_separate_nan, - get_func, - isstr, -) -from .utils_numpy import aliasing, check_dtype, check_fill_value, input_validation - -optimized_funcs = { - "sum", - "min", - "max", - "amin", - "amax", - "mean", - "var", - "std", - "prod", - "len", - "nansum", - "nanmin", - "nanmax", - "nanmean", - "nanvar", - "nanstd", - "nanprod", - "nanlen", - "all", - "any", - "nanall", - "nanany", - "allnan", - "anynan", - "first", - "last", - "nanfirst", - "nanlast", -} - -# c_funcs will contain all generated c code, so it can be read easily for debugging -c_funcs = dict() -c_iter = dict() -c_iter_scalar = dict() -c_finish = dict() - -# Set this for testing, to fail deprecated 
C-API calls -# c_macros = [('NPY_NO_DEPRECATED_API', 'NPY_1_7_API_VERSION')] -c_macros = [] -c_args = ["-Wno-cpp"] # Suppress the deprecation warnings created by weave - - -def c_size(varname): - return r""" - long L%(varname)s = 1; - for (int n=0; n=0; i--) { - %(ri_redir)s - %(iter)s - } - %(finish)s - """ - -c_iter[ - "sum" -] = r""" - counter[ri] = 0; - ret[ri] += a[i];""" - -c_iter_scalar[ - "sum" -] = r""" - counter[ri] = 0; - ret[ri] += a;""" - -c_iter[ - "prod" -] = r""" - counter[ri] = 0; - ret[ri] *= a[i];""" - -c_iter_scalar[ - "prod" -] = r""" - counter[ri] = 0; - ret[ri] *= a;""" - -c_iter[ - "len" -] = r""" - counter[ri] = 0; - ret[ri] += 1;""" - -c_iter_scalar[ - "len" -] = r""" - counter[ri] = 0; - ret[ri] += 1;""" - -c_iter[ - "all" -] = r""" - counter[ri] = 0; - ret[ri] &= (a[i] != 0);""" - -c_iter[ - "any" -] = r""" - counter[ri] = 0; - ret[ri] |= (a[i] != 0);""" - -c_iter[ - "last" -] = r""" - ret[ri] = a[i];""" - -c_iter[ - "allnan" -] = r""" - counter[ri] = 0; - ret[ri] &= (a[i] != a[i]);""" - -c_iter[ - "anynan" -] = r""" - counter[ri] = 0; - ret[ri] |= (a[i] != a[i]);""" - -c_iter[ - "max" -] = r""" - if (counter[ri]) { - ret[ri] = a[i]; - counter[ri] = 0; - } - else if (ret[ri] < a[i]) ret[ri] = a[i];""" - -c_iter[ - "min" -] = r""" - if (counter[ri]) { - ret[ri] = a[i]; - counter[ri] = 0; - } - else if (ret[ri] > a[i]) ret[ri] = a[i];""" - -c_iter[ - "mean" -] = r""" - counter[ri]++; - ret[ri] += a[i];""" - -c_finish[ - "mean" -] = r""" - for (long ri=0; ri 1: - if unravel_shape is not None: - # A negative fill_value cannot, and should not, be unraveled. - mask = ret == fill_value - ret[mask] = 0 - ret = np.unravel_index(ret, unravel_shape)[axis] - ret[mask] = fill_value - ret = ret.reshape(size, order=order) - return ret - - -aggregate.__doc__ = ( - """ - This is the weave based implementation of aggregate. 
- - **NOTE:** If weave is installed but fails to run (probably because you - have not setup a suitable compiler) then you can manually select the numpy - implementation by using:: - - - import numpy_groupies as npg - # NOT THIS: npg.aggregate(...) - npg.aggregate_np(...) - - - """ - + aggregate_common_doc -) diff --git a/numpy_groupies/benchmarks/generic.py b/numpy_groupies/benchmarks/generic.py index d141b86..eb24274 100644 --- a/numpy_groupies/benchmarks/generic.py +++ b/numpy_groupies/benchmarks/generic.py @@ -1,13 +1,14 @@ #!/usr/bin/python -B -from __future__ import print_function -import sys + import platform +import sys import timeit from operator import itemgetter + import numpy as np from numpy_groupies.tests import _implementations, aggregate_numpy -from numpy_groupies.utils_numpy import allnan, anynan, nanfirst, nanlast +from numpy_groupies.utils import allnan, anynan, nanfirst, nanlast def aggregate_grouploop(*args, **kwargs): @@ -83,17 +84,14 @@ def benchmark_data(size=5e5, seed=100): nana = a.copy() nana[(nana < 0.2) & (nana != 0)] = np.nan nan_share = np.mean(np.isnan(nana)) - assert 0.15 < nan_share < 0.25, "%3f%% nans" % (nan_share * 100) + assert 0.15 < nan_share < 0.25, f"{nan_share * 100:3f}% nans" return a, nana, group_idx def benchmark(implementations, repeat=5, size=5e5, seed=100, raise_errors=False): a, nana, group_idx = benchmark_data(size=size, seed=seed) - print( - "function" - + "".join(impl.__name__.rsplit("_", 1)[1].rjust(14) for impl in implementations) - ) + print("function" + "".join(impl.__name__.rsplit("_", 1)[1].rjust(14) for impl in implementations)) print("-" * (9 + 14 * len(implementations))) for func in func_list: func_name = getattr(func, "__name__", func) @@ -124,11 +122,11 @@ def benchmark(implementations, repeat=5, size=5e5, seed=100, raise_errors=False) print("FAIL".rjust(14), end="") else: t0 = min( - timeit.Timer( - lambda: aggregatefunc(group_idx, used_a, func=func) - ).repeat(repeat=repeat, number=1) + 
timeit.Timer(lambda: aggregatefunc(group_idx, used_a, func=func)).repeat( + repeat=repeat, number=1 + ) ) - print(("%.3f" % (t0 * 1000)).rjust(14), end="") + print(f"{t0 * 1000:.3f}".rjust(14), end="") sys.stdout.flush() print() @@ -137,32 +135,18 @@ def benchmark(implementations, repeat=5, size=5e5, seed=100, raise_errors=False) if "numba" in implementation_names: import numba - postfix += ", Numba %s" % numba.__version__ - if "weave" in implementation_names: - import weave - - postfix += ", Weave %s" % weave.__version__ + postfix += f", Numba {numba.__version__}" if "pandas" in implementation_names: import pandas - postfix += ", Pandas %s" % pandas.__version__ + postfix += f", Pandas {pandas.__version__}" print( - "%s(%s), Python %s, Numpy %s%s" - % ( - platform.system(), - platform.machine(), - sys.version.split()[0], - np.version.version, - postfix, - ) + f"{platform.system()}({platform.machine()}), Python {sys.version.split()[0]}, Numpy {np.version.version}" + f"{postfix}" ) if __name__ == "__main__": - implementations = ( - _implementations if "--purepy" in sys.argv else _implementations[1:] - ) - implementations = ( - implementations if "--pandas" in sys.argv else implementations[:-1] - ) + implementations = _implementations if "--purepy" in sys.argv else _implementations[1:] + implementations = implementations if "--pandas" in sys.argv else implementations[:-1] benchmark(implementations, raise_errors=False) diff --git a/numpy_groupies/benchmarks/simple.py b/numpy_groupies/benchmarks/simple.py index 327078d..e134daa 100644 --- a/numpy_groupies/benchmarks/simple.py +++ b/numpy_groupies/benchmarks/simple.py @@ -1,14 +1,12 @@ #!/usr/bin/python -B -# -*- coding: utf-8 -*- -from __future__ import print_function import timeit -import numpy as np +import numpy as np -from numpy_groupies.utils import aliasing -from numpy_groupies import aggregate_py, aggregate_np, aggregate_ufunc +from numpy_groupies import aggregate_np, aggregate_py, aggregate_ufunc from 
numpy_groupies.aggregate_pandas import aggregate as aggregate_pd +from numpy_groupies.utils import aliasing def aggregate_group_loop(*args, **kwargs): @@ -19,8 +17,6 @@ def aggregate_group_loop(*args, **kwargs): return aggregate_np(*args, func=lambda x: func(x), **kwargs) -print("TODO: use more extensive tests") -print("") print("-----simple examples----------") test_a = np.array([12.0, 3.2, -15, 88, 12.9]) test_group_idx = np.array([1, 0, 1, 4, 1]) @@ -32,11 +28,9 @@ def aggregate_group_loop(*args, **kwargs): print("aggregate(test_group_idx, test_a, sz=8, func='min', fill_value=np.nan):") print(aggregate_np(test_group_idx, test_a, size=8, func="min", fill_value=np.nan)) # array([3.2, -15., nan, 88., nan, nan, nan, nan]) +print("aggregate_py(test_group_idx, test_a, sz=5, func=lambda x: ' + '.join(str(xx) for xx in x),fill_value='')") print( - "aggregate(test_group_idx, test_a, sz=5, func=lambda x: ' + '.join(str(xx) for xx in x),fill_value='')" -) -print( - aggregate_np( + aggregate_py( test_group_idx, test_a, size=5, @@ -49,10 +43,7 @@ def aggregate_group_loop(*args, **kwargs): print("") print("---------testing--------------") print("compare against group-and-loop with numpy") -testable_funcs = { - aliasing[f]: f - for f in (np.sum, np.prod, np.any, np.all, np.min, np.max, np.std, np.var, np.mean) -} +testable_funcs = {aliasing[f]: f for f in (np.sum, np.prod, np.any, np.all, np.min, np.max, np.std, np.var, np.mean)} test_group_idx = np.random.randint(0, int(1e3), int(1e5)) test_a = np.random.rand(int(1e5)) * 100 - 50 test_a[test_a > 25] = 0 # for use with bool functions @@ -82,12 +73,10 @@ def aggregate_group_loop(*args, **kwargs): print("") print("----------benchmarking-------------") -print( - "Note that the actual observed speedup depends on a variety of properties of the input." 
-) +print("Note that the actual observed speedup depends on a variety of properties of the input.") print("Here we are using 100,000 indices uniformly picked from [0, 1000).") print("Specifically, about 25% of the values are 0 (for use with bool operations),") -print("the remainder are uniformly distribuited on [-50,25).") +print("the remainder are uniformly distributed on [-50,25).") print("Times are scaled to 10 repetitions (actual number of reps used may not be 10).") print( @@ -120,19 +109,12 @@ def aggregate_group_loop(*args, **kwargs): func = f if acc_func is aggregate_group_loop else name reps = 3 if acc_func is aggregate_py else 20 times[ii] = ( - timeit.Timer( - lambda: acc_func(test_group_idx, test_a, func=func) - ).timeit(number=reps) - / reps - * 10 + timeit.Timer(lambda: acc_func(test_group_idx, test_a, func=func)).timeit(number=reps) / reps * 10 ) - print(("%.1fms" % ((times[ii] * 1000))).rjust(13), end="") + print(f"{times[ii] * 1000:.1f}ms".rjust(13), end="") except NotImplementedError: print("no-impl".rjust(13), end="") denom = min(t for t in times if t is not None) - ratios = [ - ("-".center(4) if t is None else str(round(t / denom, 1))).center(5) - for t in times - ] + ratios = [("-".center(4) if t is None else str(round(t / denom, 1))).center(5) for t in times] print(" ", (":".join(ratios))) diff --git a/numpy_groupies/tests/__init__.py b/numpy_groupies/tests/__init__.py index 84b5250..f0dcddd 100644 --- a/numpy_groupies/tests/__init__.py +++ b/numpy_groupies/tests/__init__.py @@ -1,15 +1,11 @@ import pytest -from .. import aggregate_purepy, aggregate_numpy_ufunc, aggregate_numpy +from .. import aggregate_numpy, aggregate_numpy_ufunc, aggregate_purepy try: from .. import aggregate_numba except ImportError: aggregate_numba = None -try: - from .. import aggregate_weave -except ImportError: - aggregate_weave = None try: from .. 
import aggregate_pandas except ImportError: @@ -20,14 +16,13 @@ aggregate_numpy_ufunc, aggregate_numpy, aggregate_numba, - aggregate_weave, aggregate_pandas, ] _implementations = [i for i in _implementations if i is not None] def _impl_name(impl): - if not impl or type(impl).__name__ == 'NotSetType': + if not impl or type(impl).__name__ == "NotSetType": return return impl.__name__.rsplit("aggregate_", 1)[1].rsplit("_", 1)[-1] @@ -37,22 +32,6 @@ def _impl_name(impl): "purepy": ("cumsum", "cumprod", "cummax", "cummin", "sumofsquares"), "numba": ("array", "list", "sort"), "pandas": ("array", "list", "sort", "sumofsquares", "nansumofsquares"), - "weave": ( - "argmin", - "argmax", - "array", - "list", - "sort", - "cumsum", - "cummax", - "cummin", - "nanargmin", - "nanargmax", - "sumofsquares", - "nansumofsquares", - "", - "custom_callable", - ), "ufunc": "NO_CHECK", } diff --git a/numpy_groupies/tests/test_compare.py b/numpy_groupies/tests/test_compare.py index 6686afe..e4158fc 100644 --- a/numpy_groupies/tests/test_compare.py +++ b/numpy_groupies/tests/test_compare.py @@ -6,18 +6,18 @@ """ import sys from itertools import product + import numpy as np import pytest from . 
import ( - aggregate_purepy, - aggregate_numpy_ufunc, - aggregate_numpy, - aggregate_weave, + _impl_name, + _wrap_notimplemented_xfail, aggregate_numba, + aggregate_numpy, + aggregate_numpy_ufunc, aggregate_pandas, - _wrap_notimplemented_xfail, - _impl_name, + aggregate_purepy, func_list, ) @@ -27,8 +27,6 @@ class AttrDict(dict): TEST_PAIRS = ["np/py", "ufunc/np", "numba/np", "pandas/np"] -if sys.version_info.major == 2: - TEST_PAIRS.append("weave/np") @pytest.fixture(params=TEST_PAIRS, scope="module") @@ -46,8 +44,6 @@ def aggregate_cmp(request, seed=100): impl = aggregate_numpy_ufunc elif "numba" in request.param: impl = aggregate_numba - elif "weave" in request.param: - impl = aggregate_weave elif "pandas" in request.param: impl = aggregate_pandas else: @@ -109,19 +105,13 @@ def test_cmp(aggregate_cmp, func, fill_value, decimal=10): is_nanfunc = "nan" in getattr(func, "__name__", func) a = aggregate_cmp.nana if is_nanfunc else aggregate_cmp.a try: - ref = aggregate_cmp.func_ref( - aggregate_cmp.group_idx, a, func=func, fill_value=fill_value - ) + ref = aggregate_cmp.func_ref(aggregate_cmp.group_idx, a, func=func, fill_value=fill_value) except ValueError: with pytest.raises(ValueError): - aggregate_cmp.func( - aggregate_cmp.group_idx, a, func=func, fill_value=fill_value - ) + aggregate_cmp.func(aggregate_cmp.group_idx, a, func=func, fill_value=fill_value) else: try: - res = aggregate_cmp.func( - aggregate_cmp.group_idx, a, func=func, fill_value=fill_value - ) + res = aggregate_cmp.func(aggregate_cmp.group_idx, a, func=func, fill_value=fill_value) except ValueError: if np.isnan(fill_value) and aggregate_cmp.test_pair.endswith("py"): pytest.xfail( @@ -135,9 +125,7 @@ def test_cmp(aggregate_cmp, func, fill_value, decimal=10): np.testing.assert_allclose(res, ref, rtol=10**-decimal) except AssertionError: if "arg" in func and aggregate_cmp.test_pair.startswith("pandas"): - pytest.xfail( - "pandas doesn't fill indices for all-nan groups with fill_value, but with -inf 
instead" - ) + pytest.xfail("pandas doesn't fill indices for all-nan groups with fill_value, but with -inf instead") else: raise diff --git a/numpy_groupies/tests/test_generic.py b/numpy_groupies/tests/test_generic.py index 1c7146f..0519161 100644 --- a/numpy_groupies/tests/test_generic.py +++ b/numpy_groupies/tests/test_generic.py @@ -2,10 +2,11 @@ import itertools import warnings + import numpy as np import pytest -from . import _implementations, _impl_name, _wrap_notimplemented_xfail, func_list +from . import _impl_name, _implementations, _wrap_notimplemented_xfail, func_list @pytest.fixture(params=_implementations, ids=_impl_name) @@ -55,9 +56,7 @@ def test_start_with_offset(aggregate_all): assert "int" in res.dtype.name -@pytest.mark.parametrize( - "floatfunc", [np.std, np.var, np.mean], ids=lambda x: x.__name__ -) +@pytest.mark.parametrize("floatfunc", [np.std, np.var, np.mean], ids=lambda x: x.__name__) def test_float_enforcement(aggregate_all, floatfunc): group_idx = np.arange(10).repeat(3) a = np.arange(group_idx.size) @@ -89,9 +88,7 @@ def test_shape_mismatch(aggregate_all): def test_create_lists(aggregate_all): - res = aggregate_all( - np.array([0, 1, 3, 1, 3]), np.arange(101, 106, dtype=int), func=list - ) + res = aggregate_all(np.array([0, 1, 3, 1, 3]), np.arange(101, 106, dtype=int), func=list) np.testing.assert_array_equal(np.array(res[0]), np.array([101])) assert res[2] == 0 np.testing.assert_array_equal(np.array(res[3]), np.array([103, 105])) @@ -104,9 +101,7 @@ def test_item_counting(aggregate_all): np.testing.assert_array_equal(res, np.array([0, 0, 0, 1, 1, 1, 0, 0, 1])) -@pytest.mark.parametrize( - ["func", "fill_value"], [(np.array, None), (np.sum, -1)], ids=["array", "sum"] -) +@pytest.mark.parametrize(["func", "fill_value"], [(np.array, None), (np.sum, -1)], ids=["array", "sum"]) def test_fill_value(aggregate_all, func, fill_value): group_idx = np.array([0, 2, 2], dtype=int) res = aggregate_all( @@ -122,9 +117,7 @@ def 
test_fill_value(aggregate_all, func, fill_value): def test_array_ordering(aggregate_all, order, size=10): mat = np.zeros((size, size), order=order, dtype=float) mat.flat[:] = np.arange(size * size) - assert aggregate_all(np.zeros(size, dtype=int), mat[0, :], order=order)[0] == sum( - range(size) - ) + assert aggregate_all(np.zeros(size, dtype=int), mat[0, :], order=order)[0] == sum(range(size)) @pytest.mark.deselect_if(func=_deselect_purepy) @@ -190,15 +183,11 @@ def test_first_last(aggregate_all, first_last): res = aggregate_all(group_idx, a, func=first_last, fill_value=-1) ref = np.zeros(np.max(group_idx) + 1) ref.fill(-1) - ref[::2] = np.arange( - 0 if first_last == "first" else 4, group_idx.size, 5, dtype=int - ) + ref[::2] = np.arange(0 if first_last == "first" else 4, group_idx.size, 5, dtype=int) np.testing.assert_array_equal(res, ref) -@pytest.mark.parametrize( - ["first_last", "nanoffset"], itertools.product(["nanfirst", "nanlast"], [0, 2, 4]) -) +@pytest.mark.parametrize(["first_last", "nanoffset"], itertools.product(["nanfirst", "nanlast"], [0, 2, 4])) def test_nan_first_last(aggregate_all, first_last, nanoffset): group_idx = np.arange(0, 100, 2, dtype=int).repeat(5) a = np.arange(group_idx.size, dtype=float) @@ -230,9 +219,7 @@ def test_ddof(aggregate_all, func, ddof, size=20): def test_scalar_input(aggregate_all, func): group_idx = np.arange(0, 100, dtype=int).repeat(5) if func not in ("sum", "prod"): - pytest.raises( - (ValueError, NotImplementedError), aggregate_all, group_idx, 1, func=func - ) + pytest.raises((ValueError, NotImplementedError), aggregate_all, group_idx, 1, func=func) else: res = aggregate_all(group_idx, 1, func=func) ref = aggregate_all(group_idx, np.ones_like(group_idx, dtype=int), func=func) @@ -295,9 +282,7 @@ def test_argmin_argmax_nans(aggregate_all): @pytest.mark.deselect_if(func=_deselect_purepy) def test_nanargmin_nanargmax_nans(aggregate_all): if aggregate_all.__name__.endswith("pandas"): - pytest.xfail( - "pandas doesn't 
fill indices for all-nan groups with fill_value but with -inf instead" - ) + pytest.xfail("pandas doesn't fill indices for all-nan groups with fill_value but with -inf instead") group_idx = np.array([0, 0, 0, 0, 3, 3, 3, 3]) a = np.array([4, 4, np.nan, 1, np.nan, np.nan, np.nan, np.nan]) @@ -516,9 +501,7 @@ def test_argreduction_negative_fill_value(aggregate_all): @pytest.mark.deselect_if(func=_deselect_purepy) -@pytest.mark.parametrize( - "nan_inds", (None, tuple([[1, 4, 5], Ellipsis]), tuple((1, (0, 1, 2, 3)))) -) +@pytest.mark.parametrize("nan_inds", (None, tuple([[1, 4, 5], Ellipsis]), tuple((1, (0, 1, 2, 3))))) @pytest.mark.parametrize("ddof", (0, 1)) @pytest.mark.parametrize("func", ("nanvar", "nanstd")) def test_var_with_nan_fill_value(aggregate_all, ddof, nan_inds, func): @@ -533,7 +516,5 @@ def test_var_with_nan_fill_value(aggregate_all, ddof, nan_inds, func): warnings.simplefilter("ignore", RuntimeWarning) expected = getattr(np, func)(a, keepdims=True, axis=-1, ddof=ddof) - actual = aggregate_all( - group_idx, a, axis=-1, fill_value=np.nan, func=func, ddof=ddof - ) + actual = aggregate_all(group_idx, a, axis=-1, fill_value=np.nan, func=func, ddof=ddof) np.testing.assert_equal(actual, expected) diff --git a/numpy_groupies/tests/test_indices.py b/numpy_groupies/tests/test_indices.py index 6e99912..15ff23c 100644 --- a/numpy_groupies/tests/test_indices.py +++ b/numpy_groupies/tests/test_indices.py @@ -1,9 +1,9 @@ -import pytest import numpy as np +import pytest -from . import aggregate_weave, aggregate_numba, _impl_name +from . 
import _impl_name, aggregate_numba -_implementations = [aggregate_weave, aggregate_numba] +_implementations = [aggregate_numba] _implementations = [i for i in _implementations if i is not None] diff --git a/numpy_groupies/tests/test_utils.py b/numpy_groupies/tests/test_utils.py index e73d38e..c456e99 100644 --- a/numpy_groupies/tests/test_utils.py +++ b/numpy_groupies/tests/test_utils.py @@ -1,6 +1,6 @@ import numpy as np -from ..utils_numpy import check_dtype, unpack +from ..utils import check_dtype, unpack def test_check_dtype(): diff --git a/numpy_groupies/utils.py b/numpy_groupies/utils.py index 4f153d7..6d1b96f 100644 --- a/numpy_groupies/utils.py +++ b/numpy_groupies/utils.py @@ -1,4 +1,6 @@ -"""Common helpers without certain dependencies.""" +"""Common functionality for all aggregate implementations.""" + +import numpy as np aggregate_common_doc = """ See readme file at https://github.com/ml31415/numpy-groupies for a full @@ -9,10 +11,10 @@ group_idx: this is an array of non-negative integers, to be used as the "labels" with which to group the values in ``a``. Although we have so far - assumed that ``group_idx`` is one-dimesnaional, and the same length as + assumed that ``group_idx`` is one-dimensional, and the same length as ``a``, it can in fact be two-dimensional (or some form of nested sequences that can be converted to 2D). When ``group_idx`` is 2D, the - size of the 0th dimension corresponds to the number of dimesnions in + size of the 0th dimension corresponds to the number of dimensions in the output, i.e. ``group_idx[i,j]`` gives the index into the ith dimension in the output for ``a[j]``. Note that ``a`` should still be 1D (or scalar), with @@ -52,7 +54,9 @@ (see above). 
""" -funcs_common = "first last len mean var std allnan anynan max min argmax argmin sumofsquares cumsum cumprod cummax cummin".split() +funcs_common = ( + "first last len mean var std allnan anynan max min argmax argmin sumofsquares cumsum cumprod cummax cummin".split() +) funcs_no_separate_nan = frozenset(["sort", "rsort", "array", "allnan", "anynan"]) @@ -92,11 +96,50 @@ } +_alias_numpy = { + np.add: "sum", + np.sum: "sum", + np.any: "any", + np.all: "all", + np.multiply: "prod", + np.prod: "prod", + np.amin: "min", + np.min: "min", + np.minimum: "min", + np.amax: "max", + np.max: "max", + np.maximum: "max", + np.argmax: "argmax", + np.argmin: "argmin", + np.mean: "mean", + np.std: "std", + np.var: "var", + np.array: "array", + np.asarray: "array", + np.sort: "sort", + np.nansum: "nansum", + np.nanprod: "nanprod", + np.nanmean: "nanmean", + np.nanvar: "nanvar", + np.nanmax: "nanmax", + np.nanmin: "nanmin", + np.nanstd: "nanstd", + np.nanargmax: "nanargmax", + np.nanargmin: "nanargmin", + np.cumsum: "cumsum", + np.cumprod: "cumprod", +} + + def get_aliasing(*extra): - """The assembles the dict mapping strings and functions to the list of - supported function names: - e.g. alias['add'] = 'sum' and alias[sorted] = 'sort' - This funciton should only be called during import. + """ + Assembles a dictionary that maps both strings and functions to a list of supported function names. + + Examples: + alias['add'] = 'sum' + alias[sorted] = 'sort' + + This function should only be called during import. 
""" alias = dict((k, k) for k in funcs_common) alias.update(_alias_str) @@ -113,7 +156,8 @@ def get_aliasing(*extra): return alias -aliasing = get_aliasing() +aliasing_py = get_aliasing() +aliasing = get_aliasing(_alias_numpy) def get_func(func, aliasing, implementations): @@ -127,13 +171,10 @@ def get_func(func, aliasing, implementations): if func_str in implementations: return func_str if func_str.startswith("nan") and func_str[3:] in funcs_no_separate_nan: - raise ValueError("%s does not have a nan-version".format(func_str[3:])) + raise ValueError(f"{func_str[3:]} does not have a nan-version") else: raise NotImplementedError("No such function available") - raise ValueError( - "func {} is neither a valid function string nor a " - "callable object".format(func) - ) + raise ValueError(f"func {func} is neither a valid function string nor a callable object") def check_boolean(x): @@ -141,13 +182,479 @@ def check_boolean(x): raise ValueError("Value not boolean") -try: - basestring # Attempt to evaluate basestring +_next_int_dtype = dict( + bool=np.int8, + uint8=np.int16, + int8=np.int16, + uint16=np.int32, + int16=np.int32, + uint32=np.int64, + int32=np.int64, +) + +_next_float_dtype = dict( + float16=np.float32, + float32=np.float64, + float64=np.complex64, + complex64=np.complex128, +) + + +def minimum_dtype(x, dtype=np.bool_): + """ + Returns the "most basic" dtype which represents `x` properly, which provides at least the same + value range as the specified dtype. 
+ """ + + def check_type(x, dtype): + try: + converted = np.array(x).astype(dtype) + except (ValueError, OverflowError, RuntimeWarning): + return False + # False if some overflow has happened + return converted == x or np.isnan(x) + + def type_loop(x, dtype, dtype_dict, default=None): + while True: + try: + dtype = np.dtype(dtype_dict[dtype.name]) + if check_type(x, dtype): + return np.dtype(dtype) + except KeyError: + if default is not None: + return np.dtype(default) + raise ValueError(f"Can not determine dtype of {x!r}") + + dtype = np.dtype(dtype) + if check_type(x, dtype): + return dtype + + if np.issubdtype(dtype, np.inexact): + return type_loop(x, dtype, _next_float_dtype) + else: + return type_loop(x, dtype, _next_int_dtype, default=np.float32) + + +def minimum_dtype_scalar(x, dtype, a): + if dtype is None: + dtype = np.dtype(type(a)) if isinstance(a, (int, float)) else a.dtype + return minimum_dtype(x, dtype) + + +_forced_types = { + "array": object, + "all": bool, + "any": bool, + "nanall": bool, + "nanany": bool, + "len": np.int64, + "nanlen": np.int64, + "allnan": bool, + "anynan": bool, + "argmax": np.int64, + "argmin": np.int64, + "nanargmin": np.int64, + "nanargmax": np.int64, +} +_forced_float_types = {"mean", "var", "std", "nanmean", "nanvar", "nanstd"} +_forced_same_type = { + "min", + "max", + "first", + "last", + "nanmin", + "nanmax", + "nanfirst", + "nanlast", +} + + +def check_dtype(dtype, func_str, a, n): + if np.isscalar(a) or not a.shape: + if func_str not in ("sum", "prod", "len"): + raise ValueError("scalar inputs are supported only for 'sum', 'prod' and 'len'") + a_dtype = np.dtype(type(a)) + else: + a_dtype = a.dtype + + if dtype is not None: + # dtype set by the user + # Careful here: np.bool != np.bool_ ! 
+ if np.issubdtype(dtype, np.bool_) and not ("all" in func_str or "any" in func_str): + raise TypeError(f"function {func_str} requires a more complex datatype than bool") + if not np.issubdtype(dtype, np.integer) and func_str in ("len", "nanlen"): + raise TypeError(f"function {func_str} requires an integer datatype") + # TODO: Maybe have some more checks here + return np.dtype(dtype) + else: + try: + return np.dtype(_forced_types[func_str]) + except KeyError: + if func_str in _forced_float_types: + if np.issubdtype(a_dtype, np.floating): + return a_dtype + else: + return np.dtype(np.float64) + else: + if func_str == "sum": + # Try to guess the minimally required int size + if np.issubdtype(a_dtype, np.int64): + # It's not getting bigger anymore + # TODO: strictly speaking it might need float + return np.dtype(np.int64) + elif np.issubdtype(a_dtype, np.integer): + maxval = np.iinfo(a_dtype).max * n + return minimum_dtype(maxval, a_dtype) + elif np.issubdtype(a_dtype, np.bool_): + return minimum_dtype(n, a_dtype) + else: + # floating, inexact, whatever + return a_dtype + elif func_str in _forced_same_type: + return a_dtype + else: + if isinstance(a_dtype, np.integer): + return np.dtype(np.int64) + else: + return a_dtype + + +def minval(fill_value, dtype): + dtype = minimum_dtype(fill_value, dtype) + if issubclass(dtype.type, np.floating): + return -np.inf + if issubclass(dtype.type, np.integer): + return np.iinfo(dtype).min + return np.finfo(dtype).min + + +def maxval(fill_value, dtype): + dtype = minimum_dtype(fill_value, dtype) + if issubclass(dtype.type, np.floating): + return np.inf + if issubclass(dtype.type, np.integer): + return np.iinfo(dtype).max + return np.finfo(dtype).max + + +def check_fill_value(fill_value, dtype, func=None): + if func in ("all", "any", "allnan", "anynan"): + check_boolean(fill_value) + else: + try: + return dtype.type(fill_value) + except ValueError: + raise ValueError(f"fill_value must be convertible into {dtype.type.__name__}") + + 
+def check_group_idx(group_idx, a=None, check_min=True): + if a is not None and group_idx.size != a.size: + raise ValueError("The size of group_idx must be the same as a.size") + if not issubclass(group_idx.dtype.type, np.integer): + raise TypeError("group_idx must be of integer type") + if check_min and np.min(group_idx) < 0: + raise ValueError("group_idx contains negative indices") + + +def _ravel_group_idx(group_idx, a, axis, size, order, method="ravel"): + ndim_a = a.ndim + # Create the broadcast-ready multidimensional indexing. + # Note the user could do this themselves, so this is + # very much just a convenience. + size_in = int(np.max(group_idx)) + 1 if size is None else size + group_idx_in = group_idx + group_idx = [] + size = [] + for ii, s in enumerate(a.shape): + if method == "ravel": + ii_idx = group_idx_in if ii == axis else np.arange(s) + ii_shape = [1] * ndim_a + ii_shape[ii] = s + group_idx.append(ii_idx.reshape(ii_shape)) + size.append(size_in if ii == axis else s) + # Use the indexing, and return. It's a bit simpler than + # trying to keep all the logic below happy + if method == "ravel": + group_idx = np.ravel_multi_index(group_idx, size, order=order, mode="raise") + elif method == "offset": + group_idx = offset_labels(group_idx_in, a.shape, axis, order, size_in) + return group_idx, size + + +def offset_labels(group_idx, inshape, axis, order, size): + """ + Offset group labels by dimension. This is used when we reduce over a subset of the dimensions of + group_idx.
It assumes that the reduction dimensions have been flattened in the last dimension. + Copied from + https://stackoverflow.com/questions/46256279/bin-elements-per-row-vectorized-2d-bincount-for-numpy + """ + + newaxes = tuple(ax for ax in range(len(inshape)) if ax != axis) + group_idx = np.broadcast_to(np.expand_dims(group_idx, newaxes), inshape) + if axis not in (-1, len(inshape) - 1): + group_idx = np.moveaxis(group_idx, axis, -1) + newshape = group_idx.shape[:-1] + (-1,) + + group_idx = group_idx + np.arange(np.prod(newshape[:-1]), dtype=int).reshape(newshape) * size + if axis not in (-1, len(inshape) - 1): + return np.moveaxis(group_idx, -1, axis) + else: + return group_idx + + +def input_validation( + group_idx, + a, + size=None, + order="C", + axis=None, + ravel_group_idx=True, + check_bounds=True, + func=None, +): + """ + Do some fairly extensive checking of group_idx and a, trying to give the user as much help as + possible with what is wrong. Also, convert ndim-indexing to 1d indexing. + """ + if not isinstance(a, (int, float, complex)) and not is_duck_array(a): + a = np.asanyarray(a) + if not is_duck_array(group_idx): + group_idx = np.asanyarray(group_idx) + + if not np.issubdtype(group_idx.dtype, np.integer): + raise TypeError("group_idx must be of integer type") + + # This check works for multidimensional indexing as well + if check_bounds and np.any(group_idx < 0): + raise ValueError("negative indices not supported") + + ndim_idx = np.ndim(group_idx) + ndim_a = np.ndim(a) + + # Deal with the axis arg: if present, then turn 1d indexing into + # multi-dimensional indexing along the specified axis. + if axis is None: + if ndim_a > 1: + raise ValueError("a must be scalar or 1 dimensional, use .ravel to flatten.
Alternatively specify axis.") + elif axis >= ndim_a or axis < -ndim_a: + raise ValueError("axis arg too large for np.ndim(a)") + else: + axis = axis if axis >= 0 else ndim_a + axis # negative indexing + if ndim_idx > 1: + # TODO: we could support a sequence of axis values for multiple + # dimensions of group_idx. + raise NotImplementedError("only 1d indexing currently supported with axis arg.") + elif a.shape[axis] != len(group_idx): + raise ValueError("a.shape[axis] doesn't match length of group_idx.") + elif size is not None and not np.isscalar(size): + raise NotImplementedError("when using axis arg, size must be None or scalar.") + else: + is_form_3 = group_idx.ndim == 1 and a.ndim > 1 and axis is not None + orig_shape = a.shape if is_form_3 else group_idx.shape + if isinstance(func, str) and "arg" in func: + unravel_shape = orig_shape + else: + unravel_shape = None + + method = "offset" if axis == ndim_a - 1 else "ravel" + group_idx, size = _ravel_group_idx(group_idx, a, axis, size, order, method=method) + flat_size = np.prod(size) + ndim_idx = ndim_a + size = orig_shape if is_form_3 and not callable(func) and "cum" in func else size + return ( + group_idx.ravel(), + a.ravel(), + flat_size, + ndim_idx, + size, + unravel_shape, + ) + + if ndim_idx == 1: + if size is None: + size = int(np.max(group_idx)) + 1 + else: + if not np.isscalar(size): + raise ValueError("output size must be scalar or None") + if check_bounds and np.any(group_idx > size - 1): + raise ValueError(f"one or more indices are too large for size {size}") + flat_size = size + else: + if size is None: + size = np.max(group_idx, axis=1).astype(int) + 1 + elif np.isscalar(size): + raise ValueError(f"output size must be of length {len(group_idx)}") + elif len(size) != len(group_idx): + raise ValueError(f"{len(size)} sizes given, but {len(group_idx)} output dimensions specified in index") + if ravel_group_idx: + group_idx = np.ravel_multi_index(group_idx, size, order=order, mode="raise") + flat_size = 
np.prod(size) + + if not (np.ndim(a) == 0 or len(a) == group_idx.size): + raise ValueError("group_idx and a must be of the same length, or a can be scalar") + + return group_idx, a, flat_size, ndim_idx, size, None + + +# General tools + + +def unpack(group_idx, ret): + """ + Take an aggregate packed array and uncompress it to the size of group_idx. This is equivalent to + ret[group_idx]. + """ + return ret[group_idx] + + +def allnan(x): + return np.all(np.isnan(x)) + + +def anynan(x): + return np.any(np.isnan(x)) + + +def nanfirst(x): + return x[~np.isnan(x)][0] + + +def nanlast(x): + return x[~np.isnan(x)][-1] - def isstr(s): - return isinstance(s, basestring) -except NameError: - # Probably Python 3.x - def isstr(s): - return isinstance(s, str) +def multi_arange(n): + """By example: + + # 0 1 2 3 4 5 6 7 8 + n = [0, 0, 3, 0, 0, 2, 0, 2, 1] + res = [0, 1, 2, 0, 1, 0, 1, 0] + + That is, it is equivalent to something like this: + + hstack((arange(n_i) for n_i in n)) + + This version seems quite a bit faster, at least for some possible inputs, and at any rate it + encapsulates a task in a function. + """ + if n.ndim != 1: + raise ValueError("n is supposed to be 1d array.") + + n_mask = n.astype(bool) + n_cumsum = np.cumsum(n) + ret = np.ones(n_cumsum[-1] + 1, dtype=int) + ret[n_cumsum[n_mask]] -= n[n_mask] + ret[0] -= 1 + return np.cumsum(ret)[:-1] + + +def label_contiguous_1d(X): + """ + WARNING: API for this function is not liable to change!!! + + By example: + + X = [F T T F F T F F F T T T] + result = [0 1 1 0 0 2 0 0 0 3 3 3] + + Or: + X = [0 3 3 0 0 5 5 5 1 1 0 2] + result = [0 1 1 0 0 2 2 2 3 3 0 4] + + The ``0`` or ``False`` elements of ``X`` are labeled as ``0`` in the output. If ``X`` is a boolean + array, each contiguous block of ``True`` is given an integer label, if ``X`` is not boolean, then + each contiguous block of identical values is given an integer label. Integer labels are 1, 2, 3, + ..... (i.e.
start at 1 and increase by 1 for each block with no skipped numbers.) + """ + + if X.ndim != 1: + raise ValueError("this is for 1d masks only.") + + is_start = np.empty(len(X), dtype=bool) + is_start[0] = X[0] # True if X[0] is True or non-zero + + if X.dtype.kind == "b": + is_start[1:] = ~X[:-1] & X[1:] + M = X + else: + M = X.astype(bool) + is_start[1:] = X[:-1] != X[1:] + is_start[~M] = False + + L = np.cumsum(is_start) + L[~M] = 0 + return L + + +def relabel_groups_unique(group_idx): + """ + See also ``relabel_groups_masked``. + + keep_group: [0 3 3 3 0 2 5 2 0 1 1 0 3 5 5] + ret: [0 3 3 3 0 2 4 2 0 1 1 0 3 4 4] + + Description of above: unique groups in the input were ``1,2,3,5``, i.e. + ``4`` was missing, so group 5 was relabeled to be ``4``. + Relabeling maintains order, just "compressing" the higher numbers + to fill gaps. + """ + + keep_group = np.zeros(np.max(group_idx) + 1, dtype=bool) + keep_group[0] = True + keep_group[group_idx] = True + return relabel_groups_masked(group_idx, keep_group) + + +def relabel_groups_masked(group_idx, keep_group): + """ + group_idx: [0 3 3 3 0 2 5 2 0 1 1 0 3 5 5] + + 0 1 2 3 4 5 + keep_group: [0 1 0 1 1 1] + + ret: [0 2 2 2 0 0 4 0 0 1 1 0 2 4 4] + + Description of above in words: remove group 2, and relabel group 3,4, and 5 to be 2, 3 and 4 + respectively, in order to fill the gap. Note that group 4 was never used in the input group_idx, + but the user supplied mask said to keep group 4, so group 5 is only moved up by one place to fill + the gap created by removing group 2. + + That is, the mask describes which groups to remove, the remaining groups are relabeled to remove the + gaps created by the falsy elements in ``keep_group``. Note that ``keep_group[0]`` has no particular + meaning because it refers to the zero group which cannot be "removed". + + ``keep_group`` should be bool and ``group_idx`` int. Values in ``group_idx`` can be in any order.
+ """ + + keep_group = keep_group.astype(bool, copy=not keep_group[0]) + if not keep_group[0]: # ensuring keep_group[0] is True makes life easier + keep_group[0] = True + + relabel = np.zeros(keep_group.size, dtype=group_idx.dtype) + relabel[keep_group] = np.arange(np.count_nonzero(keep_group)) + return relabel[group_idx] + + +def is_duck_array(value): + """This function was copied from xarray/core/utils.py under the terms of Xarray's Apache-2 license.""" + + if isinstance(value, np.ndarray): + return True + return ( + hasattr(value, "ndim") + and hasattr(value, "shape") + and hasattr(value, "dtype") + and hasattr(value, "__array_function__") + and hasattr(value, "__array_ufunc__") + ) + + +def iscomplexobj(x): + """Copied from np.iscomplexobj so that we place fewer requirements on duck array types.""" + + try: + dtype = x.dtype + type_ = dtype.type + except AttributeError: + type_ = np.asarray(x).dtype.type + return issubclass(type_, np.complexfloating) diff --git a/numpy_groupies/utils_numpy.py b/numpy_groupies/utils_numpy.py deleted file mode 100644 index 7df3ad1..0000000 --- a/numpy_groupies/utils_numpy.py +++ /dev/null @@ -1,555 +0,0 @@ -"""Common helper functions for typing and general numpy tools.""" -import numpy as np - -from .utils import check_boolean, get_aliasing - -_alias_numpy = { - np.add: "sum", - np.sum: "sum", - np.any: "any", - np.all: "all", - np.multiply: "prod", - np.prod: "prod", - np.amin: "min", - np.min: "min", - np.minimum: "min", - np.amax: "max", - np.max: "max", - np.maximum: "max", - np.argmax: "argmax", - np.argmin: "argmin", - np.mean: "mean", - np.std: "std", - np.var: "var", - np.array: "array", - np.asarray: "array", - np.sort: "sort", - np.nansum: "nansum", - np.nanprod: "nanprod", - np.nanmean: "nanmean", - np.nanvar: "nanvar", - np.nanmax: "nanmax", - np.nanmin: "nanmin", - np.nanstd: "nanstd", - np.nanargmax: "nanargmax", - np.nanargmin: "nanargmin", - np.cumsum: "cumsum", - np.cumprod: "cumprod", -} - -aliasing = 
get_aliasing(_alias_numpy) - -_next_int_dtype = dict( - bool=np.int8, - uint8=np.int16, - int8=np.int16, - uint16=np.int32, - int16=np.int32, - uint32=np.int64, - int32=np.int64, -) - -_next_float_dtype = dict( - float16=np.float32, - float32=np.float64, - float64=np.complex64, - complex64=np.complex128, -) - - -def minimum_dtype(x, dtype=np.bool_): - """returns the "most basic" dtype which represents `x` properly, which - provides at least the same value range as the specified dtype.""" - - def check_type(x, dtype): - try: - converted = np.array(x).astype(dtype) - except (ValueError, OverflowError, RuntimeWarning): - return False - # False if some overflow has happened - return converted == x or np.isnan(x) - - def type_loop(x, dtype, dtype_dict, default=None): - while True: - try: - dtype = np.dtype(dtype_dict[dtype.name]) - if check_type(x, dtype): - return np.dtype(dtype) - except KeyError: - if default is not None: - return np.dtype(default) - raise ValueError("Can not determine dtype of %r" % x) - - dtype = np.dtype(dtype) - if check_type(x, dtype): - return dtype - - if np.issubdtype(dtype, np.inexact): - return type_loop(x, dtype, _next_float_dtype) - else: - return type_loop(x, dtype, _next_int_dtype, default=np.float32) - - -def minimum_dtype_scalar(x, dtype, a): - if dtype is None: - dtype = np.dtype(type(a)) if isinstance(a, (int, float)) else a.dtype - return minimum_dtype(x, dtype) - - -_forced_types = { - "array": object, - "all": bool, - "any": bool, - "nanall": bool, - "nanany": bool, - "len": np.int64, - "nanlen": np.int64, - "allnan": bool, - "anynan": bool, - "argmax": np.int64, - "argmin": np.int64, - "nanargmin": np.int64, - "nanargmax": np.int64, -} -_forced_float_types = {"mean", "var", "std", "nanmean", "nanvar", "nanstd"} -_forced_same_type = { - "min", - "max", - "first", - "last", - "nanmin", - "nanmax", - "nanfirst", - "nanlast", -} - - -def check_dtype(dtype, func_str, a, n): - if np.isscalar(a) or not a.shape: - if func_str not in 
("sum", "prod", "len"): - raise ValueError( - "scalar inputs are supported only for 'sum', " "'prod' and 'len'" - ) - a_dtype = np.dtype(type(a)) - else: - a_dtype = a.dtype - - if dtype is not None: - # dtype set by the user - # Careful here: np.bool != np.bool_ ! - if np.issubdtype(dtype, np.bool_) and not ( - "all" in func_str or "any" in func_str - ): - raise TypeError( - "function %s requires a more complex datatype " "than bool" % func_str - ) - if not np.issubdtype(dtype, np.integer) and func_str in ("len", "nanlen"): - raise TypeError("function %s requires an integer datatype" % func_str) - # TODO: Maybe have some more checks here - return np.dtype(dtype) - else: - try: - return np.dtype(_forced_types[func_str]) - except KeyError: - if func_str in _forced_float_types: - if np.issubdtype(a_dtype, np.floating): - return a_dtype - else: - return np.dtype(np.float64) - else: - if func_str == "sum": - # Try to guess the minimally required int size - if np.issubdtype(a_dtype, np.int64): - # It's not getting bigger anymore - # TODO: strictly speaking it might need float - return np.dtype(np.int64) - elif np.issubdtype(a_dtype, np.integer): - maxval = np.iinfo(a_dtype).max * n - return minimum_dtype(maxval, a_dtype) - elif np.issubdtype(a_dtype, np.bool_): - return minimum_dtype(n, a_dtype) - else: - # floating, inexact, whatever - return a_dtype - elif func_str in _forced_same_type: - return a_dtype - else: - if isinstance(a_dtype, np.integer): - return np.dtype(np.int64) - else: - return a_dtype - - -def minval(fill_value, dtype): - dtype = minimum_dtype(fill_value, dtype) - if issubclass(dtype.type, np.floating): - return -np.inf - if issubclass(dtype.type, np.integer): - return np.iinfo(dtype).min - return np.finfo(dtype).min - - -def maxval(fill_value, dtype): - dtype = minimum_dtype(fill_value, dtype) - if issubclass(dtype.type, np.floating): - return np.inf - if issubclass(dtype.type, np.integer): - return np.iinfo(dtype).max - return np.finfo(dtype).max - - 
-def check_fill_value(fill_value, dtype, func=None): - if func in ("all", "any", "allnan", "anynan"): - check_boolean(fill_value) - else: - try: - return dtype.type(fill_value) - except ValueError: - raise ValueError( - "fill_value must be convertible into %s" % dtype.type.__name__ - ) - - -def check_group_idx(group_idx, a=None, check_min=True): - if a is not None and group_idx.size != a.size: - raise ValueError("The size of group_idx must be the same as " "a.size") - if not issubclass(group_idx.dtype.type, np.integer): - raise TypeError("group_idx must be of integer type") - if check_min and np.min(group_idx) < 0: - raise ValueError("group_idx contains negative indices") - - -def _ravel_group_idx(group_idx, a, axis, size, order, method="ravel"): - ndim_a = a.ndim - # Create the broadcast-ready multidimensional indexing. - # Note the user could do this themselves, so this is - # very much just a convenience. - size_in = int(np.max(group_idx)) + 1 if size is None else size - group_idx_in = group_idx - group_idx = [] - size = [] - for ii, s in enumerate(a.shape): - if method == "ravel": - ii_idx = group_idx_in if ii == axis else np.arange(s) - ii_shape = [1] * ndim_a - ii_shape[ii] = s - group_idx.append(ii_idx.reshape(ii_shape)) - size.append(size_in if ii == axis else s) - # Use the indexing, and return. It's a bit simpler than - # using trying to keep all the logic below happy - if method == "ravel": - group_idx = np.ravel_multi_index(group_idx, size, order=order, mode="raise") - elif method == "offset": - group_idx = offset_labels(group_idx_in, a.shape, axis, order, size_in) - return group_idx, size - - -def offset_labels(group_idx, inshape, axis, order, size): - """ - Offset group labels by dimension. This is used when we - reduce over a subset of the dimensions of by. 
It assumes that the reductions - dimensions have been flattened in the last dimension - Copied from - https://stackoverflow.com/questions/46256279/bin-elements-per-row-vectorized-2d-bincount-for-numpy - """ - - newaxes = tuple(ax for ax in range(len(inshape)) if ax != axis) - group_idx = np.broadcast_to(np.expand_dims(group_idx, newaxes), inshape) - if axis not in (-1, len(inshape) - 1): - group_idx = np.moveaxis(group_idx, axis, -1) - newshape = group_idx.shape[:-1] + (-1,) - - group_idx = ( - group_idx - + np.arange(np.prod(newshape[:-1]), dtype=int).reshape(newshape) * size - ) - if axis not in (-1, len(inshape) - 1): - return np.moveaxis(group_idx, -1, axis) - else: - return group_idx - - -def input_validation( - group_idx, - a, - size=None, - order="C", - axis=None, - ravel_group_idx=True, - check_bounds=True, - func=None, -): - """Do some fairly extensive checking of group_idx and a, trying to - give the user as much help as possible with what is wrong. Also, - convert ndim-indexing to 1d indexing. - """ - if not isinstance(a, (int, float, complex)) and not is_duck_array(a): - a = np.asanyarray(a) - if not is_duck_array(group_idx): - group_idx = np.asanyarray(group_idx) - - if not np.issubdtype(group_idx.dtype, np.integer): - raise TypeError("group_idx must be of integer type") - - # This check works for multidimensional indexing as well - if check_bounds and np.any(group_idx < 0): - raise ValueError("negative indices not supported") - - ndim_idx = np.ndim(group_idx) - ndim_a = np.ndim(a) - - # Deal with the axis arg: if present, then turn 1d indexing into - # multi-dimensional indexing along the specified axis. - if axis is None: - if ndim_a > 1: - raise ValueError( - "a must be scalar or 1 dimensional, use .ravel to" - " flatten. Alternatively specify axis." 
- ) - elif axis >= ndim_a or axis < -ndim_a: - raise ValueError("axis arg too large for np.ndim(a)") - else: - axis = axis if axis >= 0 else ndim_a + axis # negative indexing - if ndim_idx > 1: - # TODO: we could support a sequence of axis values for multiple - # dimensions of group_idx. - raise NotImplementedError( - "only 1d indexing currently" "supported with axis arg." - ) - elif a.shape[axis] != len(group_idx): - raise ValueError("a.shape[axis] doesn't match length of group_idx.") - elif size is not None and not np.isscalar(size): - raise NotImplementedError( - "when using axis arg, size must be" "None or scalar." - ) - else: - is_form_3 = group_idx.ndim == 1 and a.ndim > 1 and axis is not None - orig_shape = a.shape if is_form_3 else group_idx.shape - if isinstance(func, str) and "arg" in func: - unravel_shape = orig_shape - else: - unravel_shape = None - - method = "offset" if axis == ndim_a - 1 else "ravel" - group_idx, size = _ravel_group_idx( - group_idx, a, axis, size, order, method=method - ) - flat_size = np.prod(size) - ndim_idx = ndim_a - size = ( - orig_shape - if is_form_3 and not callable(func) and "cum" in func - else size - ) - return ( - group_idx.ravel(), - a.ravel(), - flat_size, - ndim_idx, - size, - unravel_shape, - ) - - if ndim_idx == 1: - if size is None: - size = int(np.max(group_idx)) + 1 - else: - if not np.isscalar(size): - raise ValueError("output size must be scalar or None") - if check_bounds and np.any(group_idx > size - 1): - raise ValueError( - "one or more indices are too large for " "size %d" % size - ) - flat_size = size - else: - if size is None: - size = np.max(group_idx, axis=1).astype(int) + 1 - elif np.isscalar(size): - raise ValueError("output size must be of length %d" % len(group_idx)) - elif len(size) != len(group_idx): - raise ValueError( - "%d sizes given, but %d output dimensions " - "specified in index" % (len(size), len(group_idx)) - ) - if ravel_group_idx: - group_idx = np.ravel_multi_index(group_idx, size, 
order=order, mode="raise") - flat_size = np.prod(size) - - if not (np.ndim(a) == 0 or len(a) == group_idx.size): - raise ValueError( - "group_idx and a must be of the same length, or a" " can be scalar" - ) - - return group_idx, a, flat_size, ndim_idx, size, None - - -### General tools ### - - -def unpack(group_idx, ret): - """Take an aggregate packed array and uncompress it to the size of group_idx. - This is equivalent to ret[group_idx]. - """ - return ret[group_idx] - - -def allnan(x): - return np.all(np.isnan(x)) - - -def anynan(x): - return np.any(np.isnan(x)) - - -def nanfirst(x): - return x[~np.isnan(x)][0] - - -def nanlast(x): - return x[~np.isnan(x)][-1] - - -def multi_arange(n): - """By example: - - # 0 1 2 3 4 5 6 7 8 - n = [0, 0, 3, 0, 0, 2, 0, 2, 1] - res = [0, 1, 2, 0, 1, 0, 1, 0] - - That is, it is equivalent to something like this: - - hstack((arange(n_i) for n_i in n)) - - This version seems quite a bit faster, at least for some - possible inputs, and at any rate it encapsulates a task - in a function. - """ - if n.ndim != 1: - raise ValueError("n is supposed to be 1d array.") - - n_mask = n.astype(bool) - n_cumsum = np.cumsum(n) - ret = np.ones(n_cumsum[-1] + 1, dtype=int) - ret[n_cumsum[n_mask]] -= n[n_mask] - ret[0] -= 1 - return np.cumsum(ret)[:-1] - - -def label_contiguous_1d(X): - """ - WARNING: API for this function is not liable to change!!! - - By example: - - X = [F T T F F T F F F T T T] - result = [0 1 1 0 0 2 0 0 0 3 3 3] - - Or: - X = [0 3 3 0 0 5 5 5 1 1 0 2] - result = [0 1 1 0 0 2 2 2 3 3 0 4] - - The ``0`` or ``False`` elements of ``X`` are labeled as ``0`` in the output. If ``X`` - is a boolean array, each contiguous block of ``True`` is given an integer - label; if ``X`` is not boolean, then each contiguous block of identical values - is given an integer label. Integer labels are 1, 2, 3, ... (i.e. they start at 1 - and increase by 1 for each block, with no skipped numbers.)
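The worked example in the `multi_arange` docstring above can be reproduced with the naive `hstack`-of-`arange`s formulation it is compared against (a minimal pure-numpy sketch; `multi_arange_naive` is a name invented here for illustration, not part of the library):

```python
import numpy as np

def multi_arange_naive(n):
    # straightforward equivalent of multi_arange: concatenate arange(n_i)
    # for every count n_i, skipping the zero-length pieces
    parts = [np.arange(n_i) for n_i in n if n_i > 0]
    return np.concatenate(parts) if parts else np.zeros(0, dtype=int)

#             0  1  2  3  4  5  6  7  8
n = np.array([0, 0, 3, 0, 0, 2, 0, 2, 1])
print(multi_arange_naive(n))  # [0 1 2 0 1 0 1 0]
```

The cumsum-based version in the diff avoids the Python-level loop over blocks, which is where its speed advantage comes from.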
- - """ - - if X.ndim != 1: - raise ValueError("this is for 1d masks only.") - - is_start = np.empty(len(X), dtype=bool) - is_start[0] = X[0] # True if X[0] is True or non-zero - - if X.dtype.kind == "b": - is_start[1:] = ~X[:-1] & X[1:] - M = X - else: - M = X.astype(bool) - is_start[1:] = X[:-1] != X[1:] - is_start[~M] = False - - L = np.cumsum(is_start) - L[~M] = 0 - return L - - -def relabel_groups_unique(group_idx): - """ - See also ``relabel_groups_masked``. - - keep_group: [0 3 3 3 0 2 5 2 0 1 1 0 3 5 5] - ret: [0 3 3 3 0 2 4 2 0 1 1 0 3 4 4] - - Description of above: unique groups in input was ``1,2,3,5``, i.e. - ``4`` was missing, so group 5 was relabled to be ``4``. - Relabeling maintains order, just "compressing" the higher numbers - to fill gaps. - """ - - keep_group = np.zeros(np.max(group_idx) + 1, dtype=bool) - keep_group[0] = True - keep_group[group_idx] = True - return relabel_groups_masked(group_idx, keep_group) - - -def relabel_groups_masked(group_idx, keep_group): - """ - group_idx: [0 3 3 3 0 2 5 2 0 1 1 0 3 5 5] - - 0 1 2 3 4 5 - keep_group: [0 1 0 1 1 1] - - ret: [0 2 2 2 0 0 4 0 0 1 1 0 2 4 4] - - Description of above in words: remove group 2, and relabel group 3,4, and 5 - to be 2, 3 and 4 respecitvely, in order to fill the gap. Note that group 4 was never used - in the input group_idx, but the user supplied mask said to keep group 4, so group - 5 is only moved up by one place to fill the gap created by removing group 2. - - That is, the mask describes which groups to remove, - the remaining groups are relabled to remove the gaps created by the falsy - elements in ``keep_group``. Note that ``keep_group[0]`` has no particular meaning because it refers - to the zero group which cannot be "removed". - - ``keep_group`` should be bool and ``group_idx`` int. 
- Values in ``group_idx`` can appear in any order. - """ - - keep_group = keep_group.astype(bool, copy=not keep_group[0]) - if not keep_group[0]: # ensuring keep_group[0] is True makes life easier - keep_group[0] = True - - relabel = np.zeros(keep_group.size, dtype=group_idx.dtype) - relabel[keep_group] = np.arange(np.count_nonzero(keep_group)) - return relabel[group_idx] - - -def is_duck_array(value): - """ - This function was copied from xarray/core/utils.py under the terms - of Xarray's Apache-2 license - """ - if isinstance(value, np.ndarray): - return True - return ( - hasattr(value, "ndim") - and hasattr(value, "shape") - and hasattr(value, "dtype") - and hasattr(value, "__array_function__") - and hasattr(value, "__array_ufunc__") - ) - - -def iscomplexobj(x): - """ - Copied from np.iscomplexobj so that we place fewer requirements - on duck array types. - """ - try: - dtype = x.dtype - type_ = dtype.type - except AttributeError: - type_ = np.asarray(x).dtype.type - return issubclass(type_, np.complexfloating) diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..0ad513e --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,60 @@ +[build-system] +requires = ["setuptools", "setuptools-scm"] +build-backend = "setuptools.build_meta" + +[project] +name = "numpy-groupies" +description = "Optimised tools for group-indexing operations: aggregated sum and more."
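The lookup-table trick used by `relabel_groups_masked` above is compact enough to verify end-to-end against its docstring example (a sketch of the same idea; `relabel_groups_masked_sketch` is a name invented here, not the library function):

```python
import numpy as np

def relabel_groups_masked_sketch(group_idx, keep_group):
    # build a lookup table mapping every kept group to a new gap-free label,
    # then apply it with a single fancy-indexing step
    keep_group = keep_group.astype(bool)
    keep_group[0] = True  # the zero group can never be removed
    relabel = np.zeros(keep_group.size, dtype=group_idx.dtype)
    relabel[keep_group] = np.arange(np.count_nonzero(keep_group))
    return relabel[group_idx]

group_idx = np.array([0, 3, 3, 3, 0, 2, 5, 2, 0, 1, 1, 0, 3, 5, 5])
keep_group = np.array([0, 1, 0, 1, 1, 1])
print(relabel_groups_masked_sketch(group_idx, keep_group))
# [0 2 2 2 0 0 4 0 0 1 1 0 2 4 4]
```

Group 2 is dropped, and groups 3 and 5 are compressed down to 2 and 4, exactly as the docstring describes.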
+dynamic = ["version"] +readme = {file = "README.md", content-type = "text/markdown"} +license = {file = "LICENSE.txt"} +authors = [ + {name = "Michael Löffler", email = "ml@occam.com.ua"}, + {name = "Daniel Manson", email = "danielmanson.uk@gmail.com"} +] +maintainers = [ + {name = "Deepak Cherian", email = "dcherian@ucar.edu"} +] +classifiers = [ + "Development Status :: 4 - Beta", + "Intended Audience :: Science/Research", + "Intended Audience :: Developers", + "Operating System :: OS Independent", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Topic :: Scientific/Engineering", + "Topic :: Software Development :: Libraries", + "License :: OSI Approved :: BSD License", +] +keywords = ["accumarray", "aggregate", "groupby", "grouping", "indexing"] +requires-python = ">=3.9" +dependencies = ["numpy"] + +[project.optional-dependencies] +fast = [ + "numba", +] +dev = [ + "pytest", + "numba", + "pandas", +] + +[project.urls] +source = "https://github.com/ml31415/numpy-groupies" +tracker = "https://github.com/ml31415/numpy-groupies/issues" + +[tool.black] +line-length = 120 + +[tool.isort] +profile = "black" +honor_noqa = true + +[tool.setuptools.packages.find] +include = ["numpy_groupies*"] + +[tool.setuptools_scm] +write_to = "numpy_groupies/_version.py" diff --git a/setup.cfg b/setup.cfg deleted file mode 100644 index e0fea06..0000000 --- a/setup.cfg +++ /dev/null @@ -1,25 +0,0 @@ -[metadata] -description_file = README.md - -[aliases] -test=pytest - -[versioneer] -VCS = git -style = pep440 -versionfile_source = numpy_groupies/_version.py -versionfile_build = numpy_groupies/_version.py -tag_prefix = v -parentdir_prefix = numpy_groupies- - -[pep8] -max_line_length = 120 -aggressive = 1 - -[isort] -line_length = 100 -combine_as_imports = true -multi_line_output = 0 -skip_gitignore = true -default_section = THIRDPARTY -known_first_party = 
numpy_groupies diff --git a/setup.py b/setup.py deleted file mode 100644 index b48cdbc..0000000 --- a/setup.py +++ /dev/null @@ -1,86 +0,0 @@ -#!/usr/bin/env python - -import os -import versioneer -from setuptools import setup, Command -from shutil import rmtree - -base_path = os.path.dirname(os.path.abspath(__file__)) - -long_description = """ -This package consists of a couple of optimised tools for doing things that can roughly be -considered "group-indexing operations". The most prominent tool is `aggregate`. - -`aggregate` takes an array of values, and an array giving the group number for each of those -values. It then returns the sum (or mean, or std, or any, ...etc.) of the values in each group. -You have probably come across this idea before, using `matlab` accumarray, `pandas` groupby, -or generally MapReduce algorithms and histograms. - -There are different implementations of `aggregate` provided, based on plain `numpy`, `numba` -and `weave`. Performance is a main concern, and so far we comfortably beat similar -implementations in other packages (check the benchmarks). 
-""" - - -class Clean(Command): - description = "clean up temporary files from 'build' command" - user_options = [] - - def initialize_options(self): - pass - - def finalize_options(self): - pass - - def run(self): - for folder in ("build", "dist", "numpy_groupies.egg-info"): - path = os.path.join(base_path, folder) - if os.path.isdir(path): - print("removing '{}' (and everything under it)".format(path)) - if not self.dry_run: - rmtree(path) - self._rm_walk() - - def _rm_walk(self): - for path, dirs, files in os.walk(base_path): - if any(p.startswith(".") for p in path.split(os.path.sep)): - # Skip hidden directories like the git folder right away - continue - if path.endswith("__pycache__"): - print("removing '{}' (and everything under it)".format(path)) - if not self.dry_run: - rmtree(path) - else: - for fname in files: - if fname.endswith(".pyc") or fname.endswith(".so"): - fpath = os.path.join(path, fname) - print("removing '{}'".format(fpath)) - if not self.dry_run: - os.remove(fpath) - - -setup( - name="numpy_groupies", - version=versioneer.get_version(), - author="@ml31415 and @d1manson", - author_email="npgroupies@occam.com.ua", - license="BSD", - description="Optimised tools for group-indexing operations: aggregated sum and more.", - long_description=long_description, - long_description_content_type="text/markdown", - url="https://github.com/ml31415/numpy-groupies", - download_url="https://github.com/ml31415/numpy-groupies/archive/master.zip", - keywords=["accumarray", "aggregate", "groupby", "grouping", "indexing"], - packages=["numpy_groupies"], - install_requires=["numpy"], - extras_require={"tests": ["pytest"]}, - classifiers=[ - "Development Status :: 4 - Beta", - "Intended Audience :: Science/Research", - "Programming Language :: Python :: 3.7", - "Programming Language :: Python :: 3.8", - "Programming Language :: Python :: 3.9", - "Programming Language :: Python :: 3.10", - ], - cmdclass=dict(clean=Clean, **versioneer.get_cmdclass()), -) diff --git 
a/versioneer.py b/versioneer.py deleted file mode 100644 index 2b54540..0000000 --- a/versioneer.py +++ /dev/null @@ -1,1885 +0,0 @@ -# Version: 0.18 - -"""The Versioneer - like a rocketeer, but for versions. - -The Versioneer -============== - -* like a rocketeer, but for versions! -* https://github.com/warner/python-versioneer -* Brian Warner -* License: Public Domain -* Compatible With: python2.6, 2.7, 3.2, 3.3, 3.4, 3.5, 3.6, and pypy -* [![Latest Version] -(https://pypip.in/version/versioneer/badge.svg?style=flat) -](https://pypi.python.org/pypi/versioneer/) -* [![Build Status] -(https://travis-ci.org/warner/python-versioneer.png?branch=master) -](https://travis-ci.org/warner/python-versioneer) - -This is a tool for managing a recorded version number in distutils-based -python projects. The goal is to remove the tedious and error-prone "update -the embedded version string" step from your release process. Making a new -release should be as easy as recording a new tag in your version-control -system, and maybe making new tarballs. - - -## Quick Install - -* `pip install versioneer` to somewhere to your $PATH -* add a `[versioneer]` section to your setup.cfg (see below) -* run `versioneer install` in your source tree, commit the results - -## Version Identifiers - -Source trees come from a variety of places: - -* a version-control system checkout (mostly used by developers) -* a nightly tarball, produced by build automation -* a snapshot tarball, produced by a web-based VCS browser, like github's - "tarball from tag" feature -* a release tarball, produced by "setup.py sdist", distributed through PyPI - -Within each source tree, the version identifier (either a string or a number, -this tool is format-agnostic) can come from a variety of places: - -* ask the VCS tool itself, e.g. 
"git describe" (for checkouts), which knows - about recent "tags" and an absolute revision-id -* the name of the directory into which the tarball was unpacked -* an expanded VCS keyword ($Id$, etc) -* a `_version.py` created by some earlier build step - -For released software, the version identifier is closely related to a VCS -tag. Some projects use tag names that include more than just the version -string (e.g. "myproject-1.2" instead of just "1.2"), in which case the tool -needs to strip the tag prefix to extract the version identifier. For -unreleased software (between tags), the version identifier should provide -enough information to help developers recreate the same tree, while also -giving them an idea of roughly how old the tree is (after version 1.2, before -version 1.3). Many VCS systems can report a description that captures this, -for example `git describe --tags --dirty --always` reports things like -"0.7-1-g574ab98-dirty" to indicate that the checkout is one revision past the -0.7 tag, has a unique revision id of "574ab98", and is "dirty" (it has -uncommitted changes. - -The version identifier is used for multiple purposes: - -* to allow the module to self-identify its version: `myproject.__version__` -* to choose a name and prefix for a 'setup.py sdist' tarball - -## Theory of Operation - -Versioneer works by adding a special `_version.py` file into your source -tree, where your `__init__.py` can import it. This `_version.py` knows how to -dynamically ask the VCS tool for version information at import time. - -`_version.py` also contains `$Revision$` markers, and the installation -process marks `_version.py` to have this marker rewritten with a tag name -during the `git archive` command. As a result, generated tarballs will -contain enough information to get the proper version. - -To allow `setup.py` to compute a version too, a `versioneer.py` is added to -the top level of your source tree, next to `setup.py` and the `setup.cfg` -that configures it. 
This overrides several distutils/setuptools commands to -compute the version when invoked, and changes `setup.py build` and `setup.py -sdist` to replace `_version.py` with a small static file that contains just -the generated version data. - -## Installation - -See [INSTALL.md](./INSTALL.md) for detailed installation instructions. - -## Version-String Flavors - -Code which uses Versioneer can learn about its version string at runtime by -importing `_version` from your main `__init__.py` file and running the -`get_versions()` function. From the "outside" (e.g. in `setup.py`), you can -import the top-level `versioneer.py` and run `get_versions()`. - -Both functions return a dictionary with different flavors of version -information: - -* `['version']`: A condensed version string, rendered using the selected - style. This is the most commonly used value for the project's version - string. The default "pep440" style yields strings like `0.11`, - `0.11+2.g1076c97`, or `0.11+2.g1076c97.dirty`. See the "Styles" section - below for alternative styles. - -* `['full-revisionid']`: detailed revision identifier. For Git, this is the - full SHA1 commit id, e.g. "1076c978a8d3cfc70f408fe5974aa6c092c949ac". - -* `['date']`: Date and time of the latest `HEAD` commit. For Git, it is the - commit date in ISO 8601 format. This will be None if the date is not - available. - -* `['dirty']`: a boolean, True if the tree has uncommitted changes. Note that - this is only accurate if run in a VCS checkout, otherwise it is likely to - be False or None - -* `['error']`: if the version string could not be computed, this will be set - to a string describing the problem, otherwise it will be None. It may be - useful to throw an exception in setup.py if this is set, to avoid e.g. - creating tarballs with a version string of "unknown". - -Some variants are more useful than others. 
Including `full-revisionid` in a -bug report should allow developers to reconstruct the exact code being tested -(or indicate the presence of local changes that should be shared with the -developers). `version` is suitable for display in an "about" box or a CLI -`--version` output: it can be easily compared against release notes and lists -of bugs fixed in various releases. - -The installer adds the following text to your `__init__.py` to place a basic -version in `YOURPROJECT.__version__`: - - from ._version import get_versions - __version__ = get_versions()['version'] - del get_versions - -## Styles - -The setup.cfg `style=` configuration controls how the VCS information is -rendered into a version string. - -The default style, "pep440", produces a PEP440-compliant string, equal to the -un-prefixed tag name for actual releases, and containing an additional "local -version" section with more detail for in-between builds. For Git, this is -TAG[+DISTANCE.gHEX[.dirty]] , using information from `git describe --tags ---dirty --always`. For example "0.11+2.g1076c97.dirty" indicates that the -tree is like the "1076c97" commit but has uncommitted changes (".dirty"), and -that this commit is two revisions ("+2") beyond the "0.11" tag. For released -software (exactly equal to a known tag), the identifier will only contain the -stripped tag, e.g. "0.11". - -Other styles are available. See [details.md](details.md) in the Versioneer -source tree for descriptions. - -## Debugging - -Versioneer tries to avoid fatal errors: if something goes wrong, it will tend -to return a version of "0+unknown". To investigate the problem, run `setup.py -version`, which will run the version-lookup code in a verbose mode, and will -display the full contents of `get_versions()` (including the `error` string, -which may help identify what went wrong). - -## Known Limitations - -Some situations are known to cause problems for Versioneer. This details the -most significant ones. 
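The `TAG[+DISTANCE.gHEX[.dirty]]` rendering described for the default "pep440" style can be sketched in a few lines (illustrative only; `render_pep440_sketch` is a name invented here, not versioneer's internal renderer):

```python
def render_pep440_sketch(tag, distance, short, dirty):
    # TAG[+DISTANCE.gHEX[.dirty]]: the bare tag for releases, plus a
    # "local version" suffix for anything between tags
    version = tag
    if distance or dirty:
        version += "+%d.g%s" % (distance, short)
        if dirty:
            version += ".dirty"
    return version

print(render_pep440_sketch("0.11", 2, "1076c97", True))   # 0.11+2.g1076c97.dirty
print(render_pep440_sketch("0.11", 0, "1076c97", False))  # 0.11
```

An exact tag in a clean tree renders as just the tag, which is what makes release tarballs get a clean version string.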
More can be found on Github -[issues page](https://github.com/warner/python-versioneer/issues). - -### Subprojects - -Versioneer has limited support for source trees in which `setup.py` is not in -the root directory (e.g. `setup.py` and `.git/` are *not* siblings). There are -two common reasons why `setup.py` might not be in the root: - -* Source trees which contain multiple subprojects, such as - [Buildbot](https://github.com/buildbot/buildbot), which contains both - "master" and "slave" subprojects, each with their own `setup.py`, - `setup.cfg`, and `tox.ini`. Projects like these produce multiple PyPI - distributions (and upload multiple independently-installable tarballs). -* Source trees whose main purpose is to contain a C library, but which also - provide bindings to Python (and perhaps other languages) in subdirectories. - -Versioneer will look for `.git` in parent directories, and most operations -should get the right version string. However `pip` and `setuptools` have bugs -and implementation details which frequently cause `pip install .` from a -subproject directory to fail to find a correct version string (so it usually -defaults to `0+unknown`). - -`pip install --editable .` should work correctly. `setup.py install` might -work too. - -Pip-8.1.1 is known to have this problem, but hopefully it will get fixed in -some later version. - -[Bug #38](https://github.com/warner/python-versioneer/issues/38) is tracking -this issue. The discussion in -[PR #61](https://github.com/warner/python-versioneer/pull/61) describes the -issue from the Versioneer side in more detail. -[pip PR#3176](https://github.com/pypa/pip/pull/3176) and -[pip PR#3615](https://github.com/pypa/pip/pull/3615) contain work to improve -pip to let Versioneer work correctly. - -Versioneer-0.16 and earlier only looked for a `.git` directory next to the -`setup.cfg`, so subprojects were completely unsupported with those releases.
- -### Editable installs with setuptools <= 18.5 - -`setup.py develop` and `pip install --editable .` allow you to install a -project into a virtualenv once, then continue editing the source code (and -test) without re-installing after every change. - -"Entry-point scripts" (`setup(entry_points={"console_scripts": ..})`) are a -convenient way to specify executable scripts that should be installed along -with the python package. - -These both work as expected when using modern setuptools. When using -setuptools-18.5 or earlier, however, certain operations will cause -`pkg_resources.DistributionNotFound` errors when running the entrypoint -script, which must be resolved by re-installing the package. This happens -when the install happens with one version, then the egg_info data is -regenerated while a different version is checked out. Many setup.py commands -cause egg_info to be rebuilt (including `sdist`, `wheel`, and installing into -a different virtualenv), so this can be surprising. - -[Bug #83](https://github.com/warner/python-versioneer/issues/83) describes -this one, but upgrading to a newer version of setuptools should probably -resolve it. - -### Unicode version strings - -While Versioneer works (and is continually tested) with both Python 2 and -Python 3, it is not entirely consistent with bytes-vs-unicode distinctions. -Newer releases probably generate unicode version strings on py2. It's not -clear that this is wrong, but it may be surprising for applications which then -write these strings to a network connection or include them in bytes-oriented -APIs like cryptographic checksums. - -[Bug #71](https://github.com/warner/python-versioneer/issues/71) investigates -this question.
- - -## Updating Versioneer - -To upgrade your project to a new release of Versioneer, do the following: - -* install the new Versioneer (`pip install -U versioneer` or equivalent) -* edit `setup.cfg`, if necessary, to include any new configuration settings - indicated by the release notes. See [UPGRADING](./UPGRADING.md) for details. -* re-run `versioneer install` in your source tree, to replace - `SRC/_version.py` -* commit any changed files - -## Future Directions - -This tool is designed to make it easily extended to other version-control -systems: all VCS-specific components are in separate directories like -src/git/ . The top-level `versioneer.py` script is assembled from these -components by running make-versioneer.py . In the future, make-versioneer.py -will take a VCS name as an argument, and will construct a version of -`versioneer.py` that is specific to the given VCS. It might also take the -configuration arguments that are currently provided manually during -installation by editing setup.py . Alternatively, it might go the other -direction and include code from all supported VCS systems, reducing the -number of intermediate scripts. - - -## License - -To make Versioneer easier to embed, all its code is dedicated to the public -domain. The `_version.py` that it creates is also in the public domain. -Specifically, both are released under the Creative Commons "Public Domain -Dedication" license (CC0-1.0), as described in -https://creativecommons.org/publicdomain/zero/1.0/ . - -""" - -from __future__ import print_function - -try: - import configparser -except ImportError: - import ConfigParser as configparser -import errno -import json -import os -import re -import subprocess -import sys - - -class VersioneerConfig: - """Container for Versioneer configuration parameters.""" - - -def get_root(): - """Get the project root directory. - - We require that all commands are run from the project root, i.e. 
the - directory that contains setup.py, setup.cfg, and versioneer.py . - """ - root = os.path.realpath(os.path.abspath(os.getcwd())) - setup_py = os.path.join(root, "setup.py") - versioneer_py = os.path.join(root, "versioneer.py") - if not (os.path.exists(setup_py) or os.path.exists(versioneer_py)): - # allow 'python path/to/setup.py COMMAND' - root = os.path.dirname(os.path.realpath(os.path.abspath(sys.argv[0]))) - setup_py = os.path.join(root, "setup.py") - versioneer_py = os.path.join(root, "versioneer.py") - if not (os.path.exists(setup_py) or os.path.exists(versioneer_py)): - err = ( - "Versioneer was unable to run the project root directory. " - "Versioneer requires setup.py to be executed from " - "its immediate directory (like 'python setup.py COMMAND'), " - "or in a way that lets it use sys.argv[0] to find the root " - "(like 'python path/to/setup.py COMMAND')." - ) - raise VersioneerBadRootError(err) - try: - # Certain runtime workflows (setup.py install/develop in a setuptools - # tree) execute all dependencies in a single python process, so - # "versioneer" may be imported multiple times, and python's shared - # module-import table will cache the first one. So we can't use - # os.path.dirname(__file__), as that will find whichever - # versioneer.py was first imported, even in later projects. - me = os.path.realpath(os.path.abspath(__file__)) - me_dir = os.path.normcase(os.path.splitext(me)[0]) - vsr_dir = os.path.normcase(os.path.splitext(versioneer_py)[0]) - if me_dir != vsr_dir: - print( - "Warning: build in %s is using versioneer.py from %s" - % (os.path.dirname(me), versioneer_py) - ) - except NameError: - pass - return root - - -def get_config_from_root(root): - """Read the project setup.cfg file to determine Versioneer config.""" - # This might raise EnvironmentError (if setup.cfg is missing), or - # configparser.NoSectionError (if it lacks a [versioneer] section), or - # configparser.NoOptionError (if it lacks "VCS="). 
See the docstring at - # the top of versioneer.py for instructions on writing your setup.cfg . - setup_cfg = os.path.join(root, "setup.cfg") - parser = configparser.SafeConfigParser() - with open(setup_cfg, "r") as f: - parser.readfp(f) - VCS = parser.get("versioneer", "VCS") # mandatory - - def get(parser, name): - if parser.has_option("versioneer", name): - return parser.get("versioneer", name) - return None - - cfg = VersioneerConfig() - cfg.VCS = VCS - cfg.style = get(parser, "style") or "" - cfg.versionfile_source = get(parser, "versionfile_source") - cfg.versionfile_build = get(parser, "versionfile_build") - cfg.tag_prefix = get(parser, "tag_prefix") - if cfg.tag_prefix in ("''", '""'): - cfg.tag_prefix = "" - cfg.parentdir_prefix = get(parser, "parentdir_prefix") - cfg.verbose = get(parser, "verbose") - return cfg - - -class NotThisMethod(Exception): - """Exception raised if a method is not valid for the current scenario.""" - - -# these dictionaries contain VCS-specific tools -LONG_VERSION_PY = {} -HANDLERS = {} - - -def register_vcs_handler(vcs, method): # decorator - """Decorator to mark a method as the handler for a particular VCS.""" - - def decorate(f): - """Store f in HANDLERS[vcs][method].""" - if vcs not in HANDLERS: - HANDLERS[vcs] = {} - HANDLERS[vcs][method] = f - return f - - return decorate - - -def run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, env=None): - """Call the given command(s).""" - assert isinstance(commands, list) - p = None - for c in commands: - try: - dispcmd = str([c] + args) - # remember shell=False, so use git.cmd on windows, not just git - p = subprocess.Popen( - [c] + args, - cwd=cwd, - env=env, - stdout=subprocess.PIPE, - stderr=(subprocess.PIPE if hide_stderr else None), - ) - break - except EnvironmentError: - e = sys.exc_info()[1] - if e.errno == errno.ENOENT: - continue - if verbose: - print("unable to run %s" % dispcmd) - print(e) - return None, None - else: - if verbose: - print("unable to 
find command, tried %s" % (commands,)) - return None, None - stdout = p.communicate()[0].strip() - if sys.version_info[0] >= 3: - stdout = stdout.decode() - if p.returncode != 0: - if verbose: - print("unable to run %s (error)" % dispcmd) - print("stdout was %s" % stdout) - return None, p.returncode - return stdout, p.returncode - - -LONG_VERSION_PY[ - "git" -] = ''' -# This file helps to compute a version number in source trees obtained from -# git-archive tarball (such as those provided by githubs download-from-tag -# feature). Distribution tarballs (built by setup.py sdist) and build -# directories (produced by setup.py build) will contain a much shorter file -# that just contains the computed version number. - -# This file is released into the public domain. Generated by -# versioneer-0.18 (https://github.com/warner/python-versioneer) - -"""Git implementation of _version.py.""" - -import errno -import os -import re -import subprocess -import sys - - -def get_keywords(): - """Get the keywords needed to look up the version information.""" - # these strings will be replaced by git during git-archive. - # setup.py/versioneer.py will grep for the variable names, so they must - # each be defined on a line of their own. _version.py will just call - # get_keywords(). 
- git_refnames = "%(DOLLAR)sFormat:%%d%(DOLLAR)s" - git_full = "%(DOLLAR)sFormat:%%H%(DOLLAR)s" - git_date = "%(DOLLAR)sFormat:%%ci%(DOLLAR)s" - keywords = {"refnames": git_refnames, "full": git_full, "date": git_date} - return keywords - - -class VersioneerConfig: - """Container for Versioneer configuration parameters.""" - - -def get_config(): - """Create, populate and return the VersioneerConfig() object.""" - # these strings are filled in when 'setup.py versioneer' creates - # _version.py - cfg = VersioneerConfig() - cfg.VCS = "git" - cfg.style = "%(STYLE)s" - cfg.tag_prefix = "%(TAG_PREFIX)s" - cfg.parentdir_prefix = "%(PARENTDIR_PREFIX)s" - cfg.versionfile_source = "%(VERSIONFILE_SOURCE)s" - cfg.verbose = False - return cfg - - -class NotThisMethod(Exception): - """Exception raised if a method is not valid for the current scenario.""" - - -LONG_VERSION_PY = {} -HANDLERS = {} - - -def register_vcs_handler(vcs, method): # decorator - """Decorator to mark a method as the handler for a particular VCS.""" - def decorate(f): - """Store f in HANDLERS[vcs][method].""" - if vcs not in HANDLERS: - HANDLERS[vcs] = {} - HANDLERS[vcs][method] = f - return f - return decorate - - -def run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, - env=None): - """Call the given command(s).""" - assert isinstance(commands, list) - p = None - for c in commands: - try: - dispcmd = str([c] + args) - # remember shell=False, so use git.cmd on windows, not just git - p = subprocess.Popen([c] + args, cwd=cwd, env=env, - stdout=subprocess.PIPE, - stderr=(subprocess.PIPE if hide_stderr - else None)) - break - except EnvironmentError: - e = sys.exc_info()[1] - if e.errno == errno.ENOENT: - continue - if verbose: - print("unable to run %%s" %% dispcmd) - print(e) - return None, None - else: - if verbose: - print("unable to find command, tried %%s" %% (commands,)) - return None, None - stdout = p.communicate()[0].strip() - if sys.version_info[0] >= 3: - stdout = 
stdout.decode() - if p.returncode != 0: - if verbose: - print("unable to run %%s (error)" %% dispcmd) - print("stdout was %%s" %% stdout) - return None, p.returncode - return stdout, p.returncode - - -def versions_from_parentdir(parentdir_prefix, root, verbose): - """Try to determine the version from the parent directory name. - - Source tarballs conventionally unpack into a directory that includes both - the project name and a version string. We will also support searching up - two directory levels for an appropriately named parent directory - """ - rootdirs = [] - - for i in range(3): - dirname = os.path.basename(root) - if dirname.startswith(parentdir_prefix): - return {"version": dirname[len(parentdir_prefix):], - "full-revisionid": None, - "dirty": False, "error": None, "date": None} - else: - rootdirs.append(root) - root = os.path.dirname(root) # up a level - - if verbose: - print("Tried directories %%s but none started with prefix %%s" %% - (str(rootdirs), parentdir_prefix)) - raise NotThisMethod("rootdir doesn't start with parentdir_prefix") - - -@register_vcs_handler("git", "get_keywords") -def git_get_keywords(versionfile_abs): - """Extract version information from the given file.""" - # the code embedded in _version.py can just fetch the value of these - # keywords. When used from setup.py, we don't want to import _version.py, - # so we do it with a regexp instead. This function is not used from - # _version.py. 
- keywords = {} - try: - f = open(versionfile_abs, "r") - for line in f.readlines(): - if line.strip().startswith("git_refnames ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["refnames"] = mo.group(1) - if line.strip().startswith("git_full ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["full"] = mo.group(1) - if line.strip().startswith("git_date ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["date"] = mo.group(1) - f.close() - except EnvironmentError: - pass - return keywords - - -@register_vcs_handler("git", "keywords") -def git_versions_from_keywords(keywords, tag_prefix, verbose): - """Get version information from git keywords.""" - if not keywords: - raise NotThisMethod("no keywords at all, weird") - date = keywords.get("date") - if date is not None: - # git-2.2.0 added "%%cI", which expands to an ISO-8601 -compliant - # datestamp. However we prefer "%%ci" (which expands to an "ISO-8601 - # -like" string, which we must then edit to make compliant), because - # it's been around since git-1.5.3, and it's too difficult to - # discover which version we're using, or to work around using an - # older one. - date = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - refnames = keywords["refnames"].strip() - if refnames.startswith("$Format"): - if verbose: - print("keywords are unexpanded, not using") - raise NotThisMethod("unexpanded keywords, not a git-archive tarball") - refs = set([r.strip() for r in refnames.strip("()").split(",")]) - # starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of - # just "foo-1.0". If we see a "tag: " prefix, prefer those. - TAG = "tag: " - tags = set([r[len(TAG):] for r in refs if r.startswith(TAG)]) - if not tags: - # Either we're using git < 1.8.3, or there really are no tags. We use - # a heuristic: assume all version tags have a digit. 
The old git %%d - # expansion behaves like git log --decorate=short and strips out the - # refs/heads/ and refs/tags/ prefixes that would let us distinguish - # between branches and tags. By ignoring refnames without digits, we - # filter out many common branch names like "release" and - # "stabilization", as well as "HEAD" and "master". - tags = set([r for r in refs if re.search(r'\d', r)]) - if verbose: - print("discarding '%%s', no digits" %% ",".join(refs - tags)) - if verbose: - print("likely tags: %%s" %% ",".join(sorted(tags))) - for ref in sorted(tags): - # sorting will prefer e.g. "2.0" over "2.0rc1" - if ref.startswith(tag_prefix): - r = ref[len(tag_prefix):] - if verbose: - print("picking %%s" %% r) - return {"version": r, - "full-revisionid": keywords["full"].strip(), - "dirty": False, "error": None, - "date": date} - # no suitable tags, so version is "0+unknown", but full hex is still there - if verbose: - print("no suitable tags, using unknown + full revision id") - return {"version": "0+unknown", - "full-revisionid": keywords["full"].strip(), - "dirty": False, "error": "no suitable tags", "date": None} - - -@register_vcs_handler("git", "pieces_from_vcs") -def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command): - """Get version from 'git describe' in the root of the source tree. - - This only gets called if the git-archive 'subst' keywords were *not* - expanded, and _version.py hasn't already been rewritten with a short - version string, meaning we're inside a checked out source tree. 
- """ - GITS = ["git"] - if sys.platform == "win32": - GITS = ["git.cmd", "git.exe"] - - out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root, - hide_stderr=True) - if rc != 0: - if verbose: - print("Directory %%s not under git control" %% root) - raise NotThisMethod("'git rev-parse --git-dir' returned error") - - # if there is a tag matching tag_prefix, this yields TAG-NUM-gHEX[-dirty] - # if there isn't one, this yields HEX[-dirty] (no NUM) - describe_out, rc = run_command(GITS, ["describe", "--tags", "--dirty", - "--always", "--long", - "--match", "%%s*" %% tag_prefix], - cwd=root) - # --long was added in git-1.5.5 - if describe_out is None: - raise NotThisMethod("'git describe' failed") - describe_out = describe_out.strip() - full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root) - if full_out is None: - raise NotThisMethod("'git rev-parse' failed") - full_out = full_out.strip() - - pieces = {} - pieces["long"] = full_out - pieces["short"] = full_out[:7] # maybe improved later - pieces["error"] = None - - # parse describe_out. It will be like TAG-NUM-gHEX[-dirty] or HEX[-dirty] - # TAG might have hyphens. - git_describe = describe_out - - # look for -dirty suffix - dirty = git_describe.endswith("-dirty") - pieces["dirty"] = dirty - if dirty: - git_describe = git_describe[:git_describe.rindex("-dirty")] - - # now we have TAG-NUM-gHEX or HEX - - if "-" in git_describe: - # TAG-NUM-gHEX - mo = re.search(r'^(.+)-(\d+)-g([0-9a-f]+)$', git_describe) - if not mo: - # unparseable. Maybe git-describe is misbehaving? 
- pieces["error"] = ("unable to parse git-describe output: '%%s'" - %% describe_out) - return pieces - - # tag - full_tag = mo.group(1) - if not full_tag.startswith(tag_prefix): - if verbose: - fmt = "tag '%%s' doesn't start with prefix '%%s'" - print(fmt %% (full_tag, tag_prefix)) - pieces["error"] = ("tag '%%s' doesn't start with prefix '%%s'" - %% (full_tag, tag_prefix)) - return pieces - pieces["closest-tag"] = full_tag[len(tag_prefix):] - - # distance: number of commits since tag - pieces["distance"] = int(mo.group(2)) - - # commit: short hex revision ID - pieces["short"] = mo.group(3) - - else: - # HEX: no tags - pieces["closest-tag"] = None - count_out, rc = run_command(GITS, ["rev-list", "HEAD", "--count"], - cwd=root) - pieces["distance"] = int(count_out) # total number of commits - - # commit date: see ISO-8601 comment in git_versions_from_keywords() - date = run_command(GITS, ["show", "-s", "--format=%%ci", "HEAD"], - cwd=root)[0].strip() - pieces["date"] = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - - return pieces - - -def plus_or_dot(pieces): - """Return a + if we don't already have one, else return a .""" - if "+" in pieces.get("closest-tag", ""): - return "." - return "+" - - -def render_pep440(pieces): - """Build up version string, with post-release "local version identifier". - - Our goal: TAG[+DISTANCE.gHEX[.dirty]] . Note that if you - get a tagged build and then dirty it, you'll get TAG+0.gHEX.dirty - - Exceptions: - 1: no tags. git_describe was just HEX. 
0+untagged.DISTANCE.gHEX[.dirty] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += plus_or_dot(pieces) - rendered += "%%d.g%%s" %% (pieces["distance"], pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - else: - # exception #1 - rendered = "0+untagged.%%d.g%%s" %% (pieces["distance"], - pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - return rendered - - -def render_pep440_pre(pieces): - """TAG[.post.devDISTANCE] -- No -dirty. - - Exceptions: - 1: no tags. 0.post.devDISTANCE - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += ".post.dev%%d" %% pieces["distance"] - else: - # exception #1 - rendered = "0.post.dev%%d" %% pieces["distance"] - return rendered - - -def render_pep440_post(pieces): - """TAG[.postDISTANCE[.dev0]+gHEX] . - - The ".dev0" means dirty. Note that .dev0 sorts backwards - (a dirty tree will appear "older" than the corresponding clean one), - but you shouldn't be releasing software with -dirty anyways. - - Exceptions: - 1: no tags. 0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%%d" %% pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += plus_or_dot(pieces) - rendered += "g%%s" %% pieces["short"] - else: - # exception #1 - rendered = "0.post%%d" %% pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += "+g%%s" %% pieces["short"] - return rendered - - -def render_pep440_old(pieces): - """TAG[.postDISTANCE[.dev0]] . - - The ".dev0" means dirty. - - Eexceptions: - 1: no tags. 
0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%%d" %% pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - else: - # exception #1 - rendered = "0.post%%d" %% pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - return rendered - - -def render_git_describe(pieces): - """TAG[-DISTANCE-gHEX][-dirty]. - - Like 'git describe --tags --dirty --always'. - - Exceptions: - 1: no tags. HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += "-%%d-g%%s" %% (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render_git_describe_long(pieces): - """TAG-DISTANCE-gHEX[-dirty]. - - Like 'git describe --tags --dirty --always -long'. - The distance/hash is unconditional. - - Exceptions: - 1: no tags. 
HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - rendered += "-%%d-g%%s" %% (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render(pieces, style): - """Render the given version pieces into the requested style.""" - if pieces["error"]: - return {"version": "unknown", - "full-revisionid": pieces.get("long"), - "dirty": None, - "error": pieces["error"], - "date": None} - - if not style or style == "default": - style = "pep440" # the default - - if style == "pep440": - rendered = render_pep440(pieces) - elif style == "pep440-pre": - rendered = render_pep440_pre(pieces) - elif style == "pep440-post": - rendered = render_pep440_post(pieces) - elif style == "pep440-old": - rendered = render_pep440_old(pieces) - elif style == "git-describe": - rendered = render_git_describe(pieces) - elif style == "git-describe-long": - rendered = render_git_describe_long(pieces) - else: - raise ValueError("unknown style '%%s'" %% style) - - return {"version": rendered, "full-revisionid": pieces["long"], - "dirty": pieces["dirty"], "error": None, - "date": pieces.get("date")} - - -def get_versions(): - """Get version information or return default if unable to do so.""" - # I am in _version.py, which lives at ROOT/VERSIONFILE_SOURCE. If we have - # __file__, we can work backwards from there to the root. Some - # py2exe/bbfreeze/non-CPython implementations don't do __file__, in which - # case we can only use expanded keywords. - - cfg = get_config() - verbose = cfg.verbose - - try: - return git_versions_from_keywords(get_keywords(), cfg.tag_prefix, - verbose) - except NotThisMethod: - pass - - try: - root = os.path.realpath(__file__) - # versionfile_source is the relative path from the top of the source - # tree (where the .git directory might live) to this file. Invert - # this to find the root from __file__. 
- for i in cfg.versionfile_source.split('/'): - root = os.path.dirname(root) - except NameError: - return {"version": "0+unknown", "full-revisionid": None, - "dirty": None, - "error": "unable to find root of source tree", - "date": None} - - try: - pieces = git_pieces_from_vcs(cfg.tag_prefix, root, verbose) - return render(pieces, cfg.style) - except NotThisMethod: - pass - - try: - if cfg.parentdir_prefix: - return versions_from_parentdir(cfg.parentdir_prefix, root, verbose) - except NotThisMethod: - pass - - return {"version": "0+unknown", "full-revisionid": None, - "dirty": None, - "error": "unable to compute version", "date": None} -''' - - -@register_vcs_handler("git", "get_keywords") -def git_get_keywords(versionfile_abs): - """Extract version information from the given file.""" - # the code embedded in _version.py can just fetch the value of these - # keywords. When used from setup.py, we don't want to import _version.py, - # so we do it with a regexp instead. This function is not used from - # _version.py. - keywords = {} - try: - f = open(versionfile_abs, "r") - for line in f.readlines(): - if line.strip().startswith("git_refnames ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["refnames"] = mo.group(1) - if line.strip().startswith("git_full ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["full"] = mo.group(1) - if line.strip().startswith("git_date ="): - mo = re.search(r'=\s*"(.*)"', line) - if mo: - keywords["date"] = mo.group(1) - f.close() - except EnvironmentError: - pass - return keywords - - -@register_vcs_handler("git", "keywords") -def git_versions_from_keywords(keywords, tag_prefix, verbose): - """Get version information from git keywords.""" - if not keywords: - raise NotThisMethod("no keywords at all, weird") - date = keywords.get("date") - if date is not None: - # git-2.2.0 added "%cI", which expands to an ISO-8601 -compliant - # datestamp. 
However we prefer "%ci" (which expands to an "ISO-8601 - # -like" string, which we must then edit to make compliant), because - # it's been around since git-1.5.3, and it's too difficult to - # discover which version we're using, or to work around using an - # older one. - date = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - refnames = keywords["refnames"].strip() - if refnames.startswith("$Format"): - if verbose: - print("keywords are unexpanded, not using") - raise NotThisMethod("unexpanded keywords, not a git-archive tarball") - refs = set([r.strip() for r in refnames.strip("()").split(",")]) - # starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of - # just "foo-1.0". If we see a "tag: " prefix, prefer those. - TAG = "tag: " - tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)]) - if not tags: - # Either we're using git < 1.8.3, or there really are no tags. We use - # a heuristic: assume all version tags have a digit. The old git %d - # expansion behaves like git log --decorate=short and strips out the - # refs/heads/ and refs/tags/ prefixes that would let us distinguish - # between branches and tags. By ignoring refnames without digits, we - # filter out many common branch names like "release" and - # "stabilization", as well as "HEAD" and "master". - tags = set([r for r in refs if re.search(r"\d", r)]) - if verbose: - print("discarding '%s', no digits" % ",".join(refs - tags)) - if verbose: - print("likely tags: %s" % ",".join(sorted(tags))) - for ref in sorted(tags): - # sorting will prefer e.g. 
"2.0" over "2.0rc1" - if ref.startswith(tag_prefix): - r = ref[len(tag_prefix) :] - if verbose: - print("picking %s" % r) - return { - "version": r, - "full-revisionid": keywords["full"].strip(), - "dirty": False, - "error": None, - "date": date, - } - # no suitable tags, so version is "0+unknown", but full hex is still there - if verbose: - print("no suitable tags, using unknown + full revision id") - return { - "version": "0+unknown", - "full-revisionid": keywords["full"].strip(), - "dirty": False, - "error": "no suitable tags", - "date": None, - } - - -@register_vcs_handler("git", "pieces_from_vcs") -def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command): - """Get version from 'git describe' in the root of the source tree. - - This only gets called if the git-archive 'subst' keywords were *not* - expanded, and _version.py hasn't already been rewritten with a short - version string, meaning we're inside a checked out source tree. - """ - GITS = ["git"] - if sys.platform == "win32": - GITS = ["git.cmd", "git.exe"] - - out, rc = run_command(GITS, ["rev-parse", "--git-dir"], cwd=root, hide_stderr=True) - if rc != 0: - if verbose: - print("Directory %s not under git control" % root) - raise NotThisMethod("'git rev-parse --git-dir' returned error") - - # if there is a tag matching tag_prefix, this yields TAG-NUM-gHEX[-dirty] - # if there isn't one, this yields HEX[-dirty] (no NUM) - describe_out, rc = run_command( - GITS, - [ - "describe", - "--tags", - "--dirty", - "--always", - "--long", - "--match", - "%s*" % tag_prefix, - ], - cwd=root, - ) - # --long was added in git-1.5.5 - if describe_out is None: - raise NotThisMethod("'git describe' failed") - describe_out = describe_out.strip() - full_out, rc = run_command(GITS, ["rev-parse", "HEAD"], cwd=root) - if full_out is None: - raise NotThisMethod("'git rev-parse' failed") - full_out = full_out.strip() - - pieces = {} - pieces["long"] = full_out - pieces["short"] = full_out[:7] # maybe improved 
later - pieces["error"] = None - - # parse describe_out. It will be like TAG-NUM-gHEX[-dirty] or HEX[-dirty] - # TAG might have hyphens. - git_describe = describe_out - - # look for -dirty suffix - dirty = git_describe.endswith("-dirty") - pieces["dirty"] = dirty - if dirty: - git_describe = git_describe[: git_describe.rindex("-dirty")] - - # now we have TAG-NUM-gHEX or HEX - - if "-" in git_describe: - # TAG-NUM-gHEX - mo = re.search(r"^(.+)-(\d+)-g([0-9a-f]+)$", git_describe) - if not mo: - # unparseable. Maybe git-describe is misbehaving? - pieces["error"] = "unable to parse git-describe output: '%s'" % describe_out - return pieces - - # tag - full_tag = mo.group(1) - if not full_tag.startswith(tag_prefix): - if verbose: - fmt = "tag '%s' doesn't start with prefix '%s'" - print(fmt % (full_tag, tag_prefix)) - pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % ( - full_tag, - tag_prefix, - ) - return pieces - pieces["closest-tag"] = full_tag[len(tag_prefix) :] - - # distance: number of commits since tag - pieces["distance"] = int(mo.group(2)) - - # commit: short hex revision ID - pieces["short"] = mo.group(3) - - else: - # HEX: no tags - pieces["closest-tag"] = None - count_out, rc = run_command(GITS, ["rev-list", "HEAD", "--count"], cwd=root) - pieces["distance"] = int(count_out) # total number of commits - - # commit date: see ISO-8601 comment in git_versions_from_keywords() - date = run_command(GITS, ["show", "-s", "--format=%ci", "HEAD"], cwd=root)[ - 0 - ].strip() - pieces["date"] = date.strip().replace(" ", "T", 1).replace(" ", "", 1) - - return pieces - - -def do_vcs_install(manifest_in, versionfile_source, ipy): - """Git-specific installation logic for Versioneer. - - For Git, this means creating/changing .gitattributes to mark _version.py - for export-subst keyword substitution. 
- """ - GITS = ["git"] - if sys.platform == "win32": - GITS = ["git.cmd", "git.exe"] - files = [manifest_in, versionfile_source] - if ipy: - files.append(ipy) - try: - me = __file__ - if me.endswith(".pyc") or me.endswith(".pyo"): - me = os.path.splitext(me)[0] + ".py" - versioneer_file = os.path.relpath(me) - except NameError: - versioneer_file = "versioneer.py" - files.append(versioneer_file) - present = False - try: - f = open(".gitattributes", "r") - for line in f.readlines(): - if line.strip().startswith(versionfile_source): - if "export-subst" in line.strip().split()[1:]: - present = True - f.close() - except EnvironmentError: - pass - if not present: - f = open(".gitattributes", "a+") - f.write("%s export-subst\n" % versionfile_source) - f.close() - files.append(".gitattributes") - run_command(GITS, ["add", "--"] + files) - - -def versions_from_parentdir(parentdir_prefix, root, verbose): - """Try to determine the version from the parent directory name. - - Source tarballs conventionally unpack into a directory that includes both - the project name and a version string. We will also support searching up - two directory levels for an appropriately named parent directory - """ - rootdirs = [] - - for i in range(3): - dirname = os.path.basename(root) - if dirname.startswith(parentdir_prefix): - return { - "version": dirname[len(parentdir_prefix) :], - "full-revisionid": None, - "dirty": False, - "error": None, - "date": None, - } - else: - rootdirs.append(root) - root = os.path.dirname(root) # up a level - - if verbose: - print( - "Tried directories %s but none started with prefix %s" - % (str(rootdirs), parentdir_prefix) - ) - raise NotThisMethod("rootdir doesn't start with parentdir_prefix") - - -SHORT_VERSION_PY = """ -# This file was generated by 'versioneer.py' (0.18) from -# revision-control system data, or from the parent directory name of an -# unpacked source archive. Distribution tarballs contain a pre-generated copy -# of this file. 
- -import json - -version_json = ''' -%s -''' # END VERSION_JSON - - -def get_versions(): - return json.loads(version_json) -""" - - -def versions_from_file(filename): - """Try to determine the version from _version.py if present.""" - try: - with open(filename) as f: - contents = f.read() - except EnvironmentError: - raise NotThisMethod("unable to read _version.py") - mo = re.search( - r"version_json = '''\n(.*)''' # END VERSION_JSON", contents, re.M | re.S - ) - if not mo: - mo = re.search( - r"version_json = '''\r\n(.*)''' # END VERSION_JSON", contents, re.M | re.S - ) - if not mo: - raise NotThisMethod("no version_json in _version.py") - return json.loads(mo.group(1)) - - -def write_to_version_file(filename, versions): - """Write the given version number to the given _version.py file.""" - os.unlink(filename) - contents = json.dumps(versions, sort_keys=True, indent=1, separators=(",", ": ")) - with open(filename, "w") as f: - f.write(SHORT_VERSION_PY % contents) - - print("set %s to '%s'" % (filename, versions["version"])) - - -def plus_or_dot(pieces): - """Return a + if we don't already have one, else return a .""" - if "+" in pieces.get("closest-tag", ""): - return "." - return "+" - - -def render_pep440(pieces): - """Build up version string, with post-release "local version identifier". - - Our goal: TAG[+DISTANCE.gHEX[.dirty]] . Note that if you - get a tagged build and then dirty it, you'll get TAG+0.gHEX.dirty - - Exceptions: - 1: no tags. git_describe was just HEX. 
0+untagged.DISTANCE.gHEX[.dirty] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += plus_or_dot(pieces) - rendered += "%d.g%s" % (pieces["distance"], pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - else: - # exception #1 - rendered = "0+untagged.%d.g%s" % (pieces["distance"], pieces["short"]) - if pieces["dirty"]: - rendered += ".dirty" - return rendered - - -def render_pep440_pre(pieces): - """TAG[.post.devDISTANCE] -- No -dirty. - - Exceptions: - 1: no tags. 0.post.devDISTANCE - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += ".post.dev%d" % pieces["distance"] - else: - # exception #1 - rendered = "0.post.dev%d" % pieces["distance"] - return rendered - - -def render_pep440_post(pieces): - """TAG[.postDISTANCE[.dev0]+gHEX] . - - The ".dev0" means dirty. Note that .dev0 sorts backwards - (a dirty tree will appear "older" than the corresponding clean one), - but you shouldn't be releasing software with -dirty anyways. - - Exceptions: - 1: no tags. 0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += plus_or_dot(pieces) - rendered += "g%s" % pieces["short"] - else: - # exception #1 - rendered = "0.post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - rendered += "+g%s" % pieces["short"] - return rendered - - -def render_pep440_old(pieces): - """TAG[.postDISTANCE[.dev0]] . - - The ".dev0" means dirty. - - Eexceptions: - 1: no tags. 
0.postDISTANCE[.dev0] - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"] or pieces["dirty"]: - rendered += ".post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - else: - # exception #1 - rendered = "0.post%d" % pieces["distance"] - if pieces["dirty"]: - rendered += ".dev0" - return rendered - - -def render_git_describe(pieces): - """TAG[-DISTANCE-gHEX][-dirty]. - - Like 'git describe --tags --dirty --always'. - - Exceptions: - 1: no tags. HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - if pieces["distance"]: - rendered += "-%d-g%s" % (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render_git_describe_long(pieces): - """TAG-DISTANCE-gHEX[-dirty]. - - Like 'git describe --tags --dirty --always -long'. - The distance/hash is unconditional. - - Exceptions: - 1: no tags. 
HEX[-dirty] (note: no 'g' prefix) - """ - if pieces["closest-tag"]: - rendered = pieces["closest-tag"] - rendered += "-%d-g%s" % (pieces["distance"], pieces["short"]) - else: - # exception #1 - rendered = pieces["short"] - if pieces["dirty"]: - rendered += "-dirty" - return rendered - - -def render(pieces, style): - """Render the given version pieces into the requested style.""" - if pieces["error"]: - return { - "version": "unknown", - "full-revisionid": pieces.get("long"), - "dirty": None, - "error": pieces["error"], - "date": None, - } - - if not style or style == "default": - style = "pep440" # the default - - if style == "pep440": - rendered = render_pep440(pieces) - elif style == "pep440-pre": - rendered = render_pep440_pre(pieces) - elif style == "pep440-post": - rendered = render_pep440_post(pieces) - elif style == "pep440-old": - rendered = render_pep440_old(pieces) - elif style == "git-describe": - rendered = render_git_describe(pieces) - elif style == "git-describe-long": - rendered = render_git_describe_long(pieces) - else: - raise ValueError("unknown style '%s'" % style) - - return { - "version": rendered, - "full-revisionid": pieces["long"], - "dirty": pieces["dirty"], - "error": None, - "date": pieces.get("date"), - } - - -class VersioneerBadRootError(Exception): - """The project root directory is unknown or missing key files.""" - - -def get_versions(verbose=False): - """Get the project version from whatever source is available. - - Returns dict with two keys: 'version' and 'full'. 
- """ - if "versioneer" in sys.modules: - # see the discussion in cmdclass.py:get_cmdclass() - del sys.modules["versioneer"] - - root = get_root() - cfg = get_config_from_root(root) - - assert cfg.VCS is not None, "please set [versioneer]VCS= in setup.cfg" - handlers = HANDLERS.get(cfg.VCS) - assert handlers, "unrecognized VCS '%s'" % cfg.VCS - verbose = verbose or cfg.verbose - assert ( - cfg.versionfile_source is not None - ), "please set versioneer.versionfile_source" - assert cfg.tag_prefix is not None, "please set versioneer.tag_prefix" - - versionfile_abs = os.path.join(root, cfg.versionfile_source) - - # extract version from first of: _version.py, VCS command (e.g. 'git - # describe'), parentdir. This is meant to work for developers using a - # source checkout, for users of a tarball created by 'setup.py sdist', - # and for users of a tarball/zipball created by 'git archive' or github's - # download-from-tag feature or the equivalent in other VCSes. - - get_keywords_f = handlers.get("get_keywords") - from_keywords_f = handlers.get("keywords") - if get_keywords_f and from_keywords_f: - try: - keywords = get_keywords_f(versionfile_abs) - ver = from_keywords_f(keywords, cfg.tag_prefix, verbose) - if verbose: - print("got version from expanded keyword %s" % ver) - return ver - except NotThisMethod: - pass - - try: - ver = versions_from_file(versionfile_abs) - if verbose: - print("got version from file %s %s" % (versionfile_abs, ver)) - return ver - except NotThisMethod: - pass - - from_vcs_f = handlers.get("pieces_from_vcs") - if from_vcs_f: - try: - pieces = from_vcs_f(cfg.tag_prefix, root, verbose) - ver = render(pieces, cfg.style) - if verbose: - print("got version from VCS %s" % ver) - return ver - except NotThisMethod: - pass - - try: - if cfg.parentdir_prefix: - ver = versions_from_parentdir(cfg.parentdir_prefix, root, verbose) - if verbose: - print("got version from parentdir %s" % ver) - return ver - except NotThisMethod: - pass - - if verbose: - 
print("unable to compute version") - - return { - "version": "0+unknown", - "full-revisionid": None, - "dirty": None, - "error": "unable to compute version", - "date": None, - } - - -def get_version(): - """Get the short version string for this project.""" - return get_versions()["version"] - - -def get_cmdclass(): - """Get the custom setuptools/distutils subclasses used by Versioneer.""" - if "versioneer" in sys.modules: - del sys.modules["versioneer"] - # this fixes the "python setup.py develop" case (also 'install' and - # 'easy_install .'), in which subdependencies of the main project are - # built (using setup.py bdist_egg) in the same python process. Assume - # a main project A and a dependency B, which use different versions - # of Versioneer. A's setup.py imports A's Versioneer, leaving it in - # sys.modules by the time B's setup.py is executed, causing B to run - # with the wrong versioneer. Setuptools wraps the sub-dep builds in a - # sandbox that restores sys.modules to it's pre-build state, so the - # parent is protected against the child's "import versioneer". By - # removing ourselves from sys.modules here, before the child build - # happens, we protect the child from the parent's versioneer too. 
-    # Also see https://github.com/warner/python-versioneer/issues/52
-
-    cmds = {}
-
-    # we add "version" to both distutils and setuptools
-    from distutils.core import Command
-
-    class cmd_version(Command):
-        description = "report generated version string"
-        user_options = []
-        boolean_options = []
-
-        def initialize_options(self):
-            pass
-
-        def finalize_options(self):
-            pass
-
-        def run(self):
-            vers = get_versions(verbose=True)
-            print("Version: %s" % vers["version"])
-            print(" full-revisionid: %s" % vers.get("full-revisionid"))
-            print(" dirty: %s" % vers.get("dirty"))
-            print(" date: %s" % vers.get("date"))
-            if vers["error"]:
-                print(" error: %s" % vers["error"])
-
-    cmds["version"] = cmd_version
-
-    # we override "build_py" in both distutils and setuptools
-    #
-    # most invocation pathways end up running build_py:
-    #  distutils/build -> build_py
-    #  distutils/install -> distutils/build ->..
-    #  setuptools/bdist_wheel -> distutils/install ->..
-    #  setuptools/bdist_egg -> distutils/install_lib -> build_py
-    #  setuptools/install -> bdist_egg ->..
-    #  setuptools/develop -> ?
-    #  pip install:
-    #   copies source tree to a tempdir before running egg_info/etc
-    #   if .git isn't copied too, 'git describe' will fail
-    #   then does setup.py bdist_wheel, or sometimes setup.py install
-    #  setup.py egg_info -> ?
-
-    # we override different "build_py" commands for both environments
-    if "setuptools" in sys.modules:
-        from setuptools.command.build_py import build_py as _build_py
-    else:
-        from distutils.command.build_py import build_py as _build_py
-
-    class cmd_build_py(_build_py):
-        def run(self):
-            root = get_root()
-            cfg = get_config_from_root(root)
-            versions = get_versions()
-            _build_py.run(self)
-            # now locate _version.py in the new build/ directory and replace
-            # it with an updated value
-            if cfg.versionfile_build:
-                target_versionfile = os.path.join(self.build_lib, cfg.versionfile_build)
-                print("UPDATING %s" % target_versionfile)
-                write_to_version_file(target_versionfile, versions)
-
-    cmds["build_py"] = cmd_build_py
-
-    if "cx_Freeze" in sys.modules:  # cx_freeze enabled?
-        from cx_Freeze.dist import build_exe as _build_exe
-
-        # nczeczulin reports that py2exe won't like the pep440-style string
-        # as FILEVERSION, but it can be used for PRODUCTVERSION, e.g.
-        # setup(console=[{
-        #     "version": versioneer.get_version().split("+", 1)[0],  # FILEVERSION
-        #     "product_version": versioneer.get_version(),
-        #     ...
-
-        class cmd_build_exe(_build_exe):
-            def run(self):
-                root = get_root()
-                cfg = get_config_from_root(root)
-                versions = get_versions()
-                target_versionfile = cfg.versionfile_source
-                print("UPDATING %s" % target_versionfile)
-                write_to_version_file(target_versionfile, versions)
-
-                _build_exe.run(self)
-                os.unlink(target_versionfile)
-                with open(cfg.versionfile_source, "w") as f:
-                    LONG = LONG_VERSION_PY[cfg.VCS]
-                    f.write(
-                        LONG
-                        % {
-                            "DOLLAR": "$",
-                            "STYLE": cfg.style,
-                            "TAG_PREFIX": cfg.tag_prefix,
-                            "PARENTDIR_PREFIX": cfg.parentdir_prefix,
-                            "VERSIONFILE_SOURCE": cfg.versionfile_source,
-                        }
-                    )
-
-        cmds["build_exe"] = cmd_build_exe
-        del cmds["build_py"]
-
-    if "py2exe" in sys.modules:  # py2exe enabled?
-        try:
-            from py2exe.distutils_buildexe import py2exe as _py2exe  # py3
-        except ImportError:
-            from py2exe.build_exe import py2exe as _py2exe  # py2
-
-        class cmd_py2exe(_py2exe):
-            def run(self):
-                root = get_root()
-                cfg = get_config_from_root(root)
-                versions = get_versions()
-                target_versionfile = cfg.versionfile_source
-                print("UPDATING %s" % target_versionfile)
-                write_to_version_file(target_versionfile, versions)
-
-                _py2exe.run(self)
-                os.unlink(target_versionfile)
-                with open(cfg.versionfile_source, "w") as f:
-                    LONG = LONG_VERSION_PY[cfg.VCS]
-                    f.write(
-                        LONG
-                        % {
-                            "DOLLAR": "$",
-                            "STYLE": cfg.style,
-                            "TAG_PREFIX": cfg.tag_prefix,
-                            "PARENTDIR_PREFIX": cfg.parentdir_prefix,
-                            "VERSIONFILE_SOURCE": cfg.versionfile_source,
-                        }
-                    )
-
-        cmds["py2exe"] = cmd_py2exe
-
-    # we override different "sdist" commands for both environments
-    if "setuptools" in sys.modules:
-        from setuptools.command.sdist import sdist as _sdist
-    else:
-        from distutils.command.sdist import sdist as _sdist
-
-    class cmd_sdist(_sdist):
-        def run(self):
-            versions = get_versions()
-            self._versioneer_generated_versions = versions
-            # unless we update this, the command will keep using the old
-            # version
-            self.distribution.metadata.version = versions["version"]
-            return _sdist.run(self)
-
-        def make_release_tree(self, base_dir, files):
-            root = get_root()
-            cfg = get_config_from_root(root)
-            _sdist.make_release_tree(self, base_dir, files)
-            # now locate _version.py in the new base_dir directory
-            # (remembering that it may be a hardlink) and replace it with an
-            # updated value
-            target_versionfile = os.path.join(base_dir, cfg.versionfile_source)
-            print("UPDATING %s" % target_versionfile)
-            write_to_version_file(
-                target_versionfile, self._versioneer_generated_versions
-            )
-
-    cmds["sdist"] = cmd_sdist
-
-    return cmds
-
-
-CONFIG_ERROR = """
-setup.cfg is missing the necessary Versioneer configuration. You need
-a section like:
-
- [versioneer]
- VCS = git
- style = pep440
- versionfile_source = src/myproject/_version.py
- versionfile_build = myproject/_version.py
- tag_prefix =
- parentdir_prefix = myproject-
-
-You will also need to edit your setup.py to use the results:
-
- import versioneer
- setup(version=versioneer.get_version(),
-       cmdclass=versioneer.get_cmdclass(), ...)
-
-Please read the docstring in ./versioneer.py for configuration instructions,
-edit setup.cfg, and re-run the installer or 'python versioneer.py setup'.
-"""
-
-SAMPLE_CONFIG = """
-# See the docstring in versioneer.py for instructions. Note that you must
-# re-run 'versioneer.py setup' after changing this section, and commit the
-# resulting files.
-
-[versioneer]
-#VCS = git
-#style = pep440
-#versionfile_source =
-#versionfile_build =
-#tag_prefix =
-#parentdir_prefix =
-
-"""
-
-INIT_PY_SNIPPET = """
-from ._version import get_versions
-__version__ = get_versions()['version']
-del get_versions
-"""
-
-
-def do_setup():
-    """Main VCS-independent setup function for installing Versioneer."""
-    root = get_root()
-    try:
-        cfg = get_config_from_root(root)
-    except (
-        EnvironmentError,
-        configparser.NoSectionError,
-        configparser.NoOptionError,
-    ) as e:
-        if isinstance(e, (EnvironmentError, configparser.NoSectionError)):
-            print("Adding sample versioneer config to setup.cfg", file=sys.stderr)
-            with open(os.path.join(root, "setup.cfg"), "a") as f:
-                f.write(SAMPLE_CONFIG)
-        print(CONFIG_ERROR, file=sys.stderr)
-        return 1
-
-    print(" creating %s" % cfg.versionfile_source)
-    with open(cfg.versionfile_source, "w") as f:
-        LONG = LONG_VERSION_PY[cfg.VCS]
-        f.write(
-            LONG
-            % {
-                "DOLLAR": "$",
-                "STYLE": cfg.style,
-                "TAG_PREFIX": cfg.tag_prefix,
-                "PARENTDIR_PREFIX": cfg.parentdir_prefix,
-                "VERSIONFILE_SOURCE": cfg.versionfile_source,
-            }
-        )
-
-    ipy = os.path.join(os.path.dirname(cfg.versionfile_source), "__init__.py")
-    if os.path.exists(ipy):
-        try:
-            with open(ipy, "r") as f:
-                old = f.read()
-        except EnvironmentError:
-            old = ""
-        if INIT_PY_SNIPPET not in old:
-            print(" appending to %s" % ipy)
-            with open(ipy, "a") as f:
-                f.write(INIT_PY_SNIPPET)
-        else:
-            print(" %s unmodified" % ipy)
-    else:
-        print(" %s doesn't exist, ok" % ipy)
-        ipy = None
-
-    # Make sure both the top-level "versioneer.py" and versionfile_source
-    # (PKG/_version.py, used by runtime code) are in MANIFEST.in, so
-    # they'll be copied into source distributions. Pip won't be able to
-    # install the package without this.
-    manifest_in = os.path.join(root, "MANIFEST.in")
-    simple_includes = set()
-    try:
-        with open(manifest_in, "r") as f:
-            for line in f:
-                if line.startswith("include "):
-                    for include in line.split()[1:]:
-                        simple_includes.add(include)
-    except EnvironmentError:
-        pass
-    # That doesn't cover everything MANIFEST.in can do
-    # (http://docs.python.org/2/distutils/sourcedist.html#commands), so
-    # it might give some false negatives. Appending redundant 'include'
-    # lines is safe, though.
-    if "versioneer.py" not in simple_includes:
-        print(" appending 'versioneer.py' to MANIFEST.in")
-        with open(manifest_in, "a") as f:
-            f.write("include versioneer.py\n")
-    else:
-        print(" 'versioneer.py' already in MANIFEST.in")
-    if cfg.versionfile_source not in simple_includes:
-        print(
-            " appending versionfile_source ('%s') to MANIFEST.in"
-            % cfg.versionfile_source
-        )
-        with open(manifest_in, "a") as f:
-            f.write("include %s\n" % cfg.versionfile_source)
-    else:
-        print(" versionfile_source already in MANIFEST.in")
-
-    # Make VCS-specific changes. For git, this means creating/changing
-    # .gitattributes to mark _version.py for export-subst keyword
-    # substitution.
-    do_vcs_install(manifest_in, cfg.versionfile_source, ipy)
-    return 0
-
-
-def scan_setup_py():
-    """Validate the contents of setup.py against Versioneer's expectations."""
-    found = set()
-    setters = False
-    errors = 0
-    with open("setup.py", "r") as f:
-        for line in f.readlines():
-            if "import versioneer" in line:
-                found.add("import")
-            if "versioneer.get_cmdclass()" in line:
-                found.add("cmdclass")
-            if "versioneer.get_version()" in line:
-                found.add("get_version")
-            if "versioneer.VCS" in line:
-                setters = True
-            if "versioneer.versionfile_source" in line:
-                setters = True
-    if len(found) != 3:
-        print("")
-        print("Your setup.py appears to be missing some important items")
-        print("(but I might be wrong). Please make sure it has something")
-        print("roughly like the following:")
-        print("")
-        print(" import versioneer")
-        print(" setup( version=versioneer.get_version(),")
-        print("        cmdclass=versioneer.get_cmdclass(), ...)")
-        print("")
-        errors += 1
-    if setters:
-        print("You should remove lines like 'versioneer.VCS = ' and")
-        print("'versioneer.versionfile_source = ' . This configuration")
-        print("now lives in setup.cfg, and should be removed from setup.py")
-        print("")
-        errors += 1
-    return errors
-
-
-if __name__ == "__main__":
-    cmd = sys.argv[1]
-    if cmd == "setup":
-        errors = do_setup()
-        errors += scan_setup_py()
-        if errors:
-            sys.exit(1)