Implement, test py::object as taxon type #47

mmore500 · 2023-11-23T16:04:21Z

This will allow the user to use arbitrary types as their taxa, including numpy stuff, instead of just strings as previously implemented.

We could probably get a speed boost by specializing for string, float, int, etc. instead of using py::object, but this works!

TODO: Empirical should support quote-escaping CSV entries containing "," so that we don't have to URL-encode taxon info representations containing ",".

codecov-commenter · 2023-11-23T18:34:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e96c8a4) 100.00% compared to head (6c9fcb7) 100.00%.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #47   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            1         1           
  Lines            5         5           
=========================================
  Hits             5         5

mmore500 · 2023-11-23T18:37:03Z

It looks like the Empirical CSV engine doesn't support quote-escaping CSV content containing "," at the moment? Is this on the roadmap? (Quote escapes are part of the CSV format described in RFC 4180: https://datatracker.ietf.org/doc/html/rfc4180)

If we add quote escapes to the Empirical CSV engine, we can have a much nicer serialized representation of containers (lists, arrays, etc.).
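For illustration, here is how RFC 4180-style quoting compares with the URL-encoding workaround for a serialized container. This sketch uses Python's standard `csv` and `urllib.parse` modules as a stand-in; it is not Empirical's CSV engine, and the column names `id` and `info` are hypothetical.

```python
import csv
import io
from urllib.parse import quote, unquote

# Hypothetical taxon info: a container whose text form contains commas.
info = str([1, 2, 3])  # "[1, 2, 3]"

# Workaround without quote escaping: URL-encode so the field has no raw ",".
encoded = quote(info)  # "%5B1%2C%202%2C%203%5D"
assert "," not in encoded and unquote(encoded) == info

# RFC 4180 quoting: csv.writer wraps fields containing "," in double quotes.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["id", "info"])
writer.writerow([0, info])
text = buf.getvalue()
print(text)
# id,info
# 0,"[1, 2, 3]"

# Reading it back recovers the original field unchanged and human-readable.
rows = list(csv.reader(io.StringIO(text)))
assert rows[1] == ["0", info]
```

With quoting, the stored field stays readable (`"[1, 2, 3]"`) instead of becoming a percent-encoded string.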

mmore500 · 2023-11-23T18:38:02Z

Ready for review/merge

emilydolson · 2023-12-02T06:10:12Z

This is mostly looking good, but things are messed up when the actual taxon or taxon info is a string (super common use case that definitely needs to be supported before we can merge this). I added a fix to support loading from file when the info type is a string, but there's still an issue when the taxon calculation function returns a string (demonstrated in #57).

emilydolson · 2023-12-03T00:22:42Z

Oops, no, I was making a dumb mistake in my tests. With my fix to deserialization, everything seems fine when info is a string. The only remaining problem is that we still need to override py::object equals if we don't want the super unintuitive behavior where equality is based on pointer equality rather than info equality.

emilydolson · 2023-12-03T23:36:28Z

The new equals operator is working except for NumPy arrays, because == on NumPy arrays apparently returns an array of Trues and Falses (one per element) rather than a single True/False.
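To show why an elementwise `==` breaks a boolean equality check, the sketch below uses a hypothetical `ElementwiseArray` class that mimics NumPy's behavior without depending on NumPy. It is a stand-in for illustration only: `==` returns a new array, and converting a result with more than one element to `bool` raises `ValueError`, as NumPy does.

```python
class ElementwiseArray:
    """Hypothetical stand-in mimicking NumPy's elementwise == semantics."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Like NumPy, compare element by element and return a new array,
        # not a single bool.
        return ElementwiseArray(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # Only a single-element result can be collapsed to one bool;
        # longer results raise, as NumPy does.
        if len(self.values) != 1:
            raise ValueError("the truth value of this array is ambiguous")
        return bool(self.values[0])


a = ElementwiseArray([1, 2, 3])
b = ElementwiseArray([1, 2, 3])
result = a == b
print(result.values)  # [True, True, True]

# An equality check that expects a single True/False fails here.
try:
    same = bool(a == b)
except ValueError as err:
    same = None
    print("ValueError:", err)
```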

mmore500 · 2023-12-04T08:42:09Z

> Oops, no, I was making a dumb mistake in my tests. With my fix to deserialization, everything seems fine when info is a string. The only remaining problem is that we still need to override py::object equals if we don't want the super unintuitive behavior where equality is based on pointer equality rather than info equality.

Agreed here. I wonder if we could override py::object == with a try/catch that uses numpy.array_equal as the first choice (very nicely, this works like regular == for non-NumPy objects) and falls back to plain builtins.__eq__.
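A minimal Python-level sketch of the suggested fallback order, assuming the comparison can be expressed in Python (the real change would live in the C++ handling of `py::object`). `taxon_info_equal` is a hypothetical name; `numpy.array_equal` is the real NumPy function. NumPy is imported optionally so the helper still works when it is absent.

```python
try:
    import numpy as np
except ImportError:  # NumPy is optional
    np = None


def taxon_info_equal(a, b):
    """Hypothetical helper: prefer numpy.array_equal, fall back to ==."""
    if np is not None:
        try:
            # array_equal returns a single bool even when given arrays.
            return bool(np.array_equal(a, b))
        except Exception:
            pass  # inputs NumPy cannot convert: fall through to ==
    return bool(a == b)


print(taxon_info_equal("abc", "abc"))    # True
print(taxon_info_equal(1, 2))            # False
print(taxon_info_equal((1, 2), (1, 2)))  # True
```

One caveat: `array_equal` converts its arguments with `numpy.asarray`, so it can disagree with `==` for some inputs (a list and a tuple with the same elements compare equal), and the conversion adds a cost to every comparison.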

emilydolson · 2023-12-04T15:36:20Z

That sounds like a pretty expensive thing to do every time we check equality. I'm experimenting with an alternative where we cache an == operator when a Python object is constructed, but it's not going great and honestly still feels like a lot of overhead to support what I'm not convinced is a common use case. Is there reason to think using raw NumPy arrays as taxon info is likely to happen a lot? (e.g. is this important for your long-term parallelization plans?)

In general, I don't want us to be responsible for adding lots of additional checks to support objects with unconventional == operators. I would generally think that if a user wants to use an object like that as taxon info, we should let them be responsible for wrapping it in a class with an == operator that returns a bool. But if you think there's reason to make an exception for NumPy, I can keep trying.
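The user-side alternative proposed here can be sketched as a small wrapper whose `==` returns a bool. `ArrayInfo` is a hypothetical name, not part of phylotrackpy; it stores the values as a tuple so that comparison and hashing come from the tuple.

```python
class ArrayInfo:
    """Hypothetical user-side wrapper giving array-like taxon info a bool ==."""

    def __init__(self, values):
        # Snapshot the values as a tuple: tuple == returns a single bool
        # and tuples are hashable. For a 1-D NumPy array, pass arr.tolist().
        self.values = tuple(values)

    def __eq__(self, other):
        if not isinstance(other, ArrayInfo):
            return NotImplemented
        return self.values == other.values

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.values)

    def __repr__(self):
        return f"ArrayInfo({list(self.values)!r})"


x = ArrayInfo([1, 2, 3])
y = ArrayInfo([1, 2, 3])
print(x == y)                  # True
print(x == ArrayInfo([1, 2]))  # False
```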

emilydolson · 2023-12-04T19:35:40Z

Okay, I did get the constructor approach working (in #57). It requires reaching into Python in the constructor, grabbing the correct equals operator, and then storing it in the py::object. Two slight issues:

1. Bad things happen if you mix taxon_info types that have incompatible equals operators. That will probably always be true, but the error it throws is less obvious than would be ideal.
2. We're storing a ton of copies of the same equals operator. Ideally, we would have just one (maybe a static variable?) that gets stored once when a Systematics manager is constructed.

That said, both are minor enough that I'd be okay with merging at this point (and filing them as issues).
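The construct-time approach described above (choose the equality operator when the object is built, then store it with the value) can be sketched at the Python level. `pick_equals` and `TaxonInfo` are hypothetical names, and this does not reproduce the actual C++ change in #57. The optional NumPy import keeps the sketch working when NumPy is not installed.

```python
try:
    import numpy as np
except ImportError:  # keep working when NumPy is not installed
    np = None


def pick_equals(sample):
    """Choose an equality function once, based on the type of `sample`."""
    if np is not None and isinstance(sample, np.ndarray):
        return lambda a, b: bool(np.array_equal(a, b))
    return lambda a, b: bool(a == b)


class TaxonInfo:
    """Hypothetical wrapper that stores its chosen == at construction."""

    def __init__(self, value):
        self.value = value
        # One stored function per object: this is the duplicated-operator
        # overhead mentioned above.
        self._eq = pick_equals(value)

    def __eq__(self, other):
        if not isinstance(other, TaxonInfo):
            return NotImplemented
        # Uses the left operand's stored function, so mixing info types with
        # incompatible == can raise here (the first issue mentioned above).
        return self._eq(self.value, other.value)


print(TaxonInfo("abc") == TaxonInfo("abc"))  # True
print(TaxonInfo(3) == TaxonInfo(4))          # False
```

Passing one shared function into every instance, created once per Systematics manager, would remove the per-object copies.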

emilydolson · 2023-12-04T19:39:14Z

Oh shoot, looks like that breaks if numpy isn't installed. I'll go back to my original question of how important numpy support is.

emilydolson · 2023-12-05T01:28:57Z

Okay I fixed it for realsies now. Going to go ahead and merge it into this branch and merge this branch into master.

mmore500 · 2023-12-05T02:48:42Z

Beautiful!

I think we could get rid of the memory overhead you mention by using a static variable initialized by an immediately invoked lambda that runs the try/catch block you have. Agree that the existing solution works 100% fine for the moment.
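A Python-level analog of the build-once idea: construct the comparator on first use and return the same object afterwards. This illustrates only the caching pattern; it does not model C++ function-local statics or the `py::object` destruction-order problem. `shared_equals` is a hypothetical name.

```python
import functools


@functools.lru_cache(maxsize=None)
def shared_equals():
    """Build the comparator once; later calls return the same object."""
    try:
        import numpy as np
    except ImportError:
        return lambda a, b: bool(a == b)

    def eq(a, b):
        # Try array_equal first, fall back to ordinary == on failure.
        try:
            return bool(np.array_equal(a, b))
        except Exception:
            return bool(a == b)

    return eq


eq = shared_equals()
print(eq is shared_equals())  # True: built once, then reused
print(eq("abc", "abc"))       # True
```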

emilydolson · 2023-12-05T03:05:09Z

Potentially? I tried something like that really quickly, but there's something messy about having py::objects as static variables (they don't get destructed at the right time relative to Python interpreter shutdown and end up segfaulting?)

mmore500 added 3 commits November 23, 2023 09:27

Add missing <string> include

8234b2a

Implement, test py::object as taxon type

57e3859

Probably could get a speed boost by specializing for string, float, int, etc. instead of using py::object but this works!

Make org -> taxon transform default to identity

4f50bb7

mmore500 force-pushed the taxontype branch from ab1ec53 to 4f50bb7 November 23, 2023 16:20

mmore500 requested a review from emilydolson November 23, 2023 16:20

mmore500 added 4 commits November 23, 2023 11:31

Use oldest-supported-numpy for cross-release pin

f50db23

Replace std::format use

848962a

Because <format> isn't available on compilers in CI

Bugfix: support optional desc in add_snapshot_fun

6d3fbeb

Fixup, test full taxon info save/load

b70b3dd

TODO: Empirical should support quote escaping csv entries containing "," so that we don't have to url encode taxon info representations containing ","

mmore500 and others added 18 commits November 23, 2023 23:47

Disable wheel build failfast

ff1e7f1

Try bumping windows runner to fix wheel build

f3ab5fe

Add scaffold for direct test dispatch

a422b47

Try pinning windows CI to oldest-supported-numpy

a6611ca

Use older oldest-supported-numpy

748f5f4

Apparently new release isn't compat with old windows python

Add pytest dep to windows build

997e78f

Pin windows to older pytest

21e9184

Provide explicit test command to CI build wheel

3945567

Clean up import statement

f89a790

Specify asset path independent of pwd

1034000

Use {project} template instead of hardcoding

a078bb4

Try excluding python3.6 build

ec762cf

Make wheel builds run right away

411ce76

Make build sdist run right away, too

b3bed86

Test build sdist in non-deployment CI

281f4e5

Try adding py37 to test matrix

be99704

Restore py36, try excluding local source discovery

b1871ec

Ignore python version requirement in test install

d4d76c5

emilydolson and others added 6 commits December 1, 2023 12:47

Fix pytest exclude

bf9d84e

Fix pytest exclude

762ab89

needs double quotes

7c01b1d

Pin numpy via oldest-supported-numpy

d2ab275

Pull commented-out np test to @mark.nowheel fn

ff4cc1b

Demonstrate string bug

cf07ec2

emilydolson added 4 commits December 2, 2023 02:28

Merge branch 'main' into taxontype

0b69ce1

Merge branch 'taxontype' into stringinfo

2c01236

Remove incorrect test

f79f3e9

Merge branch 'stringinfo' of github.com:emilydolson/phylotrackpy into stringinfo

6f9b306

Add new equals operator

026fedc

Check equals operator on construction

cbd81a9

emilydolson added 3 commits December 4, 2023 18:06

Avoid numpy import error?

c3b7f31

Fix numpy dependency

b694881

Fix numpy dependency?

00ebf15

emilydolson added 2 commits December 4, 2023 20:29

Merge pull request #57 from emilydolson/stringinfo

ad18812

Merge branch 'main' into taxontype

6c9fcb7

emilydolson merged commit 66989d4 into main Dec 5, 2023
49 checks passed

emilydolson deleted the taxontype branch December 5, 2023 02:00

emilydolson mentioned this pull request Dec 5, 2023

Third argument to add_snapshot_fun should be optional #26

Closed
