Implement, test py::object as taxon type #47
Probably could get a speed boost by specializing for string, float, int, etc. instead of using py::object but this works!
Because <format> isn't available on compilers in CI
TODO: Empirical should support quote-escaping CSV entries containing "," so that we don't have to URL-encode taxon info representations containing ","
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
Coverage Diff      main       #47
Coverage        100.00%   100.00%
Files                 1         1
Lines                 5         5
Hits                  5         5
It looks like the Empirical CSV engine doesn't currently support quote-escaping CSV content containing ","? Is this on the roadmap? (Quote escapes are part of the CSV standard: https://datatracker.ietf.org/doc/html/rfc4180.) If we add quote escapes to the Empirical CSV engine, we can have a much nicer serialized representation of containers (lists, arrays, etc.).
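To make the trade-off concrete, here is a small sketch using only Python's standard library, not Empirical's CSV engine. The `taxon_info` string is a hypothetical serialized container. It compares URL-encoding the value against RFC 4180 style quoting as produced by Python's `csv` module.

```python
import csv
import io
from urllib.parse import quote, unquote

# Hypothetical serialized taxon info; the embedded commas are the problem.
taxon_info = "[1, 2, 3]"

# Workaround: URL-encode so no literal comma reaches the CSV layer.
encoded = quote(taxon_info, safe="")

# Alternative: let a quote-aware CSV writer wrap the field in double quotes.
buf = io.StringIO()
csv.writer(buf).writerow(["7", taxon_info])
quoted_line = buf.getvalue()

# A quote-aware reader recovers the original field unchanged.
row = next(csv.reader(io.StringIO(quoted_line)))

print(encoded)            # %5B1%2C%202%2C%203%5D
print(repr(quoted_line))  # '7,"[1, 2, 3]"\r\n'
print(row)                # ['7', '[1, 2, 3]']
```

The quoted form stays human-readable in the file, while the URL-encoded form needs a decode step before anyone can read it.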
Ready for review/merge
Apparently the new release isn't compatible with old Windows Python
This is mostly looking good, but things are messed up when the actual taxon or taxon info is a string (a super common use case that definitely needs to be supported before we can merge this). I added a fix to support loading from file when the info type is a string, but there's still an issue when the taxon calculation function returns a string (demonstrated in #57).
Oops, no, I was making a dumb mistake in my tests. With my fix to de-serialization, everything seems fine when info is a string. The only remaining problem is that we still need to override py::object equals if we don't want the super unintuitive behavior where equality is based on pointer equality rather than info equality.
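The difference between the two notions of equality can be shown in plain Python. This is only an illustration: `IdentityTaxon` and `ValueTaxon` are hypothetical stand-ins, not the wrapper used in this PR. Identity comparison (`is`) plays the role of pointer equality here.

```python
class IdentityTaxon:
    """Hypothetical wrapper: equal only if both wrap the very same object."""

    def __init__(self, info):
        self.info = info

    def __eq__(self, other):
        return self.info is other.info


class ValueTaxon:
    """Hypothetical wrapper: equal if the wrapped infos compare equal."""

    def __init__(self, info):
        self.info = info

    def __eq__(self, other):
        return self.info == other.info


# Two distinct list objects holding the same contents.
a = [1, 2]
b = [1, 2]

print(IdentityTaxon(a) == IdentityTaxon(b))  # False: different objects
print(IdentityTaxon(a) == IdentityTaxon(a))  # True: same object
print(ValueTaxon(a) == ValueTaxon(b))        # True: same contents
```

With identity semantics, two organisms that compute the same info independently would land in different taxa, which is the unintuitive behavior described above.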
New equals operator is working except for numpy arrays, because apparently |
Agreed here. Wonder if we could do a try catch with |
That sounds like a pretty expensive thing to do every time we check equality. I'm experimenting with an alternative where we cache an == operator when a python object is constructed, but it's not going great and honestly still feels like a lot of overhead to support what I'm not convinced is a common use case. Is there reason to think using raw NumPy arrays as taxon info is likely to happen a lot? (e.g. is this important for your long-term parallelization plans?) In general, I don't want us to be responsible for adding lots of additional checks to support objects with unconventional |
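For reference, NumPy arrays are a known case where `==` returns an elementwise result rather than a single bool, and converting that result to a bool raises `ValueError` when it has more than one element. The sketch below is one possible reading of the try/catch idea, written at the Python level with a stand-in class so it runs without NumPy. `ArrayLike`, `ElementwiseResult`, and `safe_equal` are hypothetical names, and the sketch says nothing about the cost of doing this through pybind11.

```python
class ElementwiseResult:
    """Stand-in for the array that an elementwise == returns."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Mimics the ambiguous-truth-value error raised for multi-element arrays.
        raise ValueError("truth value of an elementwise comparison is ambiguous")

    def all(self):
        return all(self.flags)


class ArrayLike:
    """Stand-in for a type with an unconventional, elementwise __eq__."""

    def __init__(self, data):
        self.data = list(data)

    def __eq__(self, other):
        if len(self.data) != len(other.data):
            return ElementwiseResult([False])
        return ElementwiseResult(x == y for x, y in zip(self.data, other.data))


def safe_equal(a, b):
    """Compare a and b, falling back to .all() when bool() is ambiguous."""
    result = a == b
    try:
        return bool(result)
    except ValueError:
        return bool(result.all())


print(safe_equal("abc", "abc"))                          # True
print(safe_equal(ArrayLike([1, 2]), ArrayLike([1, 2])))  # True
print(safe_equal(ArrayLike([1, 2]), ArrayLike([1, 3])))  # False
```

Ordinary types never reach the `except` branch, so the fallback only runs for types whose `==` does not return something bool-convertible.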
Okay, I did get the constructor thing working (in #57). It requires reaching into python in the constructor, grabbing the correct equals operator, and then storing that in the py::object. Two slight issues:
That said, these are both sufficiently minor that I think I'd be okay with merging at this point (probably adding these as issues).
Oh shoot, looks like that breaks if numpy isn't installed. I'll go back to my original question of how important numpy support is.
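A rough Python-level sketch of the constructor approach, with the NumPy import guarded so the module still loads when NumPy is absent. This is not the code from #57: `TaxonInfo` and its `_equals` attribute are hypothetical, and the real change lives in the C++ wrapper around py::object.

```python
import operator

# Guarded import: NumPy support is optional, so its absence must not be fatal.
try:
    import numpy as np
except ImportError:
    np = None


class TaxonInfo:
    """Hypothetical wrapper that picks its equality operator at construction."""

    def __init__(self, value):
        self.value = value
        if np is not None and isinstance(value, np.ndarray):
            # Whole-array comparison that returns a single bool.
            self._equals = np.array_equal
        else:
            self._equals = operator.eq

    def __eq__(self, other):
        if not isinstance(other, TaxonInfo):
            return NotImplemented
        return bool(self._equals(self.value, other.value))


print(TaxonInfo("abc") == TaxonInfo("abc"))  # True
print(TaxonInfo(1.5) == TaxonInfo(2.5))      # False
```

Each instance carries one extra reference to its comparison function, which is a small per-object memory cost in exchange for skipping the type check on every comparison.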
Okay, I fixed it for realsies now. Going to go ahead and merge it into this branch and merge this branch into master.
Beautiful! I think we could get rid of the memory overhead you mention by using a static variable initialized with an immediately invoked lambda that does the try-catch block you have. Agreed that the existing solution 100% works fine for the moment.
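The static-variable suggestion is a C++ idiom. As a loose Python analogue (an assumption for illustration, not the proposed implementation), a cached zero-argument function runs the guarded import once and shares the result across all comparisons, so nothing is stored per object. `numpy_support` and `info_equal` are hypothetical names.

```python
import functools


@functools.lru_cache(maxsize=None)
def numpy_support():
    """Run the guarded import once; return (ndarray, array_equal) or None."""
    try:
        import numpy as np
    except ImportError:
        return None
    return (np.ndarray, np.array_equal)


def info_equal(a, b):
    """Compare two infos using the shared, lazily initialized NumPy lookup."""
    support = numpy_support()
    if support is not None:
        ndarray, array_equal = support
        if isinstance(a, ndarray) or isinstance(b, ndarray):
            return bool(array_equal(a, b))
    return bool(a == b)


print(info_equal("x", "x"))  # True
print(info_equal(1, 2))      # False
```

The trade-off is a type check on each comparison instead of a stored operator per object.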
Potentially? I tried something like that really quickly but there's something messy about having |
This will allow the user to use arbitrary types as their taxa, including numpy stuff, instead of just strings as previously implemented.