Update sparse-dot-topn to v1 #77

RUrlus · 2024-04-15T18:41:18Z

We recently refactored sparse-dot-topn significantly and released v1.

The most significant improvements are:

- Faster implementation with lower memory overhead
- New bindings using Nanobind, which avoids the installation issues with Cython
- Default parallelism with OpenMP

The changes are significant enough that we released a new major version, which deprecates awesome_cossim_topn.
I also noticed that you encountered a bug when top-n is 1. I added a test case for this and the issue no longer exists.

The new implementation does not sort the scores but rather returns the matrix in the order as if you didn't select the top-n,
i.e. sp_matmul(A, B) == sp_matmul_topn(A, B, B.shape[1]).
It wasn't directly clear to me if you (implicitly) depend on the result being sorted so I left sorting on (it has no performance penalty).
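
To illustrate the ordering difference described above, here is a minimal pure-Python sketch of per-row top-n selection. It is not the library's implementation: `topn_row` and the `(column, score)` row representation are hypothetical names used only for illustration, and the `sort` flag is assumed to mean "largest score first".

```python
import heapq


def topn_row(row, n, sort=False):
    """Keep the n highest-scoring entries of one sparse row.

    `row` is a list of (column, score) pairs in column order.
    Hypothetical helper for illustration only.
    """
    # heapq.nlargest returns the n best entries, largest score first.
    keep = heapq.nlargest(n, row, key=lambda cs: cs[1])
    if sort:
        return keep
    # Unsorted mode: restore the original column order, so selecting
    # every entry gives back the row unchanged.
    return sorted(keep, key=lambda cs: cs[0])


row = [(0, 0.5), (2, 0.1), (5, 0.9)]
unsorted_top2 = topn_row(row, 2)
sorted_top2 = topn_row(row, 2, sort=True)
everything = topn_row(row, len(row))
```

In this sketch `everything == row`, which mirrors the identity sp_matmul(A, B) == sp_matmul_topn(A, B, B.shape[1]) quoted above, while `sorted_top2` puts the best match first, which is the ordering code that reads the first entry per row would rely on.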

MaartenGr · 2024-04-18T14:08:40Z

Thanks for sharing this, I completely missed this new release!

> It wasn't directly clear to me if you (implicitly) depend on the result being sorted so I left sorting on (it has no performance penalty).

Thanks for already sorting this, I might have missed it otherwise. Indeed, the code expects it to be left sorted.

I see that the tests fail but they also use quite old Python versions which have minimal/no support anymore. Could that be the issue?

RUrlus · 2024-04-18T15:50:55Z

Ah yes, I hadn't realized. We can't support Python 3.7 because the binding library requires 3.8+.
So unfortunately that's a hard limit for us.

We could condition the minimum version based on the python version and add a thin wrapper around the old API to make it compatible with the new one.
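
One way to condition the dependency on the interpreter version is with PEP 508 environment markers. The fragment below is only a sketch of a hypothetical `setup.py` excerpt: the variable name follows the usual setuptools convention, and the exact version bounds are assumptions rather than the values used in this PR.

```python
# Hypothetical excerpt from a setup.py; the version bounds are assumptions.
install_requires = [
    # Python 3.8+: use the v1 release line with the new bindings.
    'sparse_dot_topn>=1.1; python_version >= "3.8"',
    # Python 3.7: stay on the pre-v1 releases that still support it.
    'sparse_dot_topn<1; python_version < "3.8"',
]
```

With markers like these, pip picks exactly one of the two requirements per interpreter, so the thin wrapper only has to bridge the API difference at import time.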

polyfuzz/models/_utils.py

RUrlus · 2024-04-22T06:16:51Z

Yes, my bad I changed the filename last minute. I'll push a fix in a bit.

MaartenGr · 2024-04-26T08:30:39Z

Thanks for the changes, it seems that the pipeline has problems importing the function you created.

RUrlus · 2024-04-26T14:02:29Z

Sorry, my laptop doesn't support 3.7, so I perhaps relied too much on the CI/CD for this to work smoothly. I figured out the issue: apparently sys.version_info > (3, 7) is true on CPython 3.7...
Fixed now, models/test_tfidf.py passes locally on 3.7.
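
The version check pitfall mentioned above comes from tuple comparison: `sys.version_info` is a five-element tuple, and a longer tuple compares greater than a shorter one that matches its prefix. A minimal sketch, with hypothetical function names and a made-up 3.7 patch release for the demonstration:

```python
import sys


def is_newer_than_37_buggy(version_info=sys.version_info):
    # Buggy: (3, 7, x, ...) > (3, 7) is True, because the longer tuple
    # wins once the shared prefix compares equal.
    return version_info > (3, 7)


def supports_v1(version_info=sys.version_info):
    # Safer: compare against the first version that IS supported.
    return version_info >= (3, 8)


# Plain tuples stand in for sys.version_info on specific interpreters.
py37 = (3, 7, 17, "final", 0)
py38 = (3, 8, 0, "final", 0)

buggy_on_37 = is_newer_than_37_buggy(py37)
fixed_on_37 = supports_v1(py37)
fixed_on_38 = supports_v1(py38)
```

Here `buggy_on_37` is `True` even though the interpreter is 3.7, while the `>= (3, 8)` form gives `False` on 3.7 and `True` on 3.8.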

MaartenGr · 2024-04-30T07:39:07Z

It seems that there are some tests failing. Not sure why that is happening though.

RUrlus · 2024-06-27T10:54:16Z

Hi @MaartenGr, sorry for the stall on this, I'm hoping to pick this back up soon.
I did run into a (known) issue with multiple OpenMP binaries being loaded, but I'll turn off the multi-threading on our end.

RUrlus added 2 commits April 15, 2024 20:16

Fix deprecated call to awsome_cossim_topn

4d8afee

Update sparse-dot-topn dependency to 1.1

191dcfa

MaartenGr reviewed Apr 22, 2024

RUrlus force-pushed the sparse_dot_update branch from 6a99cdb to ae1b93b Compare April 22, 2024 07:34

Condition sparse-dot-topn version on Python version

1ff33b4

RUrlus force-pushed the sparse_dot_update branch from ae1b93b to 1ff33b4 Compare April 26, 2024 14:01
