Feature/factor fix gnu tests with num prime #6110

cre4ture · 2024-03-23T18:32:33Z

this addresses #6109 which is about failing gnu test tests/factor/factor.pl

changes:

replace custom implementation for factorisation with third party crate num_prime (Apache-2.0 license).
this allows for input numbers > u128
adapt gnu-test expected error messages with custom ones by applying patch
added unittest for the failing gnu tests

sylvestre · 2024-03-23T18:40:11Z

impressive. how many lines it remove ?
did you look at performances ?
See: https://github.com/uutils/coreutils/blob/main/src/uu/factor/BENCHMARKING.md

github-actions · 2024-03-23T19:07:34Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Congrats! The gnu test tests/truncate/truncate-parameters is no longer failing!

cre4ture · 2024-03-23T20:34:33Z

@sylvestre It removed ~1400 lines of code.

As you requested, I also looked into performance now.
Reference main branch:

PR branch with num_prime crate:

At first glance it might look promising, but actually there is a factor 1000 difference (us vs. ms) :-/ thats sad

sylvestre · 2024-03-23T20:38:46Z

are you sure you built in release mode ? thanks

cre4ture · 2024-03-24T11:43:10Z

OK, I found the issue with the performance comparison.
The existing benchmark was measuring only a part of the full factorization step.
I misunderstood this and thus the comparison was unfair.

I adapted the performance test such that it now compares the full facorization of uutils with num_prime in the same run such that also compiler optimization is guaranteed to be the same.
The result shows that the performance is comparable. num_prime even seems to be a bit more performant in average.
Therefor I think its fine to do the replacement.

cre4ture · 2024-03-24T11:48:15Z

@sylvestre what can I do about this error in the cargo-deny step?

I've never had this issue before.

can I ignore the warning?
do I need to fix the issue with the duplicated entries by aligning the versions? How do I do this?

sylvestre · 2024-03-24T12:04:24Z

Ignore the warning in deny.toml
you can also try to contribute to lru to hashbrown

sylvestre · 2024-03-24T12:39:18Z

The existing benchmark was measuring only a part of the full factorization step.

maybe update the benchmark accordingly ?
cc @nbraud if you are curious :)

cre4ture · 2024-03-24T13:44:50Z

I submitted the changes I did for the direct comparison here:
https://github.com/cre4ture/coreutils/tree/test/factor_compare_different_implementations

You can see that I also tested different other third-party crates:

prime_factorization = "1.0.4"
prime-factor = "0.4.6"
primeshor = "0.2.0"
factor-rs = "1.2.1"
slow_primes = "0.1.14"

But none of it had better performance results than num_prime

github-actions · 2024-03-24T14:35:11Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions · 2024-03-24T15:38:29Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions · 2024-03-24T17:11:15Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

tertsdiepraam · 2024-03-24T18:34:40Z

Cool! For performance, it would be interesting to see if we can use factorize128 and factorize64 too if the input number is small enough. But maybe for another PR and maybe the library does that already internally.

tdulcet · 2024-03-24T19:55:45Z

Just a humble user of uutils' factor here, but I would not recommend removing all of this code. @nbraud put years of work into this and it looks like the num_prime library you are replacing it with has not been updated since 2022.

The result shows that the performance is comparable.

Considering that the performance of uutils' factor is already up to 15× slower than GNU factor and over 20× slower other programs (see the recent benchmarks in #1456), if the performance of this num_prime library is only "comparable" to the existing code, that likely means uutils' factor would never be able to match the performance of GNU factor if this library is used. According to the documentation, this library does not even implement all of the algorithms of GNU factor, let alone any modern algorithms like ECM used by other tools.

tertsdiepraam · 2024-03-24T20:27:29Z

@tdulcet You raise some good points. The work by @nbraud is awesome, as shown by the fact that most of the libraries out there are slower than their implementation. However, I don't think anybody currently maintaining the project has the expertise/time to really improve this further within this repository. Therefore, I think we actually should rely on another library stands more on its own. The way I see it, there are several options:

Use a num-prime or another third-party library. Ideally one that is well-maintained and very fast.
Extract the factorization code into a separate library. Set up even better benchmarking and start improving it there. And then hope we can attract some contributors who want to improve it.
Leave it in and keep it until someone magically comes along and improves it.

I really hope that someone in the Rust ecosystem comes along who can write factor algorithms on par with GNU, but I'm pretty sure it's not going to be us.

if the performance of this num_prime library is only "comparable" to the existing code, that likely means uutils' factor would never be able to match the performance of GNU factor if this library is used.

This is why I mentioned factorize64. That might be a big improvement. Also remember that it's comparable in performance, but supports numbers larger than u128. So it's probably much faster than if we would convert all u128 in uutils to big decimals.

tertsdiepraam · 2024-03-24T20:34:44Z

@cre4ture as another data point, have you tried primal?

tdulcet · 2024-03-25T11:23:01Z

The way I see it, there are several options

Option 1 would be great if such a Rust library exists, but based on @cre4ture's benchmarks above, it does not seem like this is the case currently. I did benchmark a library in #1456 (comment) that is 3.5× faster than GNU factor for 64-bit numbers, 29× faster for 96-bit numbers and 334× faster for 127-bit numbers, but it is written in C++. I am not sure how difficult it might be to port to Rust.

Option 2 may be the best option, as it would allow uutils to retain full control of the performance. If it were possible to mend fences with @nbraud, they would obviously be the most qualified contributor to develop this new library. Something like CodSpeed could be used to measure the performance in the CI.

This is why I mentioned factorize64.

It is definitely worth a try, but it looks like this function uses the same algorithms and thus is still missing several of those used by GNU factor.

So it's probably much faster than if we would convert all u128 in uutils to big decimals.

The existing code actually only supports u64 (see #1559). Presumably it would be easy to update it support u128 (or larger), which would also allow the GNU tests to pass, although I suspect the performance would be abysmal, which it why no one has done this yet. I extrapolated the numbers in #1456 and found uutils' factor could be over 10,000× slower than that library I mentioned above for some 128-bit numbers.

Note that GNU factor has three code paths, one for up 63-bit integers, one for up 127-bit integers and one for arbitrary precision integers using the GNU MP library, so I suspect uutils' factor may need to do the same for performance, similar to what you were proposing with factorize64 and factorize128. My personal interest is factoring ~64-256-bit numbers, which is where GNU factor is particularly slow and I am hoping uutils' factor can eventually fill that void.

tertsdiepraam · 2024-03-25T11:52:20Z

I'm honestly still surprised that no good rust library has emerged yet. It feels like an obvious thing to tackle.

cre4ture · 2024-03-25T22:37:48Z

@cre4ture as another data point, have you tried primal?

I added primal now. But it seems to be quite slow. I canceled the execution of the performance test because it takes so long. And it only supports usize which is u64 on most modern PCs.
Meanwhile I stumbled accross another crate called number-theory, which does support >u128, but is super slow.

In general I need to state that I'm far away from beeing an expert in primal factorization. So maybe I'm doing something wrong in my comparison. So everybody is welcome to do reviews/suggestions. My current branch for performance comparison is here: https://github.com/cre4ture/coreutils/tree/test/factor_compare_different_implementations .

I'm with @tertsdiepraam. I think the coreutils crate is not an ideal place to maintain a factorisation algorithm implementation. There are many usecases out there (I assume) which could make use of a smart implementation such that its clearly worth to have it in an own crate. Of course, I respect the authors of the uutils implementation. I could never have done it like this.
In addtion to the options mentioned by @tertsdiepraam , I see also the option that someone could merge the implemenation of uutils and num-prime to get best from both crates.

The good thing is, that the interface to all those libraries that I tested is small and always similar. An exchange of the library for the implementation of the algorithm is low effort and easy. So even if we drop the uutils implementation for now, we could easily re-introduce it in future if someone improved it.
So I hope nobody is angry at me if I still suggest to do the replacement now.

sylvestre · 2024-03-26T08:17:32Z

So I hope nobody is angry at me if I still suggest to do the replacement now.

nope, we love this kind of work, thanks :)

github-actions · 2024-03-30T17:32:36Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!

github-actions · 2024-03-30T21:00:43Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

src/uu/factor/src/factor.rs

sylvestre · 2024-03-30T21:50:12Z

this is terrific!

github-actions · 2024-03-31T02:23:14Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!

cre4ture · 2024-03-31T14:06:31Z

@sylvestre how do I ignore the issue with the duplicate "hashbrown" crate?

I think to fix we would need to upgrade both sides as both are using a outdate version of hashbrown. Latest is 14.3.

cakebaker · 2024-03-31T14:09:00Z

@cre4ture you can add the hashbrown crate to the skip list in deny.toml.

github-actions · 2024-03-31T15:37:58Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions · 2024-04-01T13:10:17Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions · 2024-04-06T21:24:01Z

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/factor/factor is no longer failing!

github-actions · 2024-04-20T21:58:54Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!

cre4ture · 2024-04-21T09:29:50Z

@sylvestre can we merge this PR? or is there something open to do?

sylvestre · 2024-04-21T09:38:00Z

yeah, let's take it
but let's see the result of the last run of the CI

github-actions · 2024-04-21T09:52:47Z

GNU testsuite comparison:

Congrats! The gnu test tests/factor/factor is no longer failing!

sylvestre · 2024-04-21T10:49:44Z

Congrats! The gnu test tests/factor/factor is no longer failing!

and CI fully green!

sylvestre · 2024-04-21T10:52:37Z

well done!

nbraud · 2024-05-02T15:36:55Z

The work by @nbraud is awesome, as shown by the fact that most of the libraries out there are slower than their implementation.

I'm honestly surprised: I hadn't found a not-horribly-slow and actively maintained lib for this, back when I started working on uufactor, but I never got around to doing much more than picking the low-hanging fruits.

I had planned to implement some methods based on square forms, elliptic curves (EECM and HECM) and possibly the quadratic sieve once I got around to implementing arbitrary-sized inputs... but instead discovered that the I/O path was the main perf. issue at the time.

I still fixed the most glaring issues there, but the lack of deep interest in this particular bit of I/O plumbing, and hostile maintainer, killed all the momentum and interest I had going.

PS: Thanks for the kind words, though. ❤️
PS²: In case anyone else wants to work on factorization performance, I recall finding out that batching was pretty helpful but IDK how much of that was already implemented in uutils when I stopped contributing:

amortizes out the cost of accessing tables during preprocessing ;
interleaving factoring operations can be helpful to leverage the superscalar nature of modern CPUs, or even write explicit SIMD code if there are enough parallel factorizations to be done.

Use a num-prime or another third-party library. Ideally one that is well-maintained and very fast.

Extract the factorization code into a separate library. Set up even better benchmarking and start improving it there. And then hope we can attract some contributors who want to improve it.

[...] I really hope that someone in the Rust ecosystem comes along who can write factor algorithms on par with GNU, but I'm pretty sure it's not going to be us.

I could potentially be interested in doing just that, either within an existing lib or extracting my last commit from uufactor... though TBH my current priorities involve surviving the next year: in principle, being unemployed leaves a lot of time for writing cool soffware, but in practice... 😬

cakebaker linked an issue Mar 24, 2024 that may be closed by this pull request

factor fix GNU tests tests/factor/factor.pl and tests/factor/factor #6109

Closed

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from a44499d to c06c580 Compare March 24, 2024 14:06

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from c06c580 to 43c6a91 Compare March 24, 2024 15:10

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch 2 times, most recently from 0f6d4d1 to 44eb3da Compare March 24, 2024 16:42

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from ef42dea to b34a099 Compare March 30, 2024 20:32

sylvestre reviewed Mar 30, 2024

View reviewed changes

src/uu/factor/src/factor.rs Outdated Show resolved Hide resolved

sylvestre reviewed Mar 30, 2024

View reviewed changes

src/uu/factor/src/factor.rs Outdated Show resolved Hide resolved

sylvestre force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from b34a099 to 8bad2ef Compare March 30, 2024 21:52

cre4ture requested a review from sylvestre March 30, 2024 22:49

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from df9b03c to f90166e Compare April 1, 2024 12:23

sylvestre force-pushed the main branch from b696822 to cac7155 Compare April 1, 2024 15:11

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from f90166e to a3639d4 Compare April 6, 2024 20:55

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from a3639d4 to c3c7ef8 Compare April 20, 2024 21:29

use num_prime for factorisation to fix gnu test

d202bab

cre4ture force-pushed the feature/factor_fix_gnu_tests_with_num_prime branch from c3c7ef8 to d202bab Compare April 21, 2024 09:24

sylvestre merged commit 0b78b77 into uutils:main Apr 21, 2024
70 checks passed

Feature/factor fix gnu tests with num prime #6110

Feature/factor fix gnu tests with num prime #6110

Conversation

cre4ture commented Mar 23, 2024 • edited Loading

sylvestre commented Mar 23, 2024

github-actions bot commented Mar 23, 2024

cre4ture commented Mar 23, 2024

sylvestre commented Mar 23, 2024

cre4ture commented Mar 24, 2024 • edited Loading

cre4ture commented Mar 24, 2024

sylvestre commented Mar 24, 2024

sylvestre commented Mar 24, 2024 • edited Loading

cre4ture commented Mar 24, 2024

github-actions bot commented Mar 24, 2024

github-actions bot commented Mar 24, 2024

github-actions bot commented Mar 24, 2024

tertsdiepraam commented Mar 24, 2024

tdulcet commented Mar 24, 2024

tertsdiepraam commented Mar 24, 2024

tertsdiepraam commented Mar 24, 2024

tdulcet commented Mar 25, 2024

tertsdiepraam commented Mar 25, 2024

cre4ture commented Mar 25, 2024

sylvestre commented Mar 26, 2024

github-actions bot commented Mar 30, 2024

github-actions bot commented Mar 30, 2024

sylvestre commented Mar 30, 2024

github-actions bot commented Mar 31, 2024

cre4ture commented Mar 31, 2024 • edited Loading

cakebaker commented Mar 31, 2024 • edited Loading

github-actions bot commented Mar 31, 2024

github-actions bot commented Apr 1, 2024

github-actions bot commented Apr 6, 2024

github-actions bot commented Apr 20, 2024

cre4ture commented Apr 21, 2024

sylvestre commented Apr 21, 2024

github-actions bot commented Apr 21, 2024

sylvestre commented Apr 21, 2024

sylvestre commented Apr 21, 2024

nbraud commented May 2, 2024 • edited Loading

cre4ture commented Mar 23, 2024 •

edited

Loading

cre4ture commented Mar 24, 2024 •

edited

Loading

sylvestre commented Mar 24, 2024 •

edited

Loading

cre4ture commented Mar 31, 2024 •

edited

Loading

cakebaker commented Mar 31, 2024 •

edited

Loading

nbraud commented May 2, 2024 •

edited

Loading