Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slower performance for 32-bit integers #1

Open
tdulcet opened this issue Sep 10, 2023 · 0 comments
Open

Slower performance for 32-bit integers #1

tdulcet opened this issue Sep 10, 2023 · 0 comments

Comments

@tdulcet
Copy link

tdulcet commented Sep 10, 2023

Thank you for creating this wonderful library!

I compared the performance to other similar libraries/programs and it is significantly faster for larger 128-bit integers. For example, compared to GNU factor, it is over 330× faster:

$ ./time.sh -i -r 5 "seq $(bc <<<'(2^127) - (10^2) - 1') $(bc <<<'(2^127) - 1') | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor
  Time (mean ± σ std dev):     143.1562s ±  0.5082s          [User: 143.1556s, System: 0.0000s]
  Range (min … median … max):  142.371s … 143.422s … 143.732s   CPU: 100.0%, 5 runs

Benchmark #2: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor
  Time (mean ± σ std dev):     142.3252s ±  0.7053s          [User: 141.9832s, System: 0.0060s]
  Range (min … median … max):  141.651s … 141.996s … 143.588s   CPU:  99.8%, 5 runs

Benchmark #3: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example
  Time (mean ± σ std dev):      0.9602s ±  0.7591s          [User: 0.5456s, System: 0.0080s]
  Range (min … median … max):   0.507s …  0.564s …  2.473s   CPU:  57.7%, 5 runs

Benchmark #4: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example
  Time (mean ± σ std dev):      0.4254s ±  0.1519s          [User: 0.3314s, System: 0.0080s]
  Range (min … median … max):   0.312s …  0.356s …  0.719s   CPU:  79.8%, 5 runs

Summary
  #4 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example’ ran
  336.521 ± 120.198 times (33,552.1%) faster than #1 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor’
  334.568 ± 119.506 times (33,356.8%) faster than #2 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor’
    2.257 ± 1.958 times (125.7%) faster than #3 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example’

For a couple of 128-bit integers with large 64-bit factors (from the GNU test suite), it is almost 2,000× faster:

$ ./time.sh -i -m 2 "echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor
  Time (mean ± σ std dev):     319.4225s ±  0.3625s          [User: 319.4225s, System: 0.0000s]
  Range (min … median … max):  319.060s … 319.422s … 319.785s   CPU: 100.0%, 2 runs

Benchmark #2: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor
  Time (mean ± σ std dev):     318.5655s ±  0.4765s          [User: 317.7500s, System: 0.0005s]
  Range (min … median … max):  318.089s … 318.566s … 319.042s   CPU:  99.7%, 2 runs

Benchmark #3: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example
  Time (mean ± σ std dev):      1.1580s ±  0.8340s          [User: 0.2590s, System: 0.0340s]
  Range (min … median … max):   0.324s …  1.158s …  1.992s   CPU:  25.3%, 2 runs

Benchmark #4: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example
  Time (mean ± σ std dev):      0.1634s ±  0.0169s          [User: 0.1365s, System: 0.0059s]
  Range (min … median … max):   0.152s …  0.158s …  0.219s   CPU:  87.2%, 13 runs

Summary
  #4 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example’ ran
1,955.034 ± 202.348 times (195,403.4%) faster than #1 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor’
1,949.789 ± 201.814 times (194,878.9%) faster than #2 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor’
    7.088 ± 5.157 times (608.8%) faster than #3 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example’

However, for smaller integers under 32-bits, it seems to be up to 11× slower than GNU factor:

$ ./time.sh -i -r 5 'seq 2 10000000 | '{factor,./gnu/factor,./uu/factor,./factoring/gcc_example,./factoring/clang_example,'./numbers -p'}
Benchmark #1: seq 2 10000000 | factor
  Time (mean ± σ std dev):      2.2944s ±  0.0112s          [User: 2.0142s, System: 0.4204s]
  Range (min … median … max):   2.281s …  2.291s …  2.313s   CPU: 106.1%, 5 runs

Benchmark #2: seq 2 10000000 | ./gnu/factor
  Time (mean ± σ std dev):      2.2020s ±  0.0087s          [User: 1.8820s, System: 0.4534s]
  Range (min … median … max):   2.188s …  2.201s …  2.213s   CPU: 106.1%, 5 runs

Benchmark #3: seq 2 10000000 | ./uu/factor
  Time (mean ± σ std dev):     12.6390s ±  0.0296s          [User: 10.2822s, System: 2.4632s]
  Range (min … median … max):  12.591s … 12.637s … 12.684s   CPU: 100.8%, 5 runs

Benchmark #4: seq 2 10000000 | ./factoring/gcc_example
  Time (mean ± σ std dev):     15.2668s ±  0.1457s          [User: 12.6782s, System: 2.7610s]
  Range (min … median … max):  15.093s … 15.291s … 15.512s   CPU: 101.1%, 5 runs

Benchmark #5: seq 2 10000000 | ./factoring/clang_example
  Time (mean ± σ std dev):     15.2658s ±  0.0899s          [User: 12.7338s, System: 2.7064s]
  Range (min … median … max):  15.172s … 15.212s … 15.419s   CPU: 101.1%, 5 runs

Benchmark #6: seq 2 10000000 | ./numbers -p
  Time (mean ± σ std dev):     19.5844s ±  0.2708s          [User: 17.0100s, System: 2.7530s]
  Range (min … median … max):  19.241s … 19.561s … 20.072s   CPU: 100.9%, 5 runs

Summary
  #2 ‘seq 2 10000000 | ./gnu/factor’ ran
    1.042 ± 0.007 times (4.2%) faster than #1 ‘seq 2 10000000 | factor’
    5.740 ± 0.026 times (474.0%) faster than #3 ‘seq 2 10000000 | ./uu/factor’
    6.933 ± 0.072 times (593.3%) faster than #4 ‘seq 2 10000000 | ./factoring/gcc_example’
    6.933 ± 0.049 times (593.3%) faster than #5 ‘seq 2 10000000 | ./factoring/clang_example’
    8.894 ± 0.128 times (789.4%) faster than #6 ‘seq 2 10000000 | ./numbers -p’

It seems to get progressively faster as the integers get larger. For integers less than 104, it is about 10.9× slower, for 105 about 8.1× slower, for 106 about 7.2× slower, for 107 about 6.9× slower and for 108 about 4.6× slower.

The above benchmarks were created with my Benchmarking Tool and include the system GNU factor (factor), a locally compiled version of GNU factor with -O3 and LTO enabled (./gnu/factor), uutils' Rust port of GNU factor (./uu/factor), a simple wrapper program for this library adapted from your example without CMake compiled with both GCC (./factoring/gcc_example) and Clang (./factoring/clang_example) and with inline ASM enabled, and lastly my simple Numbers Tool program which includes a partial C++ port of GNU factor (./numbers -p).

Hopefully I am using your library correctly to get maximum performance. For reference, the wrapper program I wrote for your library is available here: https://gist.github.com/tdulcet/6393865769644aaf9244561f3eb4e9c2. The commands I used to compile and run it are on top of the file. I used GCC 11.4 and Clang 14.0. Note that Clang did generate 42 compiler warnings, but in my benchmarking it was always faster than GCC with your library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant