some qibojit benchmarks on NVIDIA Grace-Hopper (WIP) #165

migueldiascosta · 2024-02-01T08:24:19Z

data and some plots at https://gist.github.com/migueldiascosta/0a0dbe061982bc4cc2bc7171785a4b86, as requested by @scarrazza

scarrazza · 2024-02-01T10:46:24Z

Hi @migueldiascosta, thank you so much for those benchmarks; this architecture looks interesting. Do you also have numbers/plots for the A100? (cc @andrea-pasquale)

migueldiascosta · 2024-02-02T00:58:32Z

> Hi @migueldiascosta, thank you so much for those benchmarks; this architecture looks interesting. Do you also have numbers/plots for the A100? (cc @andrea-pasquale)

added A100 data (ran at NSCC) to https://gist.github.com/migueldiascosta/0a0dbe061982bc4cc2bc7171785a4b86

scarrazza · 2024-02-02T09:34:57Z

Thanks a lot, these are quite interesting performance results for GH200.

renatomello · 2024-02-02T09:43:14Z

@migueldiascosta @scarrazza I know this is not directly related here, but I think it could be interesting to run benchmarks on the Clifford simulator that @BrunoLiegiBastonLiegi is integrating with qibojit right now. He already ran benchmarks on the cluster's A6000.

scarrazza · 2024-02-02T10:06:55Z

I think this is a good idea, so we could have some numbers for A6000, A100 and GH200.

migueldiascosta · 2024-02-03T04:33:11Z

Will look into that - btw, are those A6000 benchmarks with library_benchmarks or with circuit_benchmarks? The plots currently in the gist where I mix your data with mine may not be an apples-to-apples comparison.

renatomello · 2024-02-03T04:35:32Z

> Will look into that - btw, are those A6000 benchmarks with library_benchmarks or with circuit_benchmarks? The plots currently in the gist where I mix your data with mine may not be an apples-to-apples comparison.

I don't know what those names mean, which I guess means it's with neither

migueldiascosta · 2024-02-03T04:40:44Z

see e.g. qiboteam/qibojit-benchmarks#45

the current GH200 data in the gist was obtained with qibojit-benchmarks' main.py / circuit_benchmarks, not with compare.py / library_benchmarks

(because I had seen the latter spend most of the time on the single-CPU-thread conversion of the final state vector to a numpy array, and it was not what I was interested in benchmarking...)

scarrazza · 2024-02-03T07:31:57Z

I believe the numbers quoted there have been obtained with compare.py. @stavros11, could you please confirm?

stavros11 · 2024-02-03T08:32:57Z

> I believe the numbers quoted there have been obtained with compare.py. @stavros11, could you please confirm?

Indeed, all the numbers in the qibojit paper were obtained with compare.py. Looking at the bash scripts in the benchmark repository and also the numbers used to generate the plots, the data keys agree with compare.py (library_benchmark).

Therefore @migueldiascosta is right: if main.py was used for the new benchmarks, then for GPUs it is not an apples-to-apples comparison, because the transfer-to-host (numpy) time is logged separately in that script. For CPUs (numba) it shouldn't make a difference because a numpy array is used throughout the simulation.
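To make the point concrete, here is a minimal sketch of how the two kinds of logs could be put on the same footing before plotting. The record keys `simulation_time` and `transfer_time` and all numbers are made up for illustration; they are not taken from the actual output of main.py or compare.py.

```python
def end_to_end_time(record):
    """Simulation time plus device-to-host transfer time, if the latter is logged.

    Assumption: a main.py-style record stores the transfer separately, while a
    compare.py-style record already includes it in the simulation time.
    """
    return record["simulation_time"] + record.get("transfer_time", 0.0)


# Illustrative numbers only, not measurements.
main_style = {"nqubits": 30, "simulation_time": 1.5, "transfer_time": 4.0}
compare_style = {"nqubits": 30, "simulation_time": 5.5}

print(end_to_end_time(main_style), end_to_end_time(compare_style))
```

For CPU runs the `transfer_time` entry would simply be absent or zero, so the same helper leaves those numbers unchanged.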

migueldiascosta · 2024-02-03T09:04:38Z

> For CPUs (numba) it shouldn't make a difference because a numpy array is used throughout the simulation.

Indeed, but maybe there are other differences between library_benchmark and circuit_benchmark? i.e., the huge difference in my plots between EPYC and Grace for smaller circuits is suspicious. In general, the paper data seems to have a constant time that dominates for smaller circuits: the curves always start basically flat at around one second until about 20 qubits, while mine don't.

stavros11 · 2024-02-03T09:27:47Z

> Indeed, but maybe there are other differences between library_benchmark and circuit_benchmark? i.e., the huge difference in my plots between EPYC and Grace for smaller circuits is suspicious

I also noticed that and I am not sure how to explain it. One thing that could have changed, other than the scripts, is the library versions. It has been two years since publication, so qibo, qibojit and probably their dependencies may have changed during that time. That is, unless you are using the older versions.

Given that we still have access to most of the hardware we did the benchmarks on, we could retry the benchmarks from our side using the same versions and script you used. This way we will have a much more accurate comparison.

migueldiascosta · 2024-02-03T09:54:50Z

Yes, there could also be differences there, but now I'm thinking the ~1s constant time in your data is simply the import time, which is added to the "total_simulation_time" in load_data for the plots. It's also added to mine, but my import time is much shorter, which could simply be down to disk IO (the system I'm using has NVMe drives and I'm loading from them, not from a network filesystem) and/or caching.
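A rough way to check the import-time hypothesis on a given system is to time the import directly. The sketch below uses the standard-library module `colorsys` purely as a stand-in so that it runs anywhere; the helper name `timed_import` is hypothetical, and the module name would need to be replaced with the library under test.

```python
import importlib
import sys
import time


def timed_import(module_name):
    """Import a module and return (module, seconds spent in the import call)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


# "colorsys" is only a stand-in; replace it with the library being benchmarked.
module, first = timed_import("colorsys")
_, repeated = timed_import("colorsys")  # second call is served from sys.modules
print(f"first import: {first:.6f}s, repeated import: {repeated:.6f}s")
```

Note that a repeated import inside the same process only measures the `sys.modules` lookup. Comparing a local NVMe drive against a network filesystem, or a cold against a warm page cache, would need a fresh Python process for each measurement.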

migueldiascosta · 2024-02-03T10:03:27Z

Actually, that's mentioned in the paper: "Furthermore, a constant of about one second is required to import the library, which can be relevant (comparable or larger than execution time) for simulation of small circuits. This is unlikely to impede practical usage as it is only a small constant overhead that is independent of the total simulation load."
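The constant overhead described in that quote can be separated out after the fact. A minimal sketch, assuming each record carries the import time next to the total: the key `import_time`, the helper name `without_import`, and all numbers are assumptions for illustration and are not taken from load_data or the benchmark logs.

```python
def without_import(records, total_key="total_simulation_time", import_key="import_time"):
    """Return copies of the records with the import time subtracted from the total."""
    return [{**r, total_key: r[total_key] - r[import_key]} for r in records]


# Illustrative numbers only: a ~1 s import dominates the small circuit.
records = [
    {"nqubits": 10, "import_time": 1.0, "total_simulation_time": 1.25},
    {"nqubits": 30, "import_time": 1.0, "total_simulation_time": 3.5},
]

for r in without_import(records):
    print(r["nqubits"], r["total_simulation_time"])
```

With the import time removed, the small-circuit entry drops well below one second, which is the regime where the flat part of the curves would otherwise hide differences between systems.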

migueldiascosta · 2024-02-03T10:17:12Z

indeed, if I remove the import time the comparison looks more reasonable, e.g.
