some qibojit benchmarks on NVIDIA Grace-Hopper (WIP) #165

Open
migueldiascosta opened this issue Feb 1, 2024 · 15 comments

Comments

@migueldiascosta

data and some plots at https://gist.github.com/migueldiascosta/0a0dbe061982bc4cc2bc7171785a4b86, as requested by @scarrazza

@scarrazza
Member

Hi @migueldiascosta, thank you so much for those benchmarks, this architecture looks interesting. Do you also have numbers/plots for the A100? (cc @andrea-pasquale).

@migueldiascosta
Author

> Hi @migueldiascosta, thank you so much for those benchmarks, this architecture looks interesting. Do you also have numbers/plots for the A100? (cc @andrea-pasquale).

added A100 data (ran at NSCC) to https://gist.github.com/migueldiascosta/0a0dbe061982bc4cc2bc7171785a4b86

@scarrazza
Member

Thanks a lot, these are quite interesting performance results for GH200.

@renatomello
Contributor

@migueldiascosta @scarrazza I know this is not directly related here, but I think it could be interesting to run benchmarks on the Clifford simulator that @BrunoLiegiBastonLiegi is integrating with qibojit right now. He already ran benchmarks on the cluster's A6000.

@scarrazza
Member

I think this is a good idea, so we could have some numbers for A6000, A100 and GH200.

@migueldiascosta
Author

Will look into that - btw, are those A6000 benchmarks with library_benchmarks or with circuit_benchmarks? The plots currently in the gist where I mix your data with mine may not be an apples-to-apples comparison.

@renatomello
Contributor

renatomello commented Feb 3, 2024

> Will look into that - btw, are those A6000 benchmarks with library_benchmarks or with circuit_benchmarks? The plots currently in the gist where I mix your data with mine may not be an apples-to-apples comparison.

I don't know what those names mean, which I guess means it's with neither

@migueldiascosta
Author

migueldiascosta commented Feb 3, 2024

see e.g. qiboteam/qibojit-benchmarks#45

the current GH200 data in the gist was obtained with qibojit-benchmarks's main.py / circuit_benchmarks, not with compare.py / library_benchmarks

(because I had seen the latter spend most of the time on the single-CPU-thread conversion of the final state vector to a numpy array, and it was not what I was interested in benchmarking...)

@scarrazza
Member

I believe the numbers quoted there have been obtained with compare.py. @stavros11, could you please confirm?

@stavros11
Member

> I believe the numbers quoted there have been obtained with compare.py. @stavros11, could you please confirm?

Indeed, all the numbers in the qibojit paper were obtained with compare.py. Looking at the bash scripts in the benchmark repository and also the numbers used to generate the plots, the data keys agree with compare.py (library_benchmark).

Therefore @migueldiascosta is right: if main.py was used for the new benchmarks, for GPUs it is not an apples-to-apples comparison, because the transfer-to-host (numpy) time is logged separately in that script. For CPUs (numba) it shouldn't make a difference because a numpy array is used throughout the simulation.
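To illustrate why the two totals can differ on GPU, here is a minimal sketch. It is not the actual code of main.py or compare.py: `simulate`, `to_host`, and the dictionary keys are hypothetical stand-ins, and plain numpy copies stand in for device arrays so that it runs anywhere.

```python
import time

import numpy as np


def simulate(nqubits):
    """Hypothetical stand-in for a simulation whose result stays on the device."""
    state = np.zeros(2**nqubits, dtype=np.complex128)
    state[0] = 1.0
    return state


def to_host(state):
    """Hypothetical stand-in for the device-to-host (numpy) conversion."""
    return np.array(state, copy=True)


def timed_separately(nqubits):
    """Log the simulation and the transfer as two separate entries."""
    t0 = time.perf_counter()
    state = simulate(nqubits)
    t1 = time.perf_counter()
    host = to_host(state)
    t2 = time.perf_counter()
    return {"simulation_time": t1 - t0, "transfer_time": t2 - t1}, host


def timed_together(nqubits):
    """Fold the transfer into a single reported simulation time."""
    t0 = time.perf_counter()
    host = to_host(simulate(nqubits))
    t1 = time.perf_counter()
    return {"simulation_time": t1 - t0}, host
```

With a real GPU backend the `transfer_time` entry can dominate for large states, so a "simulation_time" from one style of script is only comparable to the other after adding or removing that entry.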

@migueldiascosta
Author

migueldiascosta commented Feb 3, 2024

> For CPUs (numba) it shouldn't make a difference because a numpy array is used throughout the simulation.

Indeed, but maybe there are other differences between library_benchmark and circuit_benchmark? i.e., the huge difference in my plots between EPYC and Grace for smaller circuits is suspicious (and in general for the paper data, there seems to be a constant time that dominates for smaller circuits, the curves always start basically flat at around one second until about 20 qubits, mine don't)

[Plot: qibo_scaling_qft_total_simulation_time_double cpu]

@stavros11
Member

> Indeed, but maybe there are other differences between library_benchmark and circuit_benchmark? i.e., the huge difference in my plots between EPYC and Grace for smaller circuits is suspicious

I also noticed that and I am not sure how to explain it. One thing that could have changed, other than the scripts, is the library versions. It has been two years since publication, so qibo, qibojit and probably their dependencies may have changed during that time. That is, unless you are using the older versions.

Given that we still have access to most of the hardware we did the benchmarks on, we could retry the benchmarks from our side using the same versions and script you used. This way we will have a much more accurate comparison.

@migueldiascosta
Author

migueldiascosta commented Feb 3, 2024

Yes, there could also be differences there, but now I'm thinking the ~1s constant time in your data is simply the import time, which is added to the "total_simulation_time" in load_data for the plots - it's also added to mine, but my import time is much shorter, which could be simply about disk IO (the system I'm using has NVMe drives and I'm loading from them, not from a network filesystem) and/or caching
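One way to check the import-time hypothesis on a given system is to time the import directly. This is a minimal sketch using only the standard library; `measure_import_time` is a hypothetical helper, not part of qibojit-benchmarks, and the module name passed to it is up to the caller.

```python
import importlib
import sys
import time


def measure_import_time(module_name):
    """Time importing module_name in this process.

    Returns (elapsed_seconds, already_loaded). If the module was already
    in sys.modules the measurement is a cached lookup, not a cold import.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    return elapsed, already_loaded


if __name__ == "__main__":
    # "decimal" is only a placeholder; substitute the library being benchmarked.
    elapsed, cached = measure_import_time("decimal")
    print(f"import took {elapsed:.4f}s (already loaded: {cached})")
```

Running this in a fresh interpreter from the NVMe drive and again from the network filesystem would show whether disk IO accounts for the difference. CPython's `-X importtime` flag gives a per-module breakdown if more detail is needed.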

@migueldiascosta
Author

Actually, that's mentioned in the paper: "Furthermore, a constant of about one second is required to import the library, which can be relevant (comparable or larger than execution time) for simulation of small circuits. This is unlikely to impede practical usage as it is only a small constant overhead that is independent of the total simulation load."

@migueldiascosta
Author

indeed, if I remove the import time the comparison looks more reasonable, e.g.

[Plot: qibo_scaling_qft_total_simulation_time_double cpu]
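The adjustment behind that comparison can be sketched as follows. This is an assumption about the shape of the data, not the actual load_data code: the `import_time` key and the record layout are hypothetical, and the numbers are made up purely for illustration.

```python
def without_import_time(records):
    """Return copies of the records with the import overhead removed from the total."""
    adjusted = []
    for rec in records:
        new = dict(rec)  # leave the input records untouched
        new["total_simulation_time"] = (
            rec["total_simulation_time"] - rec["import_time"]
        )
        adjusted.append(new)
    return adjusted


# Illustrative values only, not measured data.
records = [
    {"nqubits": 10, "import_time": 1.0, "total_simulation_time": 1.25},
    {"nqubits": 20, "import_time": 1.0, "total_simulation_time": 1.5},
]

for rec in without_import_time(records):
    print(rec["nqubits"], rec["total_simulation_time"])
```

Subtracting a per-run import time, rather than a fixed one second, keeps the correction valid on systems where the import is much faster or slower.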
