Use async memcpy for copying to device #580

neworderofjamie · 2023-04-14T21:10:50Z

cudaMemcpyAsync follows standard stream semantics, so a copy is guaranteed to complete before any kernel subsequently launched into the same stream and before any later synchronisation point (e.g. a synchronous memcpy to host). I need to think a little more about this to be sure, but I think this means it would be safe to switch to cudaMemcpyAsync for all pushXXXToDevice operations, which should significantly reduce synchronisation overhead when streaming data from host to device.
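A minimal sketch of the ordering being relied on, not GeNN's actual generated code: `scaleKernel` is a hypothetical stand-in for a generated kernel, and the copy, kernel, and pull all use the default stream. It assumes the host buffer is pinned (`cudaMallocHost`); with pageable memory the host-to-device copy may be staged and behave less asynchronously, and in either case the host should not overwrite a pinned buffer until the copy has completed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        const cudaError_t err = (call);                               \
        if(err != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err));                    \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while(0)

// Hypothetical kernel standing in for a generated update kernel
__global__ void scaleKernel(float *d, unsigned int n, float k)
{
    const unsigned int i = (blockIdx.x * blockDim.x) + threadIdx.x;
    if(i < n) {
        d[i] *= k;
    }
}

int main()
{
    const unsigned int n = 1024;
    float *h = nullptr;
    float *d = nullptr;

    // Pinned host memory (assumption) so the copy can overlap with host work
    CHECK_CUDA(cudaMallocHost(&h, n * sizeof(float)));
    CHECK_CUDA(cudaMalloc(&d, n * sizeof(float)));
    for(unsigned int i = 0; i < n; i++) {
        h[i] = 1.0f;
    }

    // "Push": enqueue the copy on the default stream without waiting for it
    CHECK_CUDA(cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, 0));

    // Kernel launched into the same stream, so it is ordered after the copy
    scaleKernel<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());

    // "Pull": a synchronous copy back, which also acts as a synchronisation point
    CHECK_CUDA(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost));

    // Every element should have been copied and then doubled
    for(unsigned int i = 0; i < n; i++) {
        if(h[i] != 2.0f) {
            std::fprintf(stderr, "Unexpected value at %u\n", i);
            return EXIT_FAILURE;
        }
    }

    CHECK_CUDA(cudaFree(d));
    CHECK_CUDA(cudaFreeHost(h));
    return EXIT_SUCCESS;
}
```

If kernels were launched into a different non-blocking stream from the copy, this implicit ordering would no longer hold and an explicit event or stream synchronisation would be needed.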

Furthermore, allocateMem and freeMem could almost certainly be sped up by using cudaMallocAsync and cudaFreeAsync (with a barrier at the end of the functions for safety).
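A rough sketch of what that could look like, assuming CUDA 11.2 or later and a device that supports stream-ordered memory pools. `allocateMemSketch` and `freeMemSketch` are hypothetical names used only for illustration and the two buffers are placeholders; the `cudaStreamSynchronize` call at the end of each function is the barrier mentioned above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        const cudaError_t err = (call);                               \
        if(err != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err));                    \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while(0)

// Placeholder device buffers standing in for a model's state arrays
static void *d_a = nullptr;
static void *d_b = nullptr;

// Hypothetical allocateMem: enqueue all allocations, then wait once
void allocateMemSketch(cudaStream_t stream, size_t bytes)
{
    CHECK_CUDA(cudaMallocAsync(&d_a, bytes, stream));
    CHECK_CUDA(cudaMallocAsync(&d_b, bytes, stream));

    // Barrier: after this, the pointers are usable from any stream
    CHECK_CUDA(cudaStreamSynchronize(stream));
}

// Hypothetical freeMem: enqueue all frees, then wait once
void freeMemSketch(cudaStream_t stream)
{
    CHECK_CUDA(cudaFreeAsync(d_a, stream));
    CHECK_CUDA(cudaFreeAsync(d_b, stream));

    // Barrier: after this, the memory has been returned to the pool
    CHECK_CUDA(cudaStreamSynchronize(stream));
    d_a = nullptr;
    d_b = nullptr;
}

int main()
{
    // Stream-ordered allocation is not available on every device or driver
    int device = 0;
    int poolsSupported = 0;
    CHECK_CUDA(cudaGetDevice(&device));
    CHECK_CUDA(cudaDeviceGetAttribute(&poolsSupported,
                                      cudaDevAttrMemoryPoolsSupported, device));
    if(!poolsSupported) {
        std::printf("Stream-ordered allocation not supported on this device\n");
        return EXIT_SUCCESS;
    }

    cudaStream_t stream;
    CHECK_CUDA(cudaStreamCreate(&stream));

    allocateMemSketch(stream, 1024 * sizeof(float));
    freeMemSketch(stream);

    CHECK_CUDA(cudaStreamDestroy(stream));
    return EXIT_SUCCESS;
}
```

Whether this is actually faster than cudaMalloc and cudaFree would need measuring; any real implementation would also need a fallback for devices without memory pool support.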


neworderofjamie added enhancement CUDA backend labels Apr 14, 2023

neworderofjamie modified the milestones: GeNN 5.0.0, GeNN 4.9.0 Apr 14, 2023

neworderofjamie modified the milestones: GeNN 4.9.0, GeNN 5.0.0 Oct 11, 2023

neworderofjamie modified the milestones: GeNN 5.0.0, GeNN 5.X Apr 25, 2024
