`cudaMemcpyAsync` follows standard stream semantics, so a copy is guaranteed to complete before any subsequent kernel launch or synchronisation point (e.g. a synchronous memcpy to host) in the same stream. I need to think a little more about this to be sure, but I think this means it would be safe to switch to `cudaMemcpyAsync` for all `pushXXXToDevice` operations, which should significantly reduce synchronisation overhead when streaming data from host to device.
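To make the ordering argument concrete, here is a minimal sketch rather than the project's generated code. `pushScaleAndPull`, `scaleKernel` and `CHECK_CUDA` are hypothetical names invented for illustration, and everything is assumed to run on the default stream. The host buffer here is ordinary pageable memory; with pinned host memory the copy becomes truly asynchronous, so the host buffer would have to stay untouched until the stream has caught up.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Report a failed CUDA call and return false from the enclosing function.
// (Sketch only: device memory is not released on the error path.)
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return false;                                          \
        }                                                          \
    } while (0)

// Hypothetical kernel that consumes the pushed data
__global__ void scaleKernel(float *data, int n, float factor)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical stand-in for a pushXXXToDevice call followed by a kernel
// launch and a synchronous pull back to the host
bool pushScaleAndPull(std::vector<float> &host, float factor)
{
    if (host.empty()) {
        return true;
    }
    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);

    float *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, bytes));

    // Asynchronous push, enqueued on the default stream (stream 0)
    CHECK_CUDA(cudaMemcpyAsync(d_data, host.data(), bytes,
                               cudaMemcpyHostToDevice, 0));

    // Launched into the same stream, so it cannot start before the copy
    // above has completed
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d_data, n, factor);
    CHECK_CUDA(cudaGetLastError());

    // Synchronous pull: returns only after the kernel and this copy finish
    CHECK_CUDA(cudaMemcpy(host.data(), d_data, bytes,
                          cudaMemcpyDeviceToHost));

    CHECK_CUDA(cudaFree(d_data));
    return true;
}
```

Calling `pushScaleAndPull(values, 2.0f)` on a `std::vector<float>` should leave every element doubled without any explicit synchronisation between the push and the kernel launch.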
Furthermore, `allocateMem` and `freeMem` could almost certainly be sped up by using `cudaMallocAsync` and `cudaFreeAsync` (with a barrier at the end of each function for safety).
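A rough sketch of what that could look like, assuming CUDA 11.2 or later and a device that supports stream-ordered memory pools. `allocateMemSketch`, `freeMemSketch` and `CHECK_CUDA` are hypothetical names, and the two arrays stand in for however many variables the real functions handle. The barrier is a single `cudaStreamSynchronize` at the end of each function.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Report a failed CUDA call and return false from the enclosing function.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return false;                                          \
        }                                                          \
    } while (0)

// Hypothetical stand-in for allocateMem: enqueue every allocation on the
// stream, then wait once at the end instead of after each call
bool allocateMemSketch(float **d_a, float **d_b, size_t count,
                       cudaStream_t stream)
{
    const size_t bytes = count * sizeof(float);
    CHECK_CUDA(cudaMallocAsync(reinterpret_cast<void **>(d_a), bytes, stream));
    CHECK_CUDA(cudaMallocAsync(reinterpret_cast<void **>(d_b), bytes, stream));

    // Barrier: all allocations are complete once this returns
    CHECK_CUDA(cudaStreamSynchronize(stream));
    return true;
}

// Hypothetical stand-in for freeMem: enqueue every free, then wait once
bool freeMemSketch(float *d_a, float *d_b, cudaStream_t stream)
{
    CHECK_CUDA(cudaFreeAsync(d_a, stream));
    CHECK_CUDA(cudaFreeAsync(d_b, stream));

    // Barrier: the memory has been returned to the pool once this returns
    CHECK_CUDA(cudaStreamSynchronize(stream));
    return true;
}
```

Whether this is actually faster depends on how many allocations are batched before the barrier, so it would need measuring. Support can be queried with `cudaDeviceGetAttribute` and `cudaDevAttrMemoryPoolsSupported` before choosing this path.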