Poor performance in single-buffer test #45
Comments
Hi @Blackbird52. I can't say what is wrong with your test. As you may know, the biggest benefit of multi-buffer hashing comes from submitting multiple independent jobs at once. We do have some special single-buffer and even two-buffer optimizations for when we detect that only 1 or 2 lanes are filled. |
@Blackbird52 , could you give more information like CPU, memory and other related system configuration? |
We have the same problem. We tested OpenSSL and ISA-L and found that ISA-L's single-buffer SHA256 performance is worse than OpenSSL's. |
Hardware & Software Ingredients
|
I see the same on mac and linux containers. And this is my machine:
Here is my only diff in md5_mb/md5_mb_vs_ossl_perf.c:
building:
running:
I also get similar 2x slower times in a Linux VM. Let me know if I can provide more information. |
@guymguym I think there is an obvious issue with your results.
I would expect multiple gigabytes/s here. Could it be that the host is not passing on all the native instruction set support to the VM? If configured to pass native, the VM should see AVX2. @Blackbird52, is the included multi-buffer test showing expected results? |
@gbtucker , yes, multi-buffer test got expected results. |
@gbtucker @Blackbird52 did you notice my change to Isn't that the same as the original issue? |
@gbtucker |
Sorry @guymguym, I did miss the As I said there are some optimizations for single and dual buffer but many of these are for later CPUs. If your system will primarily only utilize a single buffer at a time you may find the integration for multi-buffer hashing is not worth it. It may be difficult to say where the crossover point is for your integration without some experimentation. |
thanks @gbtucker. Even with 2 buffers I already get better performance:
However, I am not sure how to use multi-buffer in my case; perhaps you can help me understand. My server (an S3 endpoint) processes multiple MD5 streams concurrently. When a stream starts, it initializes an MD5 context, and whenever a buffer is read from the socket it is submitted to the mgr with its stream's context. But then I have to call flush immediately, because I want to submit the next buffer of that stream. So to process more than one buffer per flush, what kind of event-loop synchronization or "tick" should I implement? Is there a reference project that uses multi-buffer for data streaming? |
@guymguym, it sounds like you want to split hash jobs into partial updates. This is entirely possible with the multi-buffer interface without having to flush after each. Typically the updates are tracked in the same job context with I would suggest having a context pool worker that manages this. Also |
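To illustrate why flushing after every submit defeats the purpose, here is a minimal toy model in C. It is a sketch under stated assumptions and not the ISA-L API: `toy_mgr`, `toy_submit`, `toy_flush`, and the lane count are all hypothetical, and the model only counts SIMD passes. It assumes each stream has at most one buffer in flight per event-loop "tick", and it collapses flushing into a single call that drains everything pending.

```c
#include <assert.h>

/* Toy stand-in for a multi-buffer manager. This is NOT the ISA-L API:
 * every name here is hypothetical and the model only counts SIMD passes. */
struct toy_mgr {
    int lanes;    /* hypothetical number of SIMD lanes */
    int pending;  /* buffers currently waiting in lanes */
    int passes;   /* SIMD passes executed so far */
};

/* Run one pass over whatever is pending (no-op if nothing is pending). */
static void toy_run_pass(struct toy_mgr *m)
{
    if (m->pending > 0) {
        m->passes++;
        m->pending = 0;
    }
}

/* Submit one buffer; a pass runs by itself only once every lane is full. */
static void toy_submit(struct toy_mgr *m)
{
    m->pending++;
    if (m->pending == m->lanes)
        toy_run_pass(m);
}

/* Flush: force a pass over partially filled lanes. */
static void toy_flush(struct toy_mgr *m)
{
    toy_run_pass(m);
}

/* Pattern from the question: flush right after each stream's submit. */
int passes_flush_after_each_submit(int lanes, int streams, int ticks)
{
    struct toy_mgr m = { lanes, 0, 0 };
    assert(lanes > 0 && streams >= 0 && ticks >= 0);
    for (int t = 0; t < ticks; t++) {
        for (int s = 0; s < streams; s++) {
            toy_submit(&m);
            toy_flush(&m);
        }
    }
    return m.passes;
}

/* Batched pattern: submit one buffer per ready stream, flush once per tick. */
int passes_flush_once_per_tick(int lanes, int streams, int ticks)
{
    struct toy_mgr m = { lanes, 0, 0 };
    assert(lanes > 0 && streams >= 0 && ticks >= 0);
    for (int t = 0; t < ticks; t++) {
        for (int s = 0; s < streams; s++)
            toy_submit(&m);
        toy_flush(&m);
    }
    return m.passes;
}
```

With 8 lanes and 5 streams over 100 ticks, the first function counts 500 passes and the second counts 100, so the batched loop does the same hashing work in a fifth of the passes. In a real integration the "tick" would be whatever point the event loop reaches after it has drained all sockets that are currently readable.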
Just a late bit of (hopefully useful) info for anyone who stumbles across this: from my quick scan of the code, it seems the 1- or 2-buffer optimisations @gbtucker mentions apply only to SHA1/SHA256. For MD5, which is what @guymguym is testing here, there don't appear to be any optimisations for fewer than the ideal number of buffers, so ISA-L will always use the full SIMD width with two interleaved vectors regardless of how many lanes are active. |
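If a SIMD pass really costs the same no matter how many lanes are active, the crossover point mentioned earlier in the thread can be roughly estimated. The sketch below assumes effective multi-buffer throughput scales as `active_lanes / lanes` of the measured full-lane rate. The function name and parameters are hypothetical, the rates must come from your own measurements, and the model ignores submit overhead and any single- or dual-buffer fast paths.

```c
#include <assert.h>

/* Smallest number of active lanes at which the multi-buffer path is
 * estimated to beat a single-buffer implementation, or -1 if it never does.
 *
 * lanes       : total lanes the multi-buffer manager fills (assumption)
 * mb_peak     : measured throughput with all lanes full (any unit, e.g. MB/s)
 * single_rate : measured single-buffer throughput, same unit
 *
 * Model: throughput with k active lanes = (k / lanes) * mb_peak. */
int min_lanes_to_win(int lanes, long mb_peak, long single_rate)
{
    assert(lanes > 0 && mb_peak > 0 && single_rate > 0);
    for (int k = 1; k <= lanes; k++) {
        /* Integer form of (k / lanes) * mb_peak > single_rate. */
        if ((long)k * mb_peak > single_rate * (long)lanes)
            return k;
    }
    return -1;
}
```

As a purely illustrative input (placeholder figures, not measurements), 16 lanes with a full-lane rate of 4000 and a single-buffer rate of 600 gives a crossover at 3 active lanes. Treat the result as a starting point for experiments, not as a prediction.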
We ran single hash jobs on one core:
OpenSSL is faster than ISA-L (same result in the update test):
If the test is wrong, please look through my steps above and tell me what I am doing wrong.
If not, please explain why we get this result.
Thanks