Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can SSE4.2 CRC be used as hash function? #96

Open
speeder opened this issue Dec 5, 2016 · 4 comments
Open

Can SSE4.2 CRC be used as hash function? #96

speeder opened this issue Dec 5, 2016 · 4 comments

Comments

@speeder
Copy link

speeder commented Dec 5, 2016

I saw that seemly SSE4.2 CRC is the fastest hash function for hash tables, sometimes having speeds 10 times faster than FNV for example.

Thus I was wondering if uthash can use it somehow.

@Quuxplusone
Copy link
Collaborator

Am I correct that you're referring to _mm_crc32_u64 from <smmintrin.h>?

My employer actually has a patch for this already, but it's a bit annoying to integrate because the definition of the HASH_CRC macro relies on _mm_crc32_u64, which won't be defined unless you've somehow #included <smmintrin.h> by the time you start using the HASH_ADD macro... and <uthash.h> itself cannot unconditionally #include <smmintrin.h> because that would break platforms that don't have that header.

I haven't thought about this much, but at the moment I think the most straightforward approach would be to create a new "add-on" header file, <uthash-sse4.h>, which does this:

#ifndef UTHASH_SSE4_H
#define UTHASH_SSE4_H
#include <smmintrin.h>
#include <stdint.h>
#define HASH_CRC(key, keylen, hashv)                               \
do {                                                               \
    unsigned _hc_num_longs = (keylen) / sizeof(uint64_t);          \
    unsigned _hc_num_bytes = (keylen) % sizeof(uint64_t);          \
    const uint64_t *_hc_longs = (uint64_t *)(key);                 \
    (hashv) = 0;                                                   \
    while (_hc_num_longs--) {                                      \
        (hashv) = _mm_crc32_u64(hashv, *_hc_longs++);              \
    }                                                              \
    const uint8_t *_hc_bytes = (const uint8_t*)_hc_longs;          \
    while (_hc_num_bytes--) {                                      \
        (hashv) = _mm_crc32_u8(hashv, *_hc_bytes++);               \
    }                                                              \
} while (0)
#ifndef HASH_FUNCTION
#define HASH_FUNCTION HASH_CRC
#endif /* HASH_FUNCTION */
#include "uthash.h"
#endif /* UTHASH_SSE4_H */

Is this the sort of thing you had in mind? If you've got a production codebase where you could actually try this out and give some feedback on where the rough edges are, whether this "add-on header" idea makes sense, etc., I'd be amenable to accepting (and/or writing ;)) a patch.

@speeder
Copy link
Author

speeder commented Dec 5, 2016

Yep, that is what I mean!

But I don't have a production codebase to test this yet, I came across uthash yesterday when choosen how I would design some stuff in first place (ie: my project is mostly in design phase, it has no code, yet...)

Depending on how things progress I might write the patch myself or something, I just wanted to know how possible (or not) it was.

EDIT: random comment, for correctness.

I just found out while googling more details, that the correct header is nmmintrin, with smmintrin being actually "wrong" on GCC (smmintrin is supposed to have exclusively SSE4.1 routines in them, the CRC one is from SSE4.2), and some other compilers (like clang) emulate GCC behaviour because how popular GCC is.

@tbeu
Copy link
Contributor

tbeu commented Sep 12, 2017

What about checks for compilers/platforms know to support SSE4.2 (e.g. taken from https://stackoverflow.com/questions/11228855/header-files-for-x86-simd-intrinsics)?

#if defined(_MSC_VER) && _MSC_VER >= 1800
#define HAVE_SSE4_2 1
#elif (defined(__GNUC__) && __GNUC__ >= 5) || \
      (defined(__GNUC__) && defined(__GNUC_MINOR__) && __GNUC__ == 4 && __GNUC_MINOR__ >= 5)
#if defined(__SSE4_2__)
#define HAVE_SSE4_2 1
#endif
#elif defined(__has_include) && (__has_include(nmmintrin.h))
#if defined(__SSE4_2__)
#define HAVE_SSE4_2 1
#endif
#endif

#if defined(HAVE_SSE4_2)
#include <nmmintrin.h>
#define HASH_CRC(key, keylen, hashv) \
do { \
...
} while(0)
#else
#define HASH_CRC(key, keylen, hashv) uthash_fatal("SSE4.2 instruction set not enabled")
#endif

Note: _mm_crc32_u64 needs extra check for _M_X64.

@Quuxplusone
Copy link
Collaborator

I'm inclined to close this issue as WONTFIX. If there were a pull request, it might un-stagnate, but this open issue doesn't reflect anything I consider to be on a "to-do list" for uthash.

The user can totally define their own SSE4.2-based hash function and expose it via -DHASH_FUNCTION=. If uthash were doing something that made that workflow unnecessarily difficult, that'd be a good uthash issue. But this issue seems to be just "uthash should write that SSE4.2 code for me and provide it out of the box"; that I consider WONTFIX. And more importantly WONTMAINTAIN. ;)

Any objections to closing this issue out? (Note: objections might not stop me from closing it anyway. But I'm curious how the submitter(s) feel, four years on.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants