Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Xor filters #239

Open
data-man opened this issue Jan 29, 2022 · 4 comments
Open

Integrate Xor filters #239

data-man opened this issue Jan 29, 2022 · 4 comments

Comments

@data-man
Copy link

The reference implementation: xor_singleheader
If you don't mind the idea, I can start working on it. ;)

@Quuxplusone
Copy link
Collaborator

I guess I have no specific objection; an xor filter looks like just a slightly more memory-efficient form of Bloom filter?
If so, I would expect that your patch would simply hook into the existing HASH_BLOOM macros: where today it says

#ifdef HASH_BLOOM

you'd make it more like

#if defined(HASH_BLOOM) && HASH_BLOOM_USE_XORFILTER
[...your new definitions for the bloom-filter macros...]
#elif defined(HASH_BLOOM)
[...existing bloom-filter macros...]
#else
#define HASH_BLOOM_MAKE(tbl,oomed)
#define HASH_BLOOM_FREE(tbl)
#define HASH_BLOOM_ADD(tbl,hashv)
#define HASH_BLOOM_TEST(tbl,hashv) (1)
#define HASH_BLOOM_BYTELEN 0U
#endif

(Think about the naming of HASH_BLOOM_USE_XORFILTER and justify your choice, because I didn't think about it hard at all. Should it analogize to picking a different hash function? Should it analogize to picking an alternative malloc or strlen implementation?)

@data-man
Copy link
Author

an xor filter looks like just a slightly more memory-efficient form of Bloom filter?

No.
But I'm sure @lemire will answer better. :)

See also: https://github.com/skeeto/xf8 (C99).

@lemire
Copy link

lemire commented Jan 30, 2022

@data-man We have fuse filters, which are even better. Our C implementation is available there:

https://github.com/FastFilter/xor_singleheader

I invite you to read the paper...

https://arxiv.org/abs/2201.01174

It can be much faster and much smaller than a Bloom filter, but it is immutable.

@lemire
Copy link

lemire commented Jan 30, 2022

(That is, you provide your keys and you build the filter once. If your keys change, you rebuild the filter.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants