Cache: Allow using shared memory #446
First of all thank you Jason for all of the wonderful work you have been doing. Your articles on the ZODB are just brilliant.
https://dev.nextthought.com/blog/2019/10/intro-zodb.html
https://dev.nextthought.com/blog/2019/11/relstorage-30.html
I also hugely appreciate your efforts to document what you plan on doing. So let me expand on your email for beginners, and ask you a few obvious questions.
SHARED PICKLE CACHE
You said:
“Currently, the storage pickle cache is private memory, allocated per-process.”
Your articles said:
“• Multiple threads in the same process share a high-performance in-memory pickle cache to reduce the number of queries to the RDBMS. This is similar to ZEO, and the ZEO cache trace tools are supported.”
Just to be clear, currently the cache is in the same process, and only in RelStorage, not in FileStorage. You want to expand this to shared memory across processes in RelStorage, but still not in FileStorage. Presumably because FileStorage can only write from one process.
CACHE INVALIDATION
So I thought that the shared pickle cache eliminated the need for cache invalidation. Is that true? I understand that databases with a server process do the cache invalidation. But how does SQLite do cache invalidation if it has neither a fully shared pickle cache nor a shared server process?
FILE STORAGE CACHE
I want to port the existing shared pickle cache to a single shared process in FileStorage. Which raises the question: why can SQLite write from multiple processes, but FileStorage only from one? The file lock could be acquired by any process.
I hope that this user feedback helps you.
Warm Regards
Christopher Lozinski
https://PythonLinks.info
US tel: +1 650 614 1836
EU tel: +48 32 361 3136
Skype: clozinski
(ETA some clarifications.)
Yes.
No. Because, just like the SQLite backend in RelStorage, FileStorage wouldn't benefit. Because the data exists only as a file on one machine and there is no server involved, FileStorage uses the operating system's filesystem cache as its pickle cache. It's automatically as big as it can be without impacting application memory needs.
The shared pickle cache has nothing at all to do with invalidation. All ZODB storages have to deal with invalidation in one way or another. ZEO does it via pushing invalidations from the server to clients. RelStorage does it via polling the server in each client (SQLite counts as a server for this purpose; by "server" I just mean "the central data store"). Changes in RelStorage 3 made that polling more efficient by sharing some state between different connections in the same process. (That state could also be moved to shared memory and re-used between processes, but (a) I don't have any indication that would actually be a significant benefit anymore — polling has gotten pretty fast already — and (b) the design of that state is all in Python objects and would be much harder to move compared to the pickle cache, which is already implemented in C++.)
I wouldn't recommend that.
RelStorage and SQLite were designed to be used from multiple processes; FileStorage wasn't. It keeps certain state in memory (e.g., the index in the …
Currently, the storage pickle cache is private memory, allocated per-process.
A common architecture for servers (e.g., gunicorn) is to spawn many worker processes on a single machine as a way to utilize multiple cores. Each such worker process gets its own pickle cache (per RelStorage storage, which could be greater than 1 in a multi-db scenario).
As the number of cores and workers goes up, the amount of memory needed to keep a reasonable-sized RelStorage cache also goes up. Even if the memory was initially shared due to `fork()`, because of the nature of the cache, the pages quickly become dirty and have to be copied.

I've been investigating, and think it should be possible to move the storage caches into shared memory on Unix and Windows. The option that requires the least code changes and keeps most of the caching logic intact uses `boost.interprocess` (we're already using `boost.intrusive` in the cache).

Benefits include:
• … (unless there's something like `zc.resumelb` in use that tries to direct similar work to the same worker) this should result in overall better hit rates.

Possible drawbacks/open questions include:
• Currently the memory for `byte` objects is the cache directly, meaning there is no memory copy involved to read or write to the cache. Shared memory will require at least a write copy; it may or may not be possible to implement 0-copy reads.

Initially, for the smallest code changes, shared memory caches will only work with processes on Unix that are related via `fork()`: this is because the C++ objects have vtables in them and those same vtable pointers must be valid in all processes accessing the cache. Only child processes have that guarantee (and only if RelStorage was loaded in the parent process before the `fork()`). Over time, it should be possible to remove this restriction.
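The `fork()`-related mechanism above can be shown in miniature: an anonymous shared mapping created before `fork()` stays visible to parent and child without copy-on-write duplication, which is exactly what a per-process heap does not give you. This is only a sketch of the operating-system mechanism, not of RelStorage's cache (which is C++ built on `boost.interprocess`), and it is Unix-only:

```python
import os
import mmap
import struct

# Create an anonymous MAP_SHARED mapping *before* forking. Unlike
# ordinary heap memory, writes here are not copied-on-write: both
# processes address the very same physical pages.
buf = mmap.mmap(-1, 8)
struct.pack_into("<q", buf, 0, 0)

pid = os.fork()
if pid == 0:
    # Child: write through the shared page and exit.
    struct.pack_into("<q", buf, 0, 42)
    os._exit(0)

os.waitpid(pid, 0)
value, = struct.unpack_from("<q", buf, 0)
print(value)  # parent sees the child's write: 42
```

The vtable caveat from the issue maps onto this sketch too: raw bytes like these are safe to share, but C++ objects carry vtable pointers that are only meaningful at the same addresses, which is why unrelated processes can't initially attach to the cache.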