Large PostgreSQL database with blobs #500

Open
tflorac opened this issue Oct 6, 2023 · 3 comments

Comments

tflorac commented Oct 6, 2023

Hi,
I've built a Pyramid "file management" application using RelStorage with a PostgreSQL back-end.
The ZODB currently stores 2 million files (more than 2 TB), kept as ZODB blobs in "shared" mode (which is no longer recommended), using NFS to share the storage between clients (which also use a local NFS cache).
I'm thinking about switching to a new environment, using native PostgreSQL blobs, with streaming replication to several read-only servers (which is handled natively by RelStorage), but:

  • are there any drawbacks to PostgreSQL blobs compared with ZODB shared blobs at this scale?
  • until now I ran a simple pg_dump before a file system backup, but this no longer seems feasible; what is the best way to manage PostgreSQL backups when using native blobs of this size?

Best regards,
Thierry

jamadden (Member) commented Oct 6, 2023

I can only speak to my own experience and the RelStorage code.

A previous company I worked for had a similarly sized ZODB deployment with tons of blobs. We never used shared blobs because managing a separate highly-available NFS deployment was another layer of complication we didn't want to deal with. Native PG blobs and the local blob cache were plenty performant for our uses.

If you have any concurrent write activity at all, shared blobs absolutely kill RelStorage/PG performance by essentially eliminating concurrent commits.

I never had to back up a large shared blob deployment, so I have no recommendations (other than "don't use shared blobs" 😄).

tflorac (Author) commented Oct 7, 2023

Thanks for your reply!
Backing up shared blobs is not a big problem, as it's just a basic filesystem backup (although you have to keep this backup synchronized with your database).
The problem is making an online backup of a very large PostgreSQL database without running a pg_dump, which would require double the storage space...

mamico (Contributor) commented Nov 24, 2023

> The problem is to make an online backup of a very large PostgreSQL database, without doing a pg_dump which would require double storage space...

@tflorac My 5 cents: you should probably look for a system-level backup solution (raw storage backup) rather than a logical one (SQL dump). In the OSS world, one option might be https://github.com/pgbackrest/pgbackrest.
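
For illustration only, here is a minimal Python sketch of how a scheduled job might assemble a pgBackRest backup command. It is not taken from this thread or from RelStorage. The stanza name `zodb` and the helper `pgbackrest_backup_cmd` are hypothetical, and the sketch assumes pgBackRest is already installed, the stanza has been created with `stanza-create`, and WAL archiving is configured on the PostgreSQL server.

```python
import shlex

# Backup types accepted by pgBackRest's --type option.
BACKUP_TYPES = {"full", "diff", "incr"}


def pgbackrest_backup_cmd(stanza, backup_type="full"):
    """Build the argument list for one pgBackRest backup run.

    `stanza` is the name configured in pgbackrest.conf; "zodb" below is
    only a placeholder, not a name used anywhere in this thread.
    """
    if backup_type not in BACKUP_TYPES:
        raise ValueError(f"unknown backup type: {backup_type!r}")
    return ["pgbackrest", f"--stanza={stanza}", f"--type={backup_type}", "backup"]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    # A real job could pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(pgbackrest_backup_cmd("zodb", "incr")))
```

One possible schedule is a periodic `full` backup with `incr` runs in between, since physical backups copy the data files (including the large-object tables) and do not need a second full copy on the database host the way a local pg_dump does.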
