Explore options on zstd for compression performance #642

Open
gtkramer opened this issue Nov 8, 2019 · 15 comments

@gtkramer

gtkramer commented Nov 8, 2019

Let's explore whether zstd's options improve its compression performance, from either a CPU or a memory perspective, by a large enough margin over xz to justify switching to zstd by default.

@ashleshaAtrey ashleshaAtrey self-assigned this Nov 8, 2019
@ashleshaAtrey
Contributor

I explored various zstd flags, such as --fast, --adapt, and --ultra, and tried different compression levels. The size vs. time trade-off does not justify switching to zstd as the default.

@bryteise
Member

@ashleshaAtrey I'd like to know whether you still have those numbers around. This might be worth exposing to the user as a configuration option. Some may find the trade-off worthwhile depending on the content they are providing, since eventually this would be used by 3rd-party bundles as well.

@rchiossi
Contributor

@bryteise It is already exposed. You can use it by setting COMPRESSION = ["external-zstd"] in builder.conf

@ashleshaAtrey
Contributor

Total size before compression 43184317440

zstd
Total size after compression 16528392879
CREATE FULLFILES 13m33.7s

zstd --fast=22
Total size after compression 25457700882
CREATE FULLFILES 13m19.447s

zstd --adapt
Total size after compression 16354637680
CREATE FULLFILES 13m25.032s

zstd --fast
Total size after compression 19045817625
CREATE FULLFILES 13m11.38s

zstd --ultra
Total size after compression 16380224911
CREATE FULLFILES 13m22.015s

@rchiossi
Contributor

Do you have the numbers for the other compression methods as well (xz, bzip2, gzip)?

@ashleshaAtrey
Contributor

xz
Total size before compression 43183234560
Total size after compression 13877547116
CREATE FULLFILES 14m55.218s

bzip2
Total size before compression 43183935488
Total size after compression 16014229591
CREATE FULLFILES 12m18.674s

gzip
Total size before compression 43182304256
Total size after compression 16910068787
CREATE FULLFILES 11m54.421s
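
To compare the runs side by side, the figures reported in this thread can be turned into compression ratios and wall-clock seconds. The following is a minimal sketch, not part of mixer; the names `parse_duration`, `RESULTS`, and `summarize` are hypothetical, and it assumes the single "before" size quoted for zstd applies to every zstd variant.

```python
import re


def parse_duration(text):
    """Convert a duration such as '14m55.218s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (bytes before, bytes after, CREATE FULLFILES time) as posted in this thread.
# Assumption: the one "before" size given for zstd is reused for all zstd runs.
ZSTD_BEFORE = 43184317440
RESULTS = {
    "xz": (43183234560, 13877547116, "14m55.218s"),
    "bzip2": (43183935488, 16014229591, "12m18.674s"),
    "gzip": (43182304256, 16910068787, "11m54.421s"),
    "zstd": (ZSTD_BEFORE, 16528392879, "13m33.7s"),
    "zstd --fast=22": (ZSTD_BEFORE, 25457700882, "13m19.447s"),
    "zstd --adapt": (ZSTD_BEFORE, 16354637680, "13m25.032s"),
    "zstd --fast": (ZSTD_BEFORE, 19045817625, "13m11.38s"),
    "zstd --ultra": (ZSTD_BEFORE, 16380224911, "13m22.015s"),
}


def summarize(results):
    """Return (name, compressed/original ratio, seconds) for each run."""
    return [
        (name, after / before, parse_duration(duration))
        for name, (before, after, duration) in results.items()
    ]


# Print the runs ordered from smallest to largest output.
for name, ratio, seconds in sorted(summarize(RESULTS), key=lambda row: row[1]):
    print(f"{name:<15} ratio={ratio:.3f} time={seconds:8.1f}s")
```

Sorting by ratio puts the size difference between xz and the zstd variants next to the time each run took, which is the trade-off discussed above.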

@phmccarty
Contributor

I think decompression speed is also worth comparing, because xz tends to be slower than zstd in this area, and swupd has to decompress many files in the course of its operation.

@phmccarty
Contributor

Also note Arjan's old blog post where he conducted a detailed cross-comparison for compression types. It would be awesome to produce a followup to that post with the latest findings.

@ashleshaAtrey
Contributor

ashleshaAtrey commented Mar 27, 2020

I ran decompression tests on 906752 fullfiles using the tar utility:

xz compression, time taken to decompress: 11134 seconds
zstd compression, time taken to decompress: 19767 seconds
Optimal size (a mix of xz and zstd files), time taken to decompress: 12991 seconds

Next, I will work on finding the stats for memory used while compressing and decompressing those files.

@ashleshaAtrey
Contributor

Memory used while creating fullfiles:
Alloc is the number of bytes of allocated heap objects.
TotalAlloc increases as heap objects are allocated, but unlike Alloc and HeapAlloc, it does not decrease when objects are freed.
Sys is the total bytes of memory obtained from the OS.

Zstd compression
Alloc = 6493 MiB TotalAlloc = 49925 MiB Sys = 9213 MiB
xz compression
Alloc = 5850 MiB TotalAlloc = 49926 MiB Sys = 9344 MiB

@bryteise
Member

Interesting: zstd is taking longer to decompress, and by a fairly significant amount too. That's really surprising.

@reaganlo
Contributor

reaganlo commented Mar 30, 2020

@ashleshaAtrey Can you check whether the numbers and units look correct?

| | compression time (minutes) | decompression time (minutes) | memory usage (MiB) | compressed size (bytes) |
|---|---|---|---|---|
| xz | 14.91666667 | 185.5666667 | 9344 | 13877547116 |
| zstd | 13.55 | 329.45 | 9213 | 16528392879 |
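
As a quick check of the units, the minute values can be recomputed from the figures posted earlier in the thread. This is only a sketch; `to_minutes` is a hypothetical helper name. It suggests the compression times were converted from whole seconds (14m55s and 13m33s, with the fractional seconds dropped) and that the decompression times are the reported seconds divided by 60.

```python
def to_minutes(seconds):
    """Convert seconds to minutes."""
    return seconds / 60


# Decompression times reported for the 906752 fullfiles, in seconds.
xz_decompress_s = 11134
zstd_decompress_s = 19767

# Compression (CREATE FULLFILES) times, truncated to whole seconds.
xz_compress_s = 14 * 60 + 55    # reported as 14m55.218s
zstd_compress_s = 13 * 60 + 33  # reported as 13m33.7s

print("xz   compress  :", round(to_minutes(xz_compress_s), 8), "min")
print("zstd compress  :", round(to_minutes(zstd_compress_s), 8), "min")
print("xz   decompress:", round(to_minutes(xz_decompress_s), 7), "min")
print("zstd decompress:", round(to_minutes(zstd_decompress_s), 7), "min")

# How much longer zstd took to decompress than xz in this test.
slowdown = zstd_decompress_s / xz_decompress_s
print(f"zstd / xz decompression time: {slowdown:.2f}x")
```

With these inputs the recomputed minutes agree with the table, and zstd's decompression time comes out at roughly 1.8 times that of xz.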

@phmccarty
Contributor

phmccarty commented Mar 30, 2020

I am very surprised to see zstd have such a large decompression time. In my experience, decompression time for zstd has been dramatically faster than xz...

@bryteise
Member

Thinking about this a little more.

I wonder if there is a set of files which take disproportionately longer to decompress (but there are few of them) or files that take slightly longer to decompress (but there are many of them)?

Figuring this out would be enlightening. Do you have file by file time differences?
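
One way to tell the two cases apart, once per-file timings exist, is to measure how much of the total extra time comes from the slowest few files. The sketch below is only an illustration under stated assumptions: `time_decompress` and `concentration` are hypothetical helpers, the timings are synthetic, and real measurements would come from timing the actual xz and zstd decompression of each fullfile.

```python
import time


def time_decompress(decompress, payload):
    """Return wall-clock seconds taken by one decompress(payload) call."""
    start = time.perf_counter()
    decompress(payload)
    return time.perf_counter() - start


def concentration(baseline, candidate, top_fraction=0.01):
    """Share of the total extra time that comes from the worst files.

    baseline and candidate map file name -> seconds. A value near 1.0
    means a few files dominate the slowdown; a value near top_fraction
    means the slowdown is spread evenly over many files.
    """
    deltas = sorted(
        (candidate[name] - baseline[name] for name in baseline),
        reverse=True,
    )
    total = sum(deltas)
    if total <= 0:
        return 0.0
    top_n = max(1, round(len(deltas) * top_fraction))
    return sum(deltas[:top_n]) / total


# Synthetic example: 1000 files that each take 1 second with the baseline.
baseline = {f"file{i}": 1.0 for i in range(1000)}
# Case 1: ten files are 10 seconds slower, the rest barely change.
few_slow = {
    name: secs + (10.0 if i < 10 else 0.001)
    for i, (name, secs) in enumerate(baseline.items())
}
# Case 2: every file is slightly slower.
all_slow = {name: secs + 0.01 for name, secs in baseline.items()}

print("few slow files:", round(concentration(baseline, few_slow), 3))
print("all files slow:", round(concentration(baseline, all_slow), 3))
```

A result close to 1.0 would point to a small set of problem files, while a result close to the chosen fraction would point to a small per-file overhead multiplied across all 906752 files.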

@ashleshaAtrey
Contributor

I will work on finding the file-by-file time differences; I don't have those stats handy.

6 participants