zstd-1.3.4 integrates long range mode with multithreaded compression. The long range match finder runs serially, and the backend compressors then combine regular zstd matches with the preprocessed long range matches in parallel. This split limits memory usage to the maximum of the long range window size and the memory that zstdmt requires without long range matching. The long range matcher runs at about 200 MB/s, so depending on the number of cores available for backend compression, you can pick a compression level whose combined backend throughput keeps pace with it.
zstd -T0 -5 --long file # autodetect threads, level 5, 128 MB window
zstd -T16 -10 --long=31 file # 16 threads, level 10, 2 GB window
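The numbers in the two commands above can be checked with a little shell arithmetic. This is a rough sketch, not part of zstd: the helper names `window_mib` and `per_thread_mbps` are made up for illustration, and the per-thread estimate assumes the backend compressors scale linearly across threads, which real workloads may not do. The window size is 2 raised to the `--long` value, in bytes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not zstd commands.

# Window size in MiB for a given --long=<windowLog> value
# (the window is 2^windowLog bytes).
window_mib() {
    echo $(( (1 << $1) >> 20 ))
}

# Per-thread backend speed (MB/s, rounded up) that N threads need in order
# to keep pace with the serial long range matcher. The matcher speed
# defaults to the ~200 MB/s figure quoted above. Assumes linear scaling.
per_thread_mbps() {
    threads=$1
    matcher=${2:-200}
    echo $(( (matcher + threads - 1) / threads ))
}

window_mib 27        # 128  (the 128 MB window used by plain --long)
window_mib 31        # 2048 (the 2 GB window of --long=31)
per_thread_mbps 16   # 13
per_thread_mbps 4    # 50
```

Under these assumptions, 16 backend threads only need about 13 MB/s each to keep the long range matcher busy, which leaves room for a higher compression level, while 4 threads need about 50 MB/s each.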
Benchmarks on the two files "Linux 4.7 - 4.12" and "Linux git" from the 1.3.2 release are shown below. All the compressors were run with 16 threads, except "zstd single 2 GB". The zstd compressors were run with either a 128 MB or a 2 GB window size, and the lrzip compressor was run with the lzo, gzip, and xz backends. The benchmarks were run on a 16-core Sandy Bridge @ 2.2 GHz.