Better data compressors
Lesser-known compressors
You can gain in performance and/or compression ratio if you use a more advanced but less common data compressor. Here are a few.
- Zstandard. According to my tests, zstd -7 compresses as fast as or faster than gzip -9 on a wide range of hardware, with a better compression ratio. zstd -7 --long results in an even better ratio, though it uses several times more RAM. Zstandard is mature, maintained, and increasingly widely deployed. I would use it for backups and long-term data archival (and I do!).
- Long Range Zip. lrzip -z -L 3 is almost as good as xz -9 on a large collection of JSON files but compresses 5x faster.
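Before trusting any of these for backups, it can be worth checking that the data survives a round trip. The following is only a minimal sketch: the function name roundtrip and the file name in the usage note are placeholders of mine, and it assumes the compressor can run as a stdin-to-stdout filter.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original article): compress FILE,
# decompress the result, and compare it byte-for-byte with the input.
# usage: roundtrip FILE 'COMPRESS COMMAND' 'DECOMPRESS COMMAND'
roundtrip() {
    file="$1"
    # $2 and $3 are left unquoted on purpose so that a string such as
    # 'zstd -7 --long' is split into the command and its options.
    # The exit status is that of cmp: 0 if the bytes are identical.
    $2 < "$file" | $3 | cmp -s - "$file"
}
```

With zstd installed, a call might look like roundtrip backup.tar 'zstd -7 --long' 'zstd -d' && echo ok, where backup.tar stands for any file.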
A shell script for comparing compressors
This script requires GNU time(1).
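#! /bin/sh
# Usage: compbench.sh FILE COMMAND...
# Each COMMAND must read stdin and write the compressed data to stdout.
file="$1"
shift
for cmd in "$@"; do
    echo "== $cmd"
    # "command time" runs the external time(1) rather than a shell keyword or function.
    # It prints elapsed time (%E) and max resident set size in kilobytes (%M) to stderr.
    # $cmd is deliberately unquoted so that 'gzip -9' splits into command and option.
    command time --format 'elapsed %E max %M' $cmd < "$file" \
        | wc -c \
        | awk '{ print $1 / 1024.0 / 1024 }'  # bytes to MiB
done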
Usage
$ ./compbench.sh dir/file.ext cat lz4 'gzip -9'
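The script prints only the compressed size. The ratio column in the results appears to be compressed size divided by original size (for lz4, 69.34 / 194 ≈ 0.36). A rough sketch of computing it directly follows; the function name ratio is my own and not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: print compressed size / original size for one command.
# usage: ratio FILE COMMAND [ARGS...]
# Assumes FILE is not empty (otherwise awk would divide by zero).
ratio() {
    file="$1"
    shift
    orig=$(wc -c < "$file")          # original size in bytes
    comp=$("$@" < "$file" | wc -c)   # compressed size in bytes
    awk -v c="$comp" -v o="$orig" 'BEGIN { printf "%.2f\n", c / o }'
}
```

For example, ratio dir/file.ext gzip -9 would print the ratio for gzip -9. Unlike the benchmark script, the command here is passed as separate words rather than one quoted string.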
An MTGJSON test
The file AllPrintings.json was version 4.6.3+20200501 and 194 MiB in size. I used zstd version 1.4.4 and lrzip version 0.631.
Results
Compressor | Compression ratio | Compressed size (MiB) | Elapsed time (wall clock) | Max resident set (MiB) |
---|---|---|---|---|
lz4 | 0.36 | 69.34 | 0:01.09 | 7.08 |
gzip -9 | 0.23 | 45.20 | 0:13.01 | 1.89 |
zstd -7 | 0.16 | 31.60 | 0:10.71 | 40.09 |
bzip2 -9 | 0.15 | 28.39 | 0:37.99 | 8.56 |
zstd -7 --long | 0.14 | 27.25 | 0:10.80 | 168.34 |
lrzip -z -L 3 | 0.12 | 23.19 | 0:40.41 | 342.72 |
xz -9 | 0.10 | 19.39 | 2:38.82 | 675.51 |
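The elapsed times are in the m:ss.ss form that GNU time prints for %E, which is awkward to compare by eye. As a small sketch, the helper below converts such a value to seconds; the name to_seconds is hypothetical, and it assumes the [h:]m:ss[.ss] layout.

```shell
#!/bin/sh
# Hypothetical helper: convert an elapsed time such as 2:38.82 or 1:02:03
# (colon-separated, most significant field first) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        # Fold the fields left to right: hours, minutes, then seconds.
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}
```

For the table above, to_seconds 2:38.82 prints 158.82 and to_seconds 0:40.41 prints 40.41.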
Tags: comparison, compression.