Better data compressors

Lesser-known compressors

A more advanced but less common data compressor can give you better speed, a better compression ratio, or both. Here are a few.

  • Zstandard. According to my tests, zstd -7 compresses as fast as or faster than gzip -9 on a wide range of hardware and achieves a better compression ratio. zstd -7 --long improves the ratio further, though it uses several times more RAM. Zstandard is mature, maintained, and increasingly widely deployed. I would use it for backups and long-term data archival (and I do!).
  • Long Range Zip. lrzip -z -L 3 is almost as good as xz -9 on a large collection of JSON files but compresses 5x faster.
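For backups and archival it is worth confirming that a compressed stream decompresses back to the original. Below is a minimal sketch of such a check. The function name roundtrip is hypothetical, and it assumes both commands work as stdin-to-stdout filters.

```shell
#!/bin/sh
# Hypothetical helper: roundtrip FILE 'COMPRESS_CMD' 'DECOMPRESS_CMD'
# Pipes FILE through the compressor and then the decompressor and compares
# the result with the original. Returns 0 only when they are identical.
roundtrip() {
    file="$1"
    # $2 and $3 are left unquoted on purpose so that a string such as
    # 'zstd -7 --long' splits into a program and its arguments.
    $2 < "$file" | $3 | cmp -s - "$file"
}
```

For example, roundtrip AllPrintings.json 'zstd -7 --long' 'zstd -d' && echo ok. With the default --long window no extra flag should be needed when decompressing; larger windows such as --long=31 have to be passed to the decompressor as well.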

A shell script for comparing compressors

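#! /bin/sh
# Usage: <script> FILE 'COMMAND' ['COMMAND' ...]
# Requires GNU time for the --format option.

file="$1"
shift

for cmd in "$@"; do
    echo "== $cmd"

    # $cmd is left unquoted on purpose so that 'zstd -7 --long' splits into
    # a program and its arguments. time reports to stderr: wall-clock time
    # (%E) and peak resident set size in KiB (%M).
    command time --format 'elapsed %E max %M' $cmd < "$file" \
    | wc -c \
    | awk '{ print $1 / 1024.0 / 1024 }'  # compressed size in MiB
done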
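GNU time reports %M in KiB, while the results below list the maximum resident set in MiB. Here is a minimal sketch of that conversion, assuming the 'elapsed ... max ...' line format produced by the script above. The function name max_mib is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read lines such as "elapsed 0:01.00 max 2048"
# (the --format string used by the script) on stdin and print the peak
# resident set size converted from KiB to MiB. Other lines are ignored.
max_mib() {
    awk '$1 == "elapsed" && $3 == "max" { printf "%.2f\n", $4 / 1024 }'
}
```

Because time writes its report to stderr, it has to be redirected into the pipe, for instance: command time --format 'elapsed %E max %M' gzip -9 < "$file" 2>&1 > /dev/null | max_mib.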

An MTGJSON test

The file AllPrintings.json was version 4.6.3+20200501 and 194 MiB in size. I used zstd version 1.4.4 and lrzip version 0.631.

Results

Compressor      Compression ratio  Compressed size (MiB)  Elapsed time (wall clock)  Max resident set (MiB)
lz4             0.36               69.34                  0:01.09                    7.08
gzip -9         0.23               45.20                  0:13.01                    1.89
zstd -7         0.16               31.60                  0:10.71                    40.09
bzip2 -9        0.15               28.39                  0:37.99                    8.56
zstd -7 --long  0.14               27.25                  0:10.80                    168.34
lrzip -z -L 3   0.12               23.19                  0:40.41                    342.72
xz -9           0.10               19.39                  2:38.82                    675.51
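The compression ratio column is consistent with compressed size divided by the original 194 MiB, so lower is better. Here is a minimal sketch of that calculation. The function name ratio is hypothetical, and the two inputs are taken from the table and the file size stated above.

```shell
#!/bin/sh
# Hypothetical helper: ratio ORIGINAL COMPRESSED (both in the same unit).
# Prints compressed / original rounded to two decimal places.
ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f\n", comp / orig }'
}

ratio 194 31.60   # zstd -7 row; prints 0.16
```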

Tags: comparison, compression.