[Zlib-devel] DEFLATE performance improvements v1
Jim Kukunas
james.t.kukunas at linux.intel.com
Fri Dec 13 18:13:09 EST 2013
Hi Folks,
This patch series introduces a number of deflate performance improvements:
two new deflate strategies, quick and medium, plus a faster hash function,
PCLMULQDQ-optimized CRC folding, and SSE2 hash shifting.
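As a rough illustration of what "SSE2 hash shifting" could look like: stock zlib, when it slides the window, subtracts the window size from every 16-bit hash table entry and clamps at zero. The sketch below assumes that this is the step being vectorized. The function names are hypothetical and are not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: subtract wsize from each entry, clamping at 0. */
static void slide_hash_scalar(uint16_t *table, size_t n, uint16_t wsize)
{
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* SSE2 variant (hypothetical name): handles 8 entries per iteration with an
 * unsigned saturating subtract, then falls back to the scalar loop for the
 * tail, or for everything when SSE2 is not available at compile time. */
static void slide_hash_sse2(uint16_t *table, size_t n, uint16_t wsize)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i delta = _mm_set1_epi16((short)wsize);
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(table + i));
        v = _mm_subs_epu16(v, delta);   /* saturates at 0 instead of wrapping */
        _mm_storeu_si128((__m128i *)(table + i), v);
    }
#endif
    slide_hash_scalar(table + i, n - i, wsize);
}
```

The saturating subtract replaces the per-entry compare and branch, which is where a speedup would be expected to come from.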
Changelog:
 - General
    - fixed CPUID check for 32-bit PIC
    - removed trailing whitespace from various files
    - likely/unlikely attributes are now in zutil.h, and wrapped by a
      check for GCC. They are now exposed as zlikely and zunlikely.
    - explicit check for x86_64 architecture with -m32 CFLAGS to switch to
      i*86, and for i*86 architectures with -m64 CFLAGS
    - switch from uname -p to uname -m
 - Deflate Quick Strategy:
    - deflate_quick.c is now built separately from deflate.c, and is built
      with -msse4
    - changed the constraint in quick_insert_string() from "p" to "r", as
      some versions of clang don't support "p"
    - added a separate compare258 implementation for 32-bit PIC
 - Deflate Medium Strategy:
    - cleaned up some formatting
 - CRC folding
    - intrinsics variables are no longer exposed in deflate.h. Instead,
      crc registers are manually saved/restored in the appropriate
      crc_folding functions.
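
For readers unfamiliar with the likely/unlikely change in the General section, a minimal sketch of such macros follows. It assumes the usual __builtin_expect idiom behind a GCC check. The actual definitions in zutil.h may differ, and checked_div() is a hypothetical example function.

```c
/* Assumed shape of the macros; not copied from the patch. */
#if defined(__GNUC__)
#  define zlikely(x)   __builtin_expect(!!(x), 1)
#  define zunlikely(x) __builtin_expect(!!(x), 0)
#else
#  define zlikely(x)   (x)
#  define zunlikely(x) (x)
#endif

/* Example use: the error path is marked as the unlikely branch. */
static int checked_div(int num, int den, int *out)
{
    if (zunlikely(den == 0))
        return -1;
    *out = num / den;
    return 0;
}
```

On compilers that do not define __GNUC__, the macros reduce to the bare condition, so the code still builds without the hint.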
While rerunning the performance tests on this revision, we noticed
significant variance among some of the results, especially at level 1.
We made a few changes to the benchmark to further reduce this noise and
reran the tests. The revised results are:
Compression Corpora: Calgary (normal and large), Canterbury (normal and
large), and Silesia
Processor: Intel i5-2540M @ 2.60 GHz
Compared with git commit 50893291621658f355bc5b4d450a8d06a563053d
Level 9, on average, is 23% (1.31x faster) with no change in compression.
Level 6, on average, is 43% (1.76x faster) with negligible change in
compression.
Level 1, on average, is 60% (2.43x faster) with a sacrifice of about 30%
compression.
The exact performance of a particular workload is highly data-dependent. For
example, the performance at level 1 for some files, such as pic and ptt5,
is 69% (3.19x faster).
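
One way to read the paired figures above, offered as an assumption and not as the benchmark's documented definition: if the percentage is the reduction in elapsed time, the speedup factor is 1 / (1 - fraction). The helper name below is hypothetical.

```c
/* Speedup factor implied by a fractional reduction in elapsed time.
 * Assumes 0 <= saved < 1, e.g. 0.60 for "60%". */
static double speedup_from_time_saved(double saved)
{
    return 1.0 / (1.0 - saved);
}
```

On that reading, 23% gives about 1.30x, 43% about 1.75x, 60% gives 2.50x and 69% about 3.23x, close to but not identical to the reported 1.31x, 1.76x, 2.43x and 3.19x. A small gap would be expected if the averages were taken per file, though the mail does not say how they were computed.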
As such, I've posted the full git tree (5089329 + these patches) to GitHub, and
would appreciate it if others would run their own benchmarks and post their
results.
https://github.com/jtkukunas/zlib.git
Thanks.
--
Jim Kukunas
Intel Open Source Technology Center