[Zlib-devel] [1/8][RFC V3 Patch] Prepare Adler32.c
Jan Seiffert
kaffeemonster at googlemail.com
Sun Apr 24 12:19:32 EDT 2011
This is a patch to prepare adler32.c for the things to come.
* add another variant of the modulus function for architectures without a
divide (or wide multiply) instruction
* rename MOD & MOD4 to reduce_full & reduce_x
* add a "simpler" reduce
* split the adler32 function into sub-functions, so that other functions
can be hooked in for the large-size adler32 case
* add a 64-bit pseudo-SIMD version
This code is for mips64, powerpc64 (without AltiVec), sparc64 and other
64-bit processors.
But I would like to dedicate this code to Alpha, whose early versions
do not have instructions for byte-wise memory access.
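To illustrate two of the ideas listed above, here is a minimal sketch. It is not the code from the attached patch. All names (sk_fold, sk_reduce, sk_adler32_swar, SK_WORDS) are made up for this example, and the block size of 16 words is an assumption chosen so that the 16-bit lanes cannot overflow. The first part reduces modulo 65521 without a divide, using the fact that 65536 mod 65521 is 15. The second part sums bytes in 16-bit lanes of 64-bit words (pseudo-SIMD) and folds the lanes back into the two Adler-32 sums after each block. The word load is assembled byte by byte only to keep the sketch endian-neutral; a real version would presumably use one aligned native 64-bit load, which is the point on machines without byte access.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_BASE  65521U  /* largest prime below 65536 */
#define SK_WORDS 16      /* assumed block size: 64-bit words per block */

/* One folding step without divide: 65536 mod 65521 == 15, so
 * hi * 65536 + lo is congruent to hi * 15 + lo. */
static uint32_t sk_fold(uint32_t x)
{
    uint32_t hi = x >> 16;
    return (x & 0xffffU) + (hi << 4) - hi;
}

/* Full reduction to [0, SK_BASE): two folds, one conditional subtract. */
static uint32_t sk_reduce(uint32_t x)
{
    x = sk_fold(x);  /* now at most 0xffff0 */
    x = sk_fold(x);  /* now at most 0xffff + 15 * 15 */
    return x >= SK_BASE ? x - SK_BASE : x;
}

/* Portable little-endian load, byte 0 ends up in bits 0..7. */
static uint64_t sk_load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    int i;
    for (i = 7; i >= 0; i--)
        w = (w << 8) | p[i];
    return w;
}

/* Extract 16-bit lane l (0..3) from a 64-bit pseudo vector. */
static uint32_t sk_lane(uint64_t v, int l)
{
    return (uint32_t)(v >> (16 * l)) & 0xffffU;
}

uint32_t sk_adler32_swar(uint32_t adler, const unsigned char *buf, size_t len)
{
    const uint64_t mask = UINT64_C(0x00ff00ff00ff00ff);
    uint32_t s1 = adler & 0xffffU;
    uint32_t s2 = (adler >> 16) & 0xffffU;

    while (len >= 8) {
        size_t k = len / 8, t;
        uint64_t sum_e = 0, sum_o = 0;  /* per-lane byte sums (even/odd bytes) */
        uint64_t acc_e = 0, acc_o = 0;  /* per-lane sums of earlier byte sums */
        int l;

        if (k > SK_WORDS)
            k = SK_WORDS;
        for (t = 0; t < k; t++) {
            uint64_t w = sk_load_le64(buf + 8 * t);
            acc_e += sum_e;
            acc_o += sum_o;
            sum_e += w & mask;          /* bytes 0, 2, 4, 6 */
            sum_o += (w >> 8) & mask;   /* bytes 1, 3, 5, 7 */
        }
        /* Fold lanes back: a byte at position j of a word counts (8 - j)
         * times within its own word, plus 8 times per later word. */
        s2 += (uint32_t)(8 * k) * s1;
        for (l = 0; l < 4; l++) {
            uint32_t e = sk_lane(sum_e, l), o = sk_lane(sum_o, l);
            s1 += e + o;
            s2 += 8 * (sk_lane(acc_e, l) + sk_lane(acc_o, l));
            s2 += (8 - 2 * l) * e + (7 - 2 * l) * o;
        }
        s1 = sk_reduce(s1);
        s2 = sk_reduce(s2);
        buf += 8 * k;
        len -= 8 * k;
    }
    while (len--) {                     /* scalar tail, fewer than 8 bytes */
        s1 += *buf++;
        s2 += s1;
    }
    return (sk_reduce(s2) << 16) | sk_reduce(s1);
}
```

Called with adler = 1 on the 9 bytes of "Wikipedia", the sketch returns 0x11E60398, the same value the plain byte-wise loop gives. With only 16 words per block the lane fold-back runs often, so a tuned version would likely use wider or split accumulators to allow longer blocks.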
Some results:
Intel Core-i5-750
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 9000 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 9100 ms
a: 0x733CB174, 10000 * 159998 bytes t: 9400 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 9300 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 9200 ms
a: 0x1902A382, 10000 * 159984 bytes t: 9800 ms
-------- risc2 ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 4000 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 4000 ms
a: 0x733CB174, 10000 * 159998 bytes t: 4300 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 4200 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 4300 ms
a: 0x1902A382, 10000 * 159984 bytes t: 4200 ms
speedup: 2.250000
Intel Xeon, Nocona
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 20000 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 19100 ms
a: 0x733CB174, 10000 * 159998 bytes t: 21300 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 25100 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 19100 ms
a: 0x1902A382, 10000 * 159984 bytes t: 20000 ms
-------- risc2 ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 9900 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 9900 ms
a: 0x733CB174, 10000 * 159998 bytes t: 10600 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 9700 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 10500 ms
a: 0x1902A382, 10000 * 159984 bytes t: 11000 ms
speedup: 2.020202
Some very old UltraSPARC, with an old gcc 3.2 that generates not-so-great
loop code
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 67100 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 63500 ms
a: 0x733CB174, 10000 * 159998 bytes t: 63500 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 67200 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 67100 ms
a: 0x1902A382, 10000 * 159984 bytes t: 63500 ms
-------- risc2 ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 54300 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 54600 ms
a: 0x733CB174, 10000 * 159998 bytes t: 54300 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 54400 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 54300 ms
a: 0x1902A382, 10000 * 159984 bytes t: 54300 ms
speedup: 1.235727
An Alpha EV68, with the Byte/Word Extension (BWX) not used
-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 5008.384 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 4202.496 ms
a: 0x733CB174, 10000 * 159998 bytes t: 4769.792 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 4804.608 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 5287.936 ms
a: 0x1902A382, 10000 * 159984 bytes t: 5074.944 ms
-------- risc2 ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 3879.936 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 3879.936 ms
a: 0x733CB174, 10000 * 159998 bytes t: 3878.912 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 3880.960 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 3876.864 ms
a: 0x1902A382, 10000 * 159984 bytes t: 3879.936 ms
speedup: 1.290842
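A note on reading the speedup lines: each one matches the first orig timing divided by the first risc2 timing (the 160000-byte rows), not an average over all six rows. The sketch below only restates that arithmetic. The function name sk_speedup is made up for this example and is not part of the benchmark program.

```c
/* Ratio of baseline time to new time; values above 1 mean the new code
 * is faster. Both arguments are timings in milliseconds. */
static double sk_speedup(double orig_ms, double new_ms)
{
    return orig_ms / new_ms;
}
```

For the Core-i5 rows, sk_speedup(9000.0, 4000.0) gives 2.25, and the first-row pairs 20000/9900, 67100/54300 and 5008.384/3879.936 give the other three figures.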
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01-prepare.patch
Type: text/x-patch
Size: 12949 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20110424/767a21a2/attachment.bin>