[Zlib-devel] [1/8][RFC V3 Patch] Prepare Adler32.c

Jan Seiffert kaffeemonster at googlemail.com
Sun Apr 24 12:19:32 EDT 2011


This patch prepares adler32.c for the changes coming in the rest of this series.

* add another variant of the modulus function for architectures without a
divide (or wide multiply) instruction
* rename MOD & MOD4 to reduce_full & reduce_x
* add a "simpler" reduce
* split the adler32 function into sub-functions, so that other functions
can be hooked in for large-size adler32 input
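The attached patch was scrubbed from the archive, so the bodies of reduce_full and reduce_x are not reproduced here. As a rough sketch of the general idea only, assuming zlib's usual Adler-32 constants BASE = 65521 and NMAX = 5552, a reduction without divide or multiply can use 2^16 = BASE + 15, and the checksum can be split into a generic per-chunk kernel plus a driver that a faster large-input routine could replace. fold_mod_base, adler32_chunk and adler32_sketch are hypothetical names, not the functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BASE 65521U /* largest prime below 2^16 */
#define NMAX 5552U  /* max bytes before the b sum can overflow 32 bits */

/* Hypothetical helper: x mod BASE using only shifts, adds and a compare.
 * Since 2^16 = BASE + 15, hi * 2^16 + lo == hi * 15 + lo (mod BASE). */
static uint32_t fold_mod_base(uint32_t x)
{
    uint32_t hi;

    hi = x >> 16;
    x = (x & 0xffffU) + (hi << 4) - hi; /* now x < 2^20 */
    hi = x >> 16;
    x = (x & 0xffffU) + (hi << 4) - hi; /* now x <= 65760 < 2 * BASE */
    if (x >= BASE)                      /* one subtraction is enough */
        x -= BASE;
    assert(x < BASE);
    return x;
}

/* Hypothetical generic kernel: sum at most NMAX bytes, then reduce.
 * A platform-specific large-input routine could be hooked in here. */
static void adler32_chunk(uint32_t *a, uint32_t *b,
                          const unsigned char *buf, size_t n)
{
    uint32_t s1 = *a, s2 = *b;

    while (n--) {
        s1 += *buf++;
        s2 += s1;
    }
    *a = fold_mod_base(s1);
    *b = fold_mod_base(s2);
}

/* Hypothetical top level: split the input into NMAX-sized pieces. */
uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffffU, b = adler >> 16;

    while (len > 0) {
        size_t n = len < NMAX ? len : NMAX;
        adler32_chunk(&a, &b, buf, n);
        buf += n;
        len -= n;
    }
    return (b << 16) | a;
}
```

Two folds suffice for any 32-bit input: the first leaves at most 16 * 65535 = 1048560, the second at most 65535 + 15 * 15 = 65760, which is below 2 * BASE, so a single conditional subtraction finishes the job.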

* add a 64-bit pseudo-SIMD version
This code is for mips64, powerpc64 (without AltiVec), sparc64 and
other 64-bit processors.
But I would like to dedicate this code to the Alpha, whose early versions
do not have instructions for byte-wise memory access.
Some results:

Intel Core i5-750
       -------- orig ------
               a: 0x0CB4B676, 10000 * 160000 bytes     t: 9000 ms
               a: 0x25BEB273, 10000 * 159999 bytes     t: 9100 ms
               a: 0x733CB174, 10000 * 159998 bytes     t: 9400 ms
               a: 0x1144AF76, 10000 * 159996 bytes     t: 9300 ms
               a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 9200 ms
               a: 0x1902A382, 10000 * 159984 bytes     t: 9800 ms
        -------- risc2 ------
               a: 0x0CB4B676, 10000 * 160000 bytes     t: 4000 ms
               a: 0x25BEB273, 10000 * 159999 bytes     t: 4000 ms
               a: 0x733CB174, 10000 * 159998 bytes     t: 4300 ms
               a: 0x1144AF76, 10000 * 159996 bytes     t: 4200 ms
               a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 4300 ms
               a: 0x1902A382, 10000 * 159984 bytes     t: 4200 ms
        speedup: 2.250000

Intel Xeon, Nocona
        -------- orig ------
               a: 0x0CB4B676, 10000 * 160000 bytes     t: 20000 ms
               a: 0x25BEB273, 10000 * 159999 bytes     t: 19100 ms
               a: 0x733CB174, 10000 * 159998 bytes     t: 21300 ms
               a: 0x1144AF76, 10000 * 159996 bytes     t: 25100 ms
               a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 19100 ms
               a: 0x1902A382, 10000 * 159984 bytes     t: 20000 ms
        -------- risc2 ------
               a: 0x0CB4B676, 10000 * 160000 bytes     t: 9900 ms
               a: 0x25BEB273, 10000 * 159999 bytes     t: 9900 ms
               a: 0x733CB174, 10000 * 159998 bytes     t: 10600 ms
               a: 0x1144AF76, 10000 * 159996 bytes     t: 9700 ms
               a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 10500 ms
               a: 0x1902A382, 10000 * 159984 bytes     t: 11000 ms
        speedup: 2.020202

Some very old UltraSPARC, with an old gcc 3.2 that generates not-so-great
loop code
         -------- orig ------
                a: 0x0CB4B676, 10000 * 160000 bytes     t: 67100 ms
                a: 0x25BEB273, 10000 * 159999 bytes     t: 63500 ms
                a: 0x733CB174, 10000 * 159998 bytes     t: 63500 ms
                a: 0x1144AF76, 10000 * 159996 bytes     t: 67200 ms
                a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 67100 ms
                a: 0x1902A382, 10000 * 159984 bytes     t: 63500 ms
         -------- risc2 ------
                a: 0x0CB4B676, 10000 * 160000 bytes     t: 54300 ms
                a: 0x25BEB273, 10000 * 159999 bytes     t: 54600 ms
                a: 0x733CB174, 10000 * 159998 bytes     t: 54300 ms
                a: 0x1144AF76, 10000 * 159996 bytes     t: 54400 ms
                a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 54300 ms
                a: 0x1902A382, 10000 * 159984 bytes     t: 54300 ms
         speedup: 1.235727

An Alpha EV68 when the Byte/Word eXtension (BWX) is not used
        -------- orig ------
                a: 0x0CB4B676, 10000 * 160000 bytes     t: 5008.384 ms
                a: 0x25BEB273, 10000 * 159999 bytes     t: 4202.496 ms
                a: 0x733CB174, 10000 * 159998 bytes     t: 4769.792 ms
                a: 0x1144AF76, 10000 * 159996 bytes     t: 4804.608 ms
                a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 5287.936 ms
                a: 0x1902A382, 10000 * 159984 bytes     t: 5074.944 ms
         -------- risc2 ------
                a: 0x0CB4B676, 10000 * 160000 bytes     t: 3879.936 ms
                a: 0x25BEB273, 10000 * 159999 bytes     t: 3879.936 ms
                a: 0x733CB174, 10000 * 159998 bytes     t: 3878.912 ms
                a: 0x1144AF76, 10000 * 159996 bytes     t: 3880.960 ms
                a: 0x3F4ECB8A, 10000 * 159992 bytes     t: 3876.864 ms
                a: 0x1902A382, 10000 * 159984 bytes     t: 3879.936 ms
         speedup: 1.290842
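The 64-bit pseudo-SIMD idea from the list above can be sketched in plain C, assuming "pseudo SIMD" means SWAR (SIMD within a register): load eight bytes as one uint64_t, spread them into 16-bit lanes with masks and shifts, and keep per-lane running sums so the weights for b fall out at the end of each chunk. This is not the code from 01-prepare.patch; adler32_swar, lane, SWAR_WORDS and LANE_MASK are hypothetical names, memcpy stands in for an aligned word load, and % is used for brevity where a division-free reduction could be substituted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BASE 65521U
/* 16-bit lanes: after m words a b-lane holds up to 255*m*(m+1)/2,
 * which stays <= 65535 for m <= 22, so flush every 22 words. */
#define SWAR_WORDS 22U

static const uint64_t LANE_MASK = 0x00FF00FF00FF00FFULL;

/* Hypothetical helper: lane r/2 of the even (r even) or odd (r odd) sum. */
static uint32_t lane(uint64_t even, uint64_t odd, unsigned r)
{
    uint64_t v = (r & 1U) ? odd : even;
    return (uint32_t)((v >> (16U * (r >> 1))) & 0xffffU);
}

uint32_t adler32_swar(uint32_t adler, const unsigned char *buf, size_t len)
{
    static const unsigned char idx[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    uint32_t a = adler & 0xffffU, b = adler >> 16;
    unsigned pos[8]; /* memory offset of register byte r, any endianness */
    uint64_t probe;
    unsigned r;

    memcpy(&probe, idx, sizeof probe);
    for (r = 0; r < 8; r++)
        pos[r] = (unsigned)((probe >> (8U * r)) & 0xffU);

    while (len >= 8) {
        size_t k, m = len / 8 < SWAR_WORDS ? len / 8 : SWAR_WORDS;
        uint64_t s1e = 0, s1o = 0, s2e = 0, s2o = 0;
        uint32_t sum = 0, wsum = 0;

        for (k = 0; k < m; k++) {
            uint64_t w;
            memcpy(&w, buf + 8 * k, sizeof w); /* 8 bytes in one word */
            s1e += w & LANE_MASK;        /* register bytes 0, 2, 4, 6 */
            s1o += (w >> 8) & LANE_MASK; /* register bytes 1, 3, 5, 7 */
            s2e += s1e;                  /* running sums give b weights */
            s2o += s1o;
        }
        /* Byte at chunk offset 8k + p has b weight 8(m - k) - p. */
        for (r = 0; r < 8; r++) {
            uint32_t d = lane(s1e, s1o, r);
            sum += d;
            wsum += 8U * lane(s2e, s2o, r) - pos[r] * d;
        }
        b = (b + (uint32_t)(8 * m) * a + wsum) % BASE; /* uses old a */
        a = (a + sum) % BASE;
        buf += 8 * m;
        len -= 8 * m;
    }
    while (len--) { /* fewer than 8 bytes left */
        a += *buf++;
        b += a;
    }
    a %= BASE;
    b %= BASE;
    assert(a < BASE && b < BASE);
    return (b << 16) | a;
}
```

The endianness probe lets the same code serve little-endian Alpha and big-endian sparc64 or powerpc64, and flushing every 22 words keeps each 16-bit lane from carrying into its neighbor.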
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 01-prepare.patch
Type: text/x-patch
Size: 12949 bytes
Desc: not available
URL: <http://madler.net/pipermail/zlib-devel_madler.net/attachments/20110424/767a21a2/attachment.bin>
