<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div></div><br><div><div>Le 5 avr. 2013 à 15:23, Frédéric Kayser a écrit :</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hello,<div>I have noticed this small phrase in RF1951 page 14:</div><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><br></span></div><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">         The code length repeat codes can cross from HLIT + 257 to the</span></div><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">         HDIST + 1 code lengths.  In other words, all code lengths form</span></div><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">         a single sequence of HLIT + HDIST + 258 values.</span></div><div><br></div><div><br></div><div>Since Zlib builds the Huffman header using two separate lists/trees (Literals/End-of-Block/Lengths and Distances) and never reunite them:</div><div><br></div><div><pre><div class="line" id="LC806">in <span class="n">build_bl_tree</span><span class="p">(</span><span class="n">s</span><span class="p">)</span></div><div><br></div><div class="line" id="LC806">    <span class="cm">/* Determine the bit length frequencies for literal and distance trees */</span></div><div class="line" id="LC807">    <span class="n">scan_tree</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">ct_data</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="o">-></span><span class="n">dyn_ltree</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">l_desc</span><span class="p">.</span><span class="n">max_code</span><span class="p">);</span></div><div class="line" id="LC808">    <span class="n">scan_tree</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">ct_data</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="o">-></span><span class="n">dyn_dtree</span><span class="p">,</span> <span class="n">s</span><span class="o">-></span><span class="n">d_desc</span><span class="p">.</span><span class="n">max_code</span><span class="p">);</span></div></pre><div><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre;"><br></span></font></div><div><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">in send_all_trees</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">(</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">s</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">lcodes</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">dcodes</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">blcodes</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">)</span></span></div><div><br></div></div><div><pre><div class="line" id="LC855">    <span class="n">send_tree</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">ct_data</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="o">-></span><span class="n">dyn_ltree</span><span class="p">,</span> <span class="n">lcodes</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span> <span class="cm">/* literal tree */</span></div><div class="line" id="LC858">    <span class="n">send_tree</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">ct_data</span> <span class="o">*</span><span class="p">)</span><span class="n">s</span><span class="o">-></span><span class="n">dyn_dtree</span><span class="p">,</span> <span class="n">dcodes</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span> <span class="cm">/* distance tree */</span></div></pre><div><br></div><div>it misses a small and rare Huffman table optimization, for instance here are side by side dump snippets of the same file produced by Defdb, the first file has been compressed using Gzip, the second is optimized by Defluff:</div></div><div><br></div><div><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre;"> [_] 0xFF CL (val: 0)                           [_] 0xFF CL (val: 0)
 [3] EofB CL (val: 7)                           [3] EofB CL (val: 7)
 [3] l_00 CL (val: <u>5</u>) (length 3)                [3] l_00 CL (val: <u>5</u>) (length 3)
 [4] LREP <font class="Apple-style-span" color="#ff0000">(3 times)</font>                             [5] LREP <font class="Apple-style-span" color="#ff0000">(4 times)</font>
 [_] l_01 CL (val: <u>5</u>) (length 4)                [_] l_01 CL (val: <u>5</u>) (length 4)
 [_] l_02 CL (val: <u>5</u>) (length 5)                [_] l_02 CL (val: <u>5</u>) (length 5)
 [_] l_03 CL (val: <u>5</u>) (length 6)                [_] l_03 CL (val: <u>5</u>) (length 6)</span></font><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre;">
 <font class="Apple-style-span" color="#ff0000">[3]</font> d_00 CL (val: <u>5</u>) (distance 1)              <font class="Apple-style-span" color="#ff0000">[_]</font> d_00 CL (val: <u>5</u>) (distance 1)
 [4] d_01 CL (val: 1) (distance 2)              [5] d_01 CL (val: 1) (distance 2)
 [3] d_02 CL (val: 5) (distance 3)              [3] d_02 CL (val: 5) (distance 3)
 [4] d_03 CL (val: 0) (distance 4)              [4] d_03 CL (val: 0) (distance 4)
 [4] d_04 CL (val: 3) (distances [5-6])         [4] d_04 CL (val: 3) (distances [5-6])</span></font></div><div><br></div><div>Gzip finds only 4 repetitions of CL 5 recorded as a single CL length followed by a factor 3 repetition (the next CL5 is recorded as a single CL), whereas Defluff finds 5 repetitions (four on the Literals/End-of-Block/Lengths CL "side" and one on the Distances CL "side") recorded as a single CL length followed by a factor 4 repetition.</div><div><br></div><div>CL lists/trees as seen by Zlib:</div><div><br></div><div><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre; "> [_] 0xFF CL (val: 0)
 [3] EofB CL (val: 7)
 [3] l_00 CL (val: 5)
 [4] LREP (3 times)
 [_] l_01 CL (val: 5) (length 4)
 [_] l_02 CL (val: 5) (length 5)
 [_] l_03 CL (val: 5) (length 6)</span></font></div><div><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre; ">--- artificial border between Literals/End-of-Block/L</span></font><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">engths CL and Distances CL created by the two separate calls to scan_tree ---</span><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre; ">
 [3] d_00 CL (val: 5) (distance 1)
 [4] d_01 CL (val: 1) (distance 2)
 [3] d_02 CL (val: 5) (distance 3)
 [4] d_03 CL (val: 0) (distance 4)
 [4] d_04 CL (val: 3) (distances [5-6])</span></font></div><div><font class="Apple-style-span" face="monospace"><span class="Apple-style-span" style="white-space: pre; "><br></span></font></div><div><span class="Apple-style-span">Since there's apparently some tailroom in <span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">dyn_ltree</span> I may try to write a patch to introduce a new function </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">scan_trees</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">(</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">s</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">(</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">ct_data</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">*</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">)</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">s</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">-></span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">dyn_ltree</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">(</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">ct_data</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">*</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">)</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">s</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">-></span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">dyn_dtree</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">,</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "> l</span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">codes</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">-</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="mi">1, d</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="n">codes</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="o">-</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="mi">1</span></span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; "><span class="p">) </span></span>that will put <span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">dyn_dtree</span><span class="Apple-style-span"> right after </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">dyn_ltree</span><span class="Apple-style-span"> before the scan is actually performed this way it would catch repetitions crossing the two lists/trees.</span></div><div><span class="Apple-style-span"><br></span></div><div><span class="Apple-style-span">By the way they are other caveats in </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">scan_tree</span><span class="Apple-style-span"> for instance it handles repetitions in a greedy way. If the same CL repeats 10 times it will output: a single CL, a factor 7 repetition, a single CL and a single CL again (1+7+1+1) whereas it could emit a single CL a factor 6 repetition and a factor 3 repetition (1+6+3), I'm not saying that the second option is always better but currently </span><span class="Apple-style-span" style="font-family: monospace; white-space: pre; ">scan_tree</span><span class="Apple-style-span"> does not even try alternate options.</span></div><div><span class="Apple-style-span"><br></span></div><div><span class="Apple-style-span">Test files attached.</span></div><div><span class="Apple-style-span"></span></div></div><span><3rep_gzip.gz></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><span class="Apple-style-span"></span></div><div><span class="Apple-style-span"></span></div></div><span><4rep_defluff.gz></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><span class="Apple-style-span"></span></div><div><br></div><div>Best regards</div><div><span class="Apple-style-span">-- </span></div><div><span class="Apple-style-span">Frédéric Kayser</span></div></div>_______________________________________________<br>Zlib-devel mailing list<br><a href="mailto:Zlib-devel@madler.net">Zlib-devel@madler.net</a><br>http://mail.madler.net/mailman/listinfo/zlib-devel_madler.net<br></blockquote></div><br></body></html>