corresponding to the removal of workspace
which is needed while building huffman table
and is now either present in DCtx,
or temporarily borrowed from available FSE table space.
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.
There is now a single implementation, shared by both.
Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
for FSE symbols.
While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
which was not done properly by gcc 4.8
resulting in major performance difference.
ex :
zstd -b1 silesia.tar
before : dec 680 MB/s
after : dec 710 MB/s (without bmi2)
after : dec 770 MB/s (with DYNAMIC_BMI2)
Update code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine size of tables inside workspace.
This makes it easier to edit for maintenance and evolutions
(I plan to experiment modifications in huffman decompression functions).
The methology followed seems broadly applicable to other BMI2 modules.
Performance was tracked rigorously at each step,
there is no noticeable loss (nor win) of performance compared to `#include` version.
Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
The zstd format specification doesn't enforce that Huffman compressed
literals (including the table) have to be smaller than the uncompressed
literals. The compressor will never Huffman compress literals if the
compressed size is larger than the uncompressed size. The decompresser
doesn't accept Huffman compressed literals with 4 streams whose compressed
size is at least as large as the uncompressed size.
* Make the decompresser accept Huffman compressed literals whose size
increases.
* Add a test case that exposes the bug. The compressed file has to be
statically generated, since the compressor won't normally produce files
that expose the bug.