When srcSize is small,
the nb of symbols produced is likely too small to warrant dedicated probability tables.
In which case, predefined distribution tables will be used instead.
There is a cheap algorithm in btultra initialization :
it presumes default distribution will be used if srcSize <= 1024.
btultra2 now uses the same threshold to shut down probability estimation,
since measured frequencies won't be used at entropy stage,
and therefore relying on them to determine sequence cost is misleading,
resulting in worse compression ratios.
This fixes btultra2 performance issue on very small input.
Note that, a proper way should be
to determine which symbol is going to use predefined probaility
and which symbol is going to use dynamic ones.
But the current algorithm is unable to make a "per-symbol" decision.
So this will require significant modifications.
We pre-hash the ptr for the dict match state sometimes. When that actually
happens, a hashlog of 0 can produce undefined behavior (right shift a long
long by 64). Only applies to unoptimized compilations, since when
optimizations are applied, those hash operations are dropped when we're not
actually in dms mode.
removed "cached" structure.
prices are now saved in the optimal table.
Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.
It translates into a very small compression ratio gain :
enwik7, level 20:
blocks 4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726
On the other hand, the cpu cost is doubled.
The trade off looks bad.
Though, that's ultimately a price to pay to reach better compression ratio.
So it's only enabled when setting btultra.
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.
Consequently, bit-fractional evaluation is only turned on for btultra.