updated doc

dev
Yann Collet 2016-07-08 10:42:59 +02:00
parent aa2628da30
commit 26f681451f
2 changed files with 30 additions and 16 deletions

@ -45,14 +45,19 @@ It is used by `zstd` command line utility, and [7zip plugin](http://mcmilk.de/pr
- compress/zbuff_compress.c
- decompress/zbuff_decompress.c
#### Dictionary builder
To create dictionaries from training sets :
In order to create dictionaries from training sets,
it's necessary to include all files from the [dictBuilder directory](dictBuilder/).
#### Legacy support
Zstandard can decode previous formats, starting from v0.1.
Support for these formats is provided in [folder legacy](legacy/).
It's also required to compile the library with `ZSTD_LEGACY_SUPPORT = 1`.
- dictBuilder/divsufsort.c
- dictBuilder/divsufsort.h
- dictBuilder/zdict.c
- dictBuilder/zdict.h
#### Miscellaneous

@ -565,37 +565,46 @@ which tells how to decode the list of weights.
| Nb of 1s | 1 | 2 | 3 | 4 | 7 | 8 | 15| 16| 31| 32| 63| 64|127|128|
|Complement| 1 | 2 | 1 | 4 | 1 | 8 | 1 | 16| 1 | 32| 1 | 64| 1 |128|
_Note_ : complement is by using the "join to nearest power of 2" rule.
_Note_ : complement is found by using "join to nearest power of 2" rule.
- if headerByte >= 128 : this is a direct representation,
where each weight is written directly as a 4-bit field (0-15).
The full representation occupies `((nbSymbols+1)/2)` bytes,
meaning the last byte is used in full even if nbSymbols is odd.
`nbSymbols = headerByte - 127;`
`nbSymbols = headerByte - 127;`.
Note that maximum nbSymbols is 241-127 = 114.
A larger series must necessarily use FSE compression.
- if headerByte < 128 :
the series of weights is compressed by FSE.
The length of the compressed series is `headerByte` (0-127).
The length of the FSE-compressed series is `headerByte` (0-127).
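The direct representation and the complement rule above can be sketched in C as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, and the nibble order (first weight of each pair in the high 4 bits) is an assumption that the text above does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: amount needed to bring nbOnes up to the next power
 * of 2 strictly above it, as listed in the Complement row of the table. */
static unsigned complement_to_pow2(unsigned nbOnes)
{
    unsigned p = 1;
    while (p <= nbOnes) p <<= 1;
    return p - nbOnes;
}

/* Hypothetical helper: unpack a direct representation of weights.
 * Returns nbSymbols, or 0 if headerByte is outside 128..241 or if a
 * buffer is too small. */
static size_t read_direct_weights(uint8_t headerByte,
                                  const uint8_t *src, size_t srcSize,
                                  uint8_t *weights, size_t weightsCapacity)
{
    size_t nbSymbols, nbBytes, i;

    if (headerByte < 128 || headerByte > 241) return 0;

    nbSymbols = (size_t)headerByte - 127;   /* 1..114 */
    nbBytes   = (nbSymbols + 1) / 2;        /* last byte is always a full byte */
    if (srcSize < nbBytes || weightsCapacity < nbSymbols) return 0;

    for (i = 0; i < nbSymbols; i++) {
        uint8_t b = src[i / 2];
        /* ASSUMPTION: even-indexed weights sit in the high nibble */
        weights[i] = (i & 1) ? (uint8_t)(b & 0x0F) : (uint8_t)(b >> 4);
    }
    return nbSymbols;
}
```

With `headerByte = 130`, this reads 3 weights from 2 bytes; the low nibble of the second byte is unused.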
##### FSE (Finite State Entropy) compression of huffman weights
The series of weights is compressed using standard FSE compression.
The series of weights is compressed using FSE compression.
It's a single bitstream with 2 interleaved states,
using a single distribution table.
sharing a single distribution table.
To decode an FSE bitstream, it is necessary to know its compressed size.
Compressed size is provided by `headerByte`.
It's also necessary to know its maximum decompressed size.
In this case, it's `255`, since literal values range from `0` to `255`,
It's also necessary to know its maximum decompressed size,
which is `255`, since literal values span from `0` to `255`,
and last symbol value is not represented.
An FSE bitstream starts with a header describing the probability distribution.
This header is used to build a Decoding Table.
It is necessary to know the maximum accuracy of distribution
to properly allocate space for the Table.
For a list of huffman weights, this maximum is 7 bits.
The Table must be pre-allocated, which requires defining a maximum supported accuracy.
For a list of Huffman weights, the recommended maximum is 7 bits.
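As a rough sketch of what pre-allocation means here: an FSE table with an accuracy of N bits has `1 << N` cells, so a 7-bit maximum bounds the table at 128 cells. All names below and the cell layout are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical names: maximum accuracy supported for Huffman weights */
#define HUF_WEIGHTS_MAX_ACCURACY_LOG 7
#define HUF_WEIGHTS_MAX_TABLE_SIZE   (1u << HUF_WEIGHTS_MAX_ACCURACY_LOG)

/* Hypothetical cell layout, not taken from the format description */
typedef struct {
    uint8_t  symbol;
    uint8_t  nbBits;
    uint16_t newStateBase;
} fse_cell;

/* Fixed-size storage: large enough for any accepted accuracy */
typedef struct {
    unsigned accuracyLog;
    fse_cell cells[HUF_WEIGHTS_MAX_TABLE_SIZE];
} fse_weight_table;

/* Returns 0 on success, -1 if the header asks for more accuracy
 * than the pre-allocated table can hold. */
static int fse_weight_table_init(fse_weight_table *t, unsigned accuracyLog)
{
    if (accuracyLog > HUF_WEIGHTS_MAX_ACCURACY_LOG) return -1;
    t->accuracyLog = accuracyLog;
    return 0;
}

/* Number of cells actually in use for the current accuracy */
static unsigned fse_weight_table_size(const fse_weight_table *t)
{
    return 1u << t->accuracyLog;
}
```

A decoder built this way can reject a header whose accuracy exceeds the supported maximum before touching the table.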
FSE header is [described in relevant chapter](#fse-distribution-table--condensed-format),
and so is [FSE bitstream](#bitstream).
The main difference is that Huffman header compression uses 2 states,
which share the same FSE distribution table.
The bitstream contains only FSE symbols; there are no interleaved "raw bitfields".
The number of symbols to decode is discovered
by tracking bitStream overflow condition.
When both states have overflowed the bitstream, end is reached.
FSE header and bitstreams are described in a separate chapter.
##### Conversion from weights to huffman prefix codes