Merge pull request #302 from inikep/Documentation
improved documentation
This commit is contained in:
commit
baf7ecbdfd
@ -24,9 +24,9 @@
|
|||||||
# zstd : Command Line Utility, supporting gzip-like arguments
|
# zstd : Command Line Utility, supporting gzip-like arguments
|
||||||
# zstd32 : Same as zstd, but forced to compile in 32-bits mode
|
# zstd32 : Same as zstd, but forced to compile in 32-bits mode
|
||||||
# zstd_nolegacy : zstd without support of decompression of legacy versions
|
# zstd_nolegacy : zstd without support of decompression of legacy versions
|
||||||
# zstd-small: minimal zstd without dictBuilder and bench
|
# zstd-small : minimal zstd without dictionary builder and benchmark
|
||||||
# zstd-compress: compressor-only version of zstd
|
# zstd-compress : compressor-only version of zstd
|
||||||
# zstd-decompress: decompressor-only version of zstd
|
# zstd-decompress : decompressor-only version of zstd
|
||||||
# ##########################################################################
|
# ##########################################################################
|
||||||
|
|
||||||
DESTDIR?=
|
DESTDIR?=
|
||||||
|
94
programs/README.md
Normal file
@ -0,0 +1,94 @@
|
|||||||
|
zstd - Command Line Interface
|
||||||
|
================================
|
||||||
|
|
||||||
|
The Command Line Interface (CLI) can be built using the `make` command without any additional parameters.
|
||||||
|
There are, however, other Makefile targets that create different variations of the CLI:
|
||||||
|
- `zstd` : default CLI supporting gzip-like arguments; includes dictionary builder, benchmark, and support for decompression of legacy zstd versions
|
||||||
|
- `zstd32` : Same as `zstd`, but forced to compile in 32-bits mode
|
||||||
|
- `zstd_nolegacy` : Same as `zstd`, but without support for decompression of legacy zstd versions
|
||||||
|
- `zstd-small` : CLI optimized for minimal size; without dictionary builder, benchmark, and support for decompression of legacy zstd versions
|
||||||
|
- `zstd-compress` : compressor-only version of CLI; without dictionary builder, benchmark, and support for decompression of legacy zstd versions
|
||||||
|
- `zstd-decompress` : decompressor-only version of CLI; without dictionary builder, benchmark, and support for decompression of legacy zstd versions
|
||||||
|
|
||||||
|
|
||||||
|
#### Aggregation of parameters
|
||||||
|
The CLI supports aggregation of parameters, i.e. `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`.
|
||||||
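As an illustration only, a hypothetical helper (not the CLI's actual parser) that splits such an aggregated argument back into individual options:

```python
import re

def split_aggregated(arg: str) -> list[str]:
    # Illustrative only: split an aggregated option such as "-b1e18i1"
    # into its component options ["-b1", "-e18", "-i1"].
    return ["-" + part for part in re.findall(r"[A-Za-z]+\d*", arg.lstrip("-"))]
```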
|
|
||||||
|
|
||||||
|
#### Dictionary builder in Command Line Interface
|
||||||
|
Zstd offers a training mode, which can be used to tune the algorithm for a selected
|
||||||
|
type of data, by providing it with a few samples. The result of the training is stored
|
||||||
|
in a file selected with the `-o` option (default name is `dictionary`),
|
||||||
|
which can be loaded before compression and decompression.
|
||||||
|
|
||||||
|
Using a dictionary, the compression ratio achievable on small data improves dramatically.
|
||||||
|
These compression gains are achieved while simultaneously providing faster compression and decompression speeds.
|
||||||
|
Dictionaries work if there is some correlation in a family of small data samples (there is no universal dictionary).
|
||||||
|
Hence, deploying one dictionary per type of data will provide the greatest benefits.
|
||||||
|
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm
|
||||||
|
will rely more and more on previously decoded content to compress the rest of the file.
|
||||||
|
|
||||||
|
Usage of the dictionary builder and created dictionaries with the CLI:
|
||||||
|
|
||||||
|
1. Create the dictionary : `zstd --train FullPathToTrainingSet/* -o dictionaryName`
|
||||||
|
2. Compress with the dictionary: `zstd FILE -D dictionaryName`
|
||||||
|
3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName`
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#### Benchmark in Command Line Interface
|
||||||
|
The CLI includes an in-memory compression benchmark module for zstd.
|
||||||
|
The benchmark is conducted using given filenames. The files are read into memory and joined together.
|
||||||
|
This makes the benchmark more precise, as it eliminates I/O overhead.
|
||||||
|
Multiple filenames can be supplied as separate parameters, as parameters with wildcards, or
|
||||||
|
names of directories can be used as parameters with the `-r` option.
|
||||||
|
|
||||||
|
The benchmark measures ratio, compressed size, compression and decompression speed.
|
||||||
|
A range of compression levels can be selected, starting from `-b` and ending with `-e`.
|
||||||
|
The `-i` parameter selects the minimal evaluation time used for each tested level.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#### Usage of Command Line Interface
|
||||||
|
The full list of options can be obtained with `-h` or `-H` parameter:
|
||||||
|
```
|
||||||
|
Usage :
|
||||||
|
zstd [args] [FILE(s)] [-o file]
|
||||||
|
|
||||||
|
FILE : a filename
|
||||||
|
with no FILE, or when FILE is - , read standard input
|
||||||
|
Arguments :
|
||||||
|
-# : # compression level (1-19, default:3)
|
||||||
|
-d : decompression
|
||||||
|
-D file: use `file` as Dictionary
|
||||||
|
-o file: result stored into `file` (only if 1 input file)
|
||||||
|
-f : overwrite output without prompting
|
||||||
|
--rm : remove source file(s) after successful de/compression
|
||||||
|
-k : preserve source file(s) (default)
|
||||||
|
-h/-H : display help/long help and exit
|
||||||
|
|
||||||
|
Advanced arguments :
|
||||||
|
-V : display Version number and exit
|
||||||
|
-v : verbose mode; specify multiple times to increase log level (default:2)
|
||||||
|
-q : suppress warnings; specify twice to suppress errors too
|
||||||
|
-c : force write to standard output, even if it is the console
|
||||||
|
-r : operate recursively on directories
|
||||||
|
--ultra : enable levels beyond 19, up to 22 (requires more memory)
|
||||||
|
--no-dictID : don't write dictID into header (dictionary compression)
|
||||||
|
--[no-]check : integrity check (default:enabled)
|
||||||
|
--test : test compressed file integrity
|
||||||
|
--[no-]sparse : sparse mode (default:enabled on file, disabled on stdout)
|
||||||
|
|
||||||
|
Dictionary builder :
|
||||||
|
--train ## : create a dictionary from a training set of files
|
||||||
|
-o file : `file` is dictionary name (default: dictionary)
|
||||||
|
--maxdict ## : limit dictionary to specified size (default : 112640)
|
||||||
|
-s# : dictionary selectivity level (default: 9)
|
||||||
|
--dictID ## : force dictionary ID to specified value (default: random)
|
||||||
|
|
||||||
|
Benchmark arguments :
|
||||||
|
-b# : benchmark file(s), using # compression level (default : 1)
|
||||||
|
-e# : test all compression levels from -bX to # (default: 1)
|
||||||
|
-i# : minimum evaluation time in seconds (default : 3s)
|
||||||
|
-B# : cut file into independent blocks of size # (default: no block)
|
||||||
|
```
|
@ -4,7 +4,20 @@ projects for various integrated development environments (IDE)
|
|||||||
#### Included projects
|
#### Included projects
|
||||||
|
|
||||||
The following projects are included with the zstd distribution:
|
The following projects are included with the zstd distribution:
|
||||||
- cmake - CMake project contributed by Artyom Dymchenko
|
- `cmake` - CMake project contributed by Artyom Dymchenko
|
||||||
- VS2008 - Visual Studio 2008 project
|
- `VS2005` - Visual Studio 2005 project
|
||||||
- VS2010 - Visual Studio 2010 project (which also works well with Visual Studio 2012, 2013, 2015)
|
- `VS2008` - Visual Studio 2008 project
|
||||||
- build - command line scripts prepared for Visual Studio compilation without IDE
|
- `VS2010` - Visual Studio 2010 project (which also works well with Visual Studio 2012, 2013, 2015)
|
||||||
|
- `build` - command line scripts prepared for Visual Studio compilation without IDE
|
||||||
|
|
||||||
|
|
||||||
|
#### How to compile zstd with Visual Studio
|
||||||
|
|
||||||
|
1. Install Visual Studio, e.g. VS 2015 Community Edition (it's free).
|
||||||
|
2. Download the latest version of zstd from https://github.com/Cyan4973/zstd/releases
|
||||||
|
3. Decompress ZIP archive.
|
||||||
|
4. Go to the decompressed directory, then to `projects`, then `VS2010`, and open `zstd.sln`
|
||||||
|
5. Visual Studio will ask about converting the VS2010 project to VS2015; agree to the conversion.
|
||||||
|
6. Change `Debug` to `Release` and, if you have 64-bit Windows, also change `Win32` to `x64`.
|
||||||
|
7. Press F7 on keyboard or select `BUILD` from the menu bar and choose `Build Solution`.
|
||||||
|
8. If compilation succeeds, the compiled executable will be in `projects\VS2010\bin\x64\Release\zstd.exe`
|
||||||
|
@ -22,13 +22,17 @@
|
|||||||
# - zstd homepage : http://www.zstd.net/
|
# - zstd homepage : http://www.zstd.net/
|
||||||
# ##########################################################################
|
# ##########################################################################
|
||||||
# datagen : Synthetic and parametrable data generator, for tests
|
# datagen : Synthetic and parametrable data generator, for tests
|
||||||
|
# fullbench : Precisely measure speed for each zstd inner function
|
||||||
|
# fullbench32: Same as fullbench, but forced to compile in 32-bits mode
|
||||||
# fuzzer : Test tool, to check zstd integrity on target platform
|
# fuzzer : Test tool, to check zstd integrity on target platform
|
||||||
# fuzzer32: Same as fuzzer, but forced to compile in 32-bits mode
|
# fuzzer32: Same as fuzzer, but forced to compile in 32-bits mode
|
||||||
|
# paramgrill : parameter tester for zstd
|
||||||
|
# test-zstd-speed.py : script for testing zstd speed difference between commits
|
||||||
|
# versionsTest : compatibility test between zstd versions stored on Github (v0.1+)
|
||||||
# zbufftest : Test tool, to check ZBUFF integrity on target platform
|
# zbufftest : Test tool, to check ZBUFF integrity on target platform
|
||||||
# zbufftest32: Same as zbufftest, but forced to compile in 32-bits mode
|
# zbufftest32: Same as zbufftest, but forced to compile in 32-bits mode
|
||||||
# fullbench : Precisely measure speed for each zstd inner function
|
# zstreamtest : Fuzzer test tool for zstd streaming API
|
||||||
# fullbench32: Same as fullbench, but forced to compile in 32-bits mode
|
# zstreamtest32: Same as zstreamtest, but forced to compile in 32-bits mode
|
||||||
# versionstest : Compatibility test between zstd versions stored on Github (v0.1+)
|
|
||||||
# ##########################################################################
|
# ##########################################################################
|
||||||
|
|
||||||
DESTDIR?=
|
DESTDIR?=
|
||||||
|
@ -1,14 +1,25 @@
|
|||||||
scripts for automated testing of zstd
|
programs and scripts for automated testing of zstd
|
||||||
================================
|
================================
|
||||||
|
|
||||||
#### test-zstd-versions.py - script for testing zstd interoperability between versions
|
This directory contains the following programs and scripts:
|
||||||
|
- `datagen` : Synthetic, parametrizable data generator, for tests
|
||||||
|
- `fullbench` : Precisely measure speed for each zstd inner function
|
||||||
|
- `fuzzer` : Test tool, to check zstd integrity on target platform
|
||||||
|
- `paramgrill` : parameter tester for zstd
|
||||||
|
- `test-zstd-speed.py` : script for testing zstd speed difference between commits
|
||||||
|
- `test-zstd-versions.py` : compatibility test between zstd versions stored on Github (v0.1+)
|
||||||
|
- `zbufftest` : Test tool to check ZBUFF (a buffered streaming API) integrity
|
||||||
|
- `zstreamtest` : Fuzzer test tool for zstd streaming API
|
||||||
|
|
||||||
|
|
||||||
|
#### `test-zstd-versions.py` - script for testing zstd interoperability between versions
|
||||||
|
|
||||||
This script creates a `versionsTest` directory into which the zstd repository is cloned.
|
This script creates a `versionsTest` directory into which the zstd repository is cloned.
|
||||||
Then all tagged (released) versions of zstd are compiled.
|
Then all tagged (released) versions of zstd are compiled.
|
||||||
In the next step, interoperability between zstd versions is checked.
|
In the next step, interoperability between zstd versions is checked.
|
||||||
|
|
||||||
|
|
||||||
#### test-zstd-speed.py - script for testing zstd speed difference between commits
|
#### `test-zstd-speed.py` - script for testing zstd speed difference between commits
|
||||||
|
|
||||||
This script creates a `speedTest` directory into which the zstd repository is cloned.
|
This script creates a `speedTest` directory into which the zstd repository is cloned.
|
||||||
Then it compiles all branches of zstd and performs a speed benchmark for a given list of files (the `testFileNames` parameter).
|
Then it compiles all branches of zstd and performs a speed benchmark for a given list of files (the `testFileNames` parameter).
|
||||||
|
@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
Fuzzer test tool for zstd_buffered
|
Fuzzer test tool for ZBUFF - a buffered streaming API for ZSTD
|
||||||
Copyright (C) Yann Collet 2015-2016
|
Copyright (C) Yann Collet 2015-2016
|
||||||
|
|
||||||
GPL v2 License
|
GPL v2 License
|
||||||
|
@ -97,6 +97,42 @@ to decode all concatenated frames in their sequential order,
|
|||||||
delivering the final decompressed result as if it was a single content.
|
delivering the final decompressed result as if it was a single content.
|
||||||
|
|
||||||
|
|
||||||
|
Skippable Frames
|
||||||
|
----------------
|
||||||
|
|
||||||
|
| `Magic_Number` | `Frame_Size` | `User_Data` |
|
||||||
|
|:--------------:|:------------:|:-----------:|
|
||||||
|
| 4 bytes | 4 bytes | n bytes |
|
||||||
|
|
||||||
|
Skippable frames allow the insertion of user-defined data
|
||||||
|
into a flow of concatenated frames.
|
||||||
|
Their design is pretty straightforward,
|
||||||
|
with the sole objective to allow the decoder to quickly skip
|
||||||
|
over user-defined data and continue decoding.
|
||||||
|
|
||||||
|
Skippable frames defined in this specification are compatible with [LZ4] ones.
|
||||||
|
|
||||||
|
[LZ4]:http://www.lz4.org
|
||||||
|
|
||||||
|
__`Magic_Number`__
|
||||||
|
|
||||||
|
4 Bytes, little-endian format.
|
||||||
|
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
|
||||||
|
All 16 values are valid to identify a skippable frame.
|
||||||
|
|
||||||
|
__`Frame_Size`__
|
||||||
|
|
||||||
|
This is the size, in bytes, of the following `User_Data`
|
||||||
|
(without including the magic number nor the size field itself).
|
||||||
|
This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
|
||||||
|
This means `User_Data` can’t be bigger than (2^32-1) bytes.
|
||||||
|
|
||||||
|
__`User_Data`__
|
||||||
|
|
||||||
|
The `User_Data` can be anything. Data will just be skipped by the decoder.
|
||||||
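The frame layout above can be sketched in code. A minimal sketch (helper names are illustrative, not part of any zstd API):

```python
import struct

SKIPPABLE_MAGIC_BASE = 0x184D2A50  # low nibble may be any value from 0x0 to 0xF

def write_skippable_frame(user_data, nibble=0):
    # Magic_Number and Frame_Size are 4-byte little-endian fields,
    # followed by Frame_Size bytes of User_Data.
    return struct.pack("<II", SKIPPABLE_MAGIC_BASE | nibble, len(user_data)) + user_data

def read_skippable_frame(buf):
    # Returns the User_Data of a skippable frame, or None if the
    # magic number does not identify a skippable frame.
    magic, size = struct.unpack_from("<II", buf, 0)
    if magic & 0xFFFFFFF0 != SKIPPABLE_MAGIC_BASE:
        return None
    return buf[8:8 + size]  # a real decoder would simply skip these bytes
```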
|
|
||||||
|
|
||||||
|
|
||||||
General Structure of Zstandard Frame format
|
General Structure of Zstandard Frame format
|
||||||
-------------------------------------------
|
-------------------------------------------
|
||||||
The structure of a single Zstandard frame is as follows:
|
The structure of a single Zstandard frame is as follows:
|
||||||
@ -163,9 +199,9 @@ The `Flag_Value` can be converted into `Field_Size`,
|
|||||||
which is the number of bytes used by `Frame_Content_Size`
|
which is the number of bytes used by `Frame_Content_Size`
|
||||||
according to the following table:
|
according to the following table:
|
||||||
|
|
||||||
|`Flag_Value`| 0 | 1 | 2 | 3 |
|
|`Flag_Value`| 0 | 1 | 2 | 3 |
|
||||||
| ---------- | --- | --- | --- | --- |
|
| ---------- | ------ | --- | --- | --- |
|
||||||
|`Field_Size`| 0-1 | 2 | 4 | 8 |
|
|`Field_Size`| 0 or 1 | 2 | 4 | 8 |
|
||||||
|
|
||||||
When `Flag_Value` is `0`, `Field_Size` depends on `Single_Segment_flag` :
|
When `Flag_Value` is `0`, `Field_Size` depends on `Single_Segment_flag` :
|
||||||
if `Single_Segment_flag` is set, `Field_Size` is 1.
|
if `Single_Segment_flag` is set, `Field_Size` is 1.
|
||||||
@ -235,7 +271,7 @@ which can be any value from 1 to 2^64-1 bytes (16 EB).
|
|||||||
| ----------- | ---------- | ---------- |
|
| ----------- | ---------- | ---------- |
|
||||||
| Field name | `Exponent` | `Mantissa` |
|
| Field name | `Exponent` | `Mantissa` |
|
||||||
|
|
||||||
Maximum distance is given by the following formulae :
|
Maximum distance is given by the following formulas :
|
||||||
```
|
```
|
||||||
windowLog = 10 + Exponent;
|
windowLog = 10 + Exponent;
|
||||||
windowBase = 1 << windowLog;
|
windowBase = 1 << windowLog;
|
||||||
@ -361,40 +397,6 @@ up to `Block_Maximum_Decompressed_Size`, which is the smallest of :
|
|||||||
- 128 KB
|
- 128 KB
|
||||||
|
|
||||||
|
|
||||||
Skippable Frames
|
|
||||||
----------------
|
|
||||||
|
|
||||||
| `Magic_Number` | `Frame_Size` | `User_Data` |
|
|
||||||
|:--------------:|:------------:|:-----------:|
|
|
||||||
| 4 bytes | 4 bytes | n bytes |
|
|
||||||
|
|
||||||
Skippable frames allow the insertion of user-defined data
|
|
||||||
into a flow of concatenated frames.
|
|
||||||
Its design is pretty straightforward,
|
|
||||||
with the sole objective to allow the decoder to quickly skip
|
|
||||||
over user-defined data and continue decoding.
|
|
||||||
|
|
||||||
Skippable frames defined in this specification are compatible with [LZ4] ones.
|
|
||||||
|
|
||||||
[LZ4]:http://www.lz4.org
|
|
||||||
|
|
||||||
__`Magic_Number`__
|
|
||||||
|
|
||||||
4 Bytes, little-endian format.
|
|
||||||
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
|
|
||||||
All 16 values are valid to identify a skippable frame.
|
|
||||||
|
|
||||||
__`Frame_Size`__
|
|
||||||
|
|
||||||
This is the size, in bytes, of the following `User_Data`
|
|
||||||
(without including the magic number nor the size field itself).
|
|
||||||
This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
|
|
||||||
This means `User_Data` can’t be bigger than (2^32-1) bytes.
|
|
||||||
|
|
||||||
__`User_Data`__
|
|
||||||
|
|
||||||
The `User_Data` can be anything. Data will just be skipped by the decoder.
|
|
||||||
|
|
||||||
|
|
||||||
The format of `Compressed_Block`
|
The format of `Compressed_Block`
|
||||||
--------------------------------
|
--------------------------------
|
||||||
@ -413,7 +415,7 @@ To decode a compressed block, the following elements are necessary :
|
|||||||
or all previous blocks when `Single_Segment_flag` is set.
|
or all previous blocks when `Single_Segment_flag` is set.
|
||||||
- List of "recent offsets" from previous compressed block.
|
- List of "recent offsets" from previous compressed block.
|
||||||
- Decoding tables of previous compressed block for each symbol type
|
- Decoding tables of previous compressed block for each symbol type
|
||||||
(literals, litLength, matchLength, offset).
|
(literals, literals lengths, match lengths, offsets).
|
||||||
|
|
||||||
|
|
||||||
### `Literals_Section`
|
### `Literals_Section`
|
||||||
@ -447,9 +449,12 @@ __`Literals_Block_Type`__
|
|||||||
|
|
||||||
This field uses the 2 lowest bits of the first byte, describing 4 different block types :
|
This field uses the 2 lowest bits of the first byte, describing 4 different block types :
|
||||||
|
|
||||||
| Value | 0 | 1 | 2 | 3 |
|
| `Literals_Block_Type` | Value |
|
||||||
| --------------------- | -------------------- | -------------------- | --------------------------- | ----------------------------- |
|
| ----------------------------- | ----- |
|
||||||
| `Literals_Block_Type` | `Raw_Literals_Block` | `RLE_Literals_Block` | `Compressed_Literals_Block` | `Repeat_Stats_Literals_Block` |
|
| `Raw_Literals_Block` | 0 |
|
||||||
|
| `RLE_Literals_Block` | 1 |
|
||||||
|
| `Compressed_Literals_Block` | 2 |
|
||||||
|
| `Repeat_Stats_Literals_Block` | 3 |
|
||||||
|
|
||||||
- `Raw_Literals_Block` - Literals are stored uncompressed.
|
- `Raw_Literals_Block` - Literals are stored uncompressed.
|
||||||
- `RLE_Literals_Block` - Literals consist of a single byte value repeated N times.
|
- `RLE_Literals_Block` - Literals consist of a single byte value repeated N times.
|
||||||
@ -466,37 +471,37 @@ __`Size_Format`__
|
|||||||
|
|
||||||
- For `Compressed_Block`, it is required to decode both `Compressed_Size`
|
- For `Compressed_Block`, it is required to decode both `Compressed_Size`
|
||||||
and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
|
and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
|
||||||
- For `Raw_Block` and `RLE_Block` it's enough to decode `Regenerated_Size`.
|
- For `Raw_Literals_Block` and `RLE_Literals_Block` it's enough to decode `Regenerated_Size`.
|
||||||
|
|
||||||
For values spanning several bytes, convention is little-endian.
|
For values spanning several bytes, convention is little-endian.
|
||||||
|
|
||||||
__`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
|
__`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
|
||||||
|
|
||||||
- Value : x0 : `Regenerated_Size` uses 5 bits (0-31).
|
- Value x0 : `Regenerated_Size` uses 5 bits (0-31).
|
||||||
`Literals_Section_Header` has 1 byte.
|
`Literals_Section_Header` has 1 byte.
|
||||||
`Regenerated_Size = Header[0]>>3`
|
`Regenerated_Size = Header[0]>>3`
|
||||||
- Value : 01 : `Regenerated_Size` uses 12 bits (0-4095).
|
- Value 01 : `Regenerated_Size` uses 12 bits (0-4095).
|
||||||
`Literals_Section_Header` has 2 bytes.
|
`Literals_Section_Header` has 2 bytes.
|
||||||
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4)`
|
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4)`
|
||||||
- Value : 11 : `Regenerated_Size` uses 20 bits (0-1048575).
|
- Value 11 : `Regenerated_Size` uses 20 bits (0-1048575).
|
||||||
`Literals_Section_Header` has 3 bytes.
|
`Literals_Section_Header` has 3 bytes.
|
||||||
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4) + (Header[2]<<12)`
|
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4) + (Header[2]<<12)`
|
||||||
|
|
||||||
Note : it's allowed to represent a short value (ex : `13`)
|
Note : it's allowed to represent a short value (for example `13`)
|
||||||
using a long format, accepting the reduced compacity.
|
using a long format, accepting the increased compressed data size.
|
||||||
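The three header variants above can be transcribed directly as a decoding sketch (function name is illustrative):

```python
def regenerated_size(header):
    # Size_Format occupies bits 2-3 of the first header byte
    # (bits 0-1 hold Literals_Block_Type).
    size_format = (header[0] >> 2) & 0b11
    if size_format & 1 == 0:   # x0 : 5 bits, 1-byte header
        return header[0] >> 3
    if size_format == 0b01:    # 01 : 12 bits, 2-byte header
        return (header[0] >> 4) + (header[1] << 4)
    # 11 : 20 bits, 3-byte header
    return (header[0] >> 4) + (header[1] << 4) + (header[2] << 12)
```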
|
|
||||||
__`Size_Format` for `Compressed_Literals_Block` and `Repeat_Stats_Literals_Block`__ :
|
__`Size_Format` for `Compressed_Literals_Block` and `Repeat_Stats_Literals_Block`__ :
|
||||||
|
|
||||||
- Value : 00 : _Single stream_.
|
- Value 00 : _A single stream_.
|
||||||
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
|
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
|
||||||
`Literals_Section_Header` has 3 bytes.
|
`Literals_Section_Header` has 3 bytes.
|
||||||
- Value : 01 : 4 streams.
|
- Value 01 : 4 streams.
|
||||||
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
|
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
|
||||||
`Literals_Section_Header` has 3 bytes.
|
`Literals_Section_Header` has 3 bytes.
|
||||||
- Value : 10 : 4 streams.
|
- Value 10 : 4 streams.
|
||||||
Both `Compressed_Size` and `Regenerated_Size` use 14 bits (0-16383).
|
Both `Compressed_Size` and `Regenerated_Size` use 14 bits (0-16383).
|
||||||
`Literals_Section_Header` has 4 bytes.
|
`Literals_Section_Header` has 4 bytes.
|
||||||
- Value : 11 : 4 streams.
|
- Value 11 : 4 streams.
|
||||||
Both `Compressed_Size` and `Regenerated_Size` use 18 bits (0-262143).
|
Both `Compressed_Size` and `Regenerated_Size` use 18 bits (0-262143).
|
||||||
`Literals_Section_Header` has 5 bytes.
|
`Literals_Section_Header` has 5 bytes.
|
||||||
|
|
||||||
@ -505,7 +510,7 @@ Both `Compressed_Size` and `Regenerated_Size` fields follow little-endian conven
|
|||||||
|
|
||||||
#### `Huffman_Tree_Description`
|
#### `Huffman_Tree_Description`
|
||||||
|
|
||||||
This section is only present when `Literals_Block_Type` type is `Compressed_Block` (`2`).
|
This section is only present when `Literals_Block_Type` is `Compressed_Literals_Block` (`2`).
|
||||||
|
|
||||||
Prefix coding represents symbols from an a priori known alphabet
|
Prefix coding represents symbols from an a priori known alphabet
|
||||||
by bit sequences (codewords), one codeword for each symbol,
|
by bit sequences (codewords), one codeword for each symbol,
|
||||||
@ -527,9 +532,11 @@ This specification limits maximum code length to 11 bits.
|
|||||||
##### Representation
|
##### Representation
|
||||||
|
|
||||||
All literal values from zero (included) to last present one (excluded)
|
All literal values from zero (included) to last present one (excluded)
|
||||||
are represented by `Weight` values, from 0 to `Max_Number_of_Bits`.
|
are represented by `Weight` with values from `0` to `Max_Number_of_Bits`.
|
||||||
Transformation from `Weight` to `Number_of_Bits` follows this formulae :
|
Transformation from `Weight` to `Number_of_Bits` follows this formula :
|
||||||
`Number_of_Bits = Weight ? Max_Number_of_Bits + 1 - Weight : 0` .
|
```
|
||||||
|
Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
|
||||||
|
```
|
||||||
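A direct transcription of the formula above as a sketch (names follow the spec's fields):

```python
def number_of_bits(weight, max_number_of_bits):
    # Weight 0 marks an absent symbol; otherwise a larger weight
    # yields a shorter code.
    return max_number_of_bits + 1 - weight if weight else 0
```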
The last symbol's `Weight` is deduced from previously decoded ones,
|
The last symbol's `Weight` is deduced from previously decoded ones,
|
||||||
by completing to the nearest power of 2.
|
by completing to the nearest power of 2.
|
||||||
This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
|
This power of 2 gives `Max_Number_of_Bits`, the depth of the current tree.
|
||||||
@ -544,7 +551,10 @@ Let's presume the following Huffman tree must be described :
|
|||||||
The tree depth is 4, since its smallest element uses 4 bits.
|
The tree depth is 4, since its smallest element uses 4 bits.
|
||||||
Value `5` will not be listed, nor will values above `5`.
|
Value `5` will not be listed, nor will values above `5`.
|
||||||
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
|
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
|
||||||
Weight formula is : `Weight = Number_of_Bits ? Max_Number_of_Bits + 1 - Number_of_Bits : 0`.
|
Weight formula is :
|
||||||
|
```
|
||||||
|
Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
|
||||||
|
```
|
||||||
It gives the following series of weights :
|
It gives the following series of weights :
|
||||||
|
|
||||||
| `Weight` | 4 | 3 | 2 | 0 | 1 |
|
| `Weight` | 4 | 3 | 2 | 0 | 1 |
|
||||||
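The deduction of the last weight and of `Max_Number_of_Bits` described above can be sketched as follows (helper name is illustrative; it assumes the listed weights leave a non-zero remainder, as the format guarantees):

```python
def finish_weights(listed_weights):
    # Each non-zero Weight w contributes 2**(w-1) to the total.
    total = sum(2 ** (w - 1) for w in listed_weights if w > 0)
    # Complete to the nearest power of 2, which is 2**max_number_of_bits.
    max_number_of_bits = total.bit_length()
    last_weight = (2 ** max_number_of_bits - total).bit_length()
    return listed_weights + [last_weight], max_number_of_bits
```

For the example weights `4, 3, 2, 0, 1`, this completes 15 to 16, giving a deduced last weight of 1 and a tree depth of 4.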
@ -575,9 +585,9 @@ which tells how to decode the list of weights.
|
|||||||
|
|
||||||
- if `headerByte` < 128 :
|
- if `headerByte` < 128 :
|
||||||
the series of weights is compressed by FSE.
|
the series of weights is compressed by FSE.
|
||||||
The length of the FSE-compressed series is `headerByte` (0-127).
|
The length of the FSE-compressed series is equal to `headerByte` (0-127).
|
||||||
|
|
||||||
##### FSE (Finite State Entropy) compression of Huffman weights
|
##### Finite State Entropy (FSE) compression of Huffman weights
|
||||||
|
|
||||||
The series of weights is compressed using FSE compression.
|
The series of weights is compressed using FSE compression.
|
||||||
It's a single bitstream with 2 interleaved states,
|
It's a single bitstream with 2 interleaved states,
|
||||||
@ -607,9 +617,10 @@ When both states have overflowed the bitstream, end is reached.
|
|||||||
##### Conversion from weights to Huffman prefix codes
|
##### Conversion from weights to Huffman prefix codes
|
||||||
|
|
||||||
All present symbols shall now have a `Weight` value.
|
All present symbols shall now have a `Weight` value.
|
||||||
It is possible to transform weights into Number_of_Bits, using this formula :
|
It is possible to transform weights into Number_of_Bits, using this formula:
|
||||||
`Number_of_Bits = Number_of_Bits ? Max_Number_of_Bits + 1 - Weight : 0` .
|
```
|
||||||
|
Number_of_Bits = Weight ? Max_Number_of_Bits + 1 - Weight : 0
|
||||||
|
```
|
||||||
Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
|
Symbols are sorted by `Weight`. Within same `Weight`, symbols keep natural order.
|
||||||
Symbols with a `Weight` of zero are removed.
|
Symbols with a `Weight` of zero are removed.
|
||||||
Then, starting from lowest weight, prefix codes are distributed in order.
|
Then, starting from lowest weight, prefix codes are distributed in order.
|
||||||
@ -631,21 +642,21 @@ it gives the following distribution :
|
|||||||
| prefix codes | N/A | 0000| 0001| 001 | 01 | 1 |
|
| prefix codes | N/A | 0000| 0001| 001 | 01 | 1 |
|
||||||
|
|
||||||
|
|
||||||
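The distribution rule above (remove zero weights, sort by weight with natural order within the same weight, then assign prefix codes starting from the lowest weight) can be sketched as follows; the helper name is illustrative:

```python
def assign_prefix_codes(weights, max_number_of_bits):
    # Symbols with Weight 0 are removed; the rest are sorted by Weight
    # (ascending), keeping natural order within the same Weight.
    symbols = sorted((s for s, w in enumerate(weights) if w > 0),
                     key=lambda s: (weights[s], s))
    codes, code, prev_bits = {}, 0, None
    for s in symbols:
        n_bits = max_number_of_bits + 1 - weights[s]
        if prev_bits is not None:
            # Move to the next code; shorter codes drop the low bits.
            code = (code + 1) >> (prev_bits - n_bits)
        codes[s] = format(code, "0{}b".format(n_bits))
        prev_bits = n_bits
    return codes
```

Run on the example weights (including the deduced last weight), this reproduces the table above.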
#### Literals bitstreams
|
#### The content of Huffman-compressed literal streams
|
||||||
|
|
||||||
##### Bitstreams sizes
|
##### Bitstreams sizes
|
||||||
|
|
||||||
As seen in a previous paragraph,
|
As seen in a previous paragraph,
|
||||||
there are 2 flavors of Huffman-compressed literals :
|
there are 2 types of Huffman-compressed literals :
|
||||||
single stream, and 4-streams.
|
a single stream and 4 streams.
|
||||||
|
|
||||||
4-streams is useful for CPU with multiple execution units and out-of-order operations.
|
Encoding using 4 streams is useful for CPUs with multiple execution units and out-of-order execution.
|
||||||
Since each stream can be decoded independently,
|
Since each stream can be decoded independently,
|
||||||
it's possible to decode them up to 4x faster than a single stream,
|
it's possible to decode them up to 4x faster than a single stream,
|
||||||
presuming the CPU has enough parallelism available.
|
presuming the CPU has enough parallelism available.
|
||||||
|
|
||||||
For single stream, header provides both the compressed and regenerated size.
|
For single stream, header provides both the compressed and regenerated size.
|
||||||
For 4-streams though,
|
For 4 streams though,
|
||||||
header only provides compressed and regenerated size of all 4 streams combined.
|
header only provides compressed and regenerated size of all 4 streams combined.
|
||||||
In order to properly decode the 4 streams,
|
In order to properly decode the 4 streams,
|
||||||
it's necessary to know the compressed and regenerated size of each stream.
|
it's necessary to know the compressed and regenerated size of each stream.
|
||||||
@ -658,8 +669,10 @@ bitstreams are preceded by 3 unsigned little-endian 16-bits values.
|
|||||||
Each value represents the compressed size of one stream, in order.
|
Each value represents the compressed size of one stream, in order.
|
||||||
The last stream size is deduced from the total compressed size
|
The last stream size is deduced from the total compressed size
|
||||||
and from previously decoded stream sizes :
|
and from previously decoded stream sizes :
|
||||||
|
|
||||||
`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize`.
|
`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize`.
|
||||||
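A sketch of recovering the four stream sizes from the 6-byte jump table (names are illustrative):

```python
import struct

def four_stream_sizes(total_compressed_size, jump_table):
    # The jump table holds three unsigned little-endian 16-bit values,
    # one compressed size per stream; the fourth size is deduced.
    s1, s2, s3 = struct.unpack("<HHH", jump_table[:6])
    s4 = total_compressed_size - 6 - s1 - s2 - s3
    return s1, s2, s3, s4
```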
|
|
||||||
|
|
||||||
##### Bitstreams read and decode
|
##### Bitstreams read and decode
|
||||||
|
|
||||||
Each bitstream must be read _backward_,
|
Each bitstream must be read _backward_,
|
||||||
@ -701,23 +714,18 @@ When all _sequences_ are decoded,
|
|||||||
if there is any literal left in the _literal section_,
|
if there is any literal left in the _literal section_,
|
||||||
these bytes are added at the end of the block.
|
these bytes are added at the end of the block.
|
||||||
|
|
||||||
The _Sequences_Section_ regroup all symbols required to decode commands.
|
The `Sequences_Section` regroups all symbols required to decode commands.
|
||||||
There are 3 symbol types : literals lengths, offsets and match lengths.
|
There are 3 symbol types : literals lengths, offsets and match lengths.
|
||||||
They are encoded together, interleaved, in a single _bitstream_.
|
They are encoded together, interleaved, in a single _bitstream_.
|
||||||
|
|
||||||
Each symbol is a _code_ in its own context,
|
The `Sequences_Section` starts by a header,
|
||||||
which specifies a baseline and a number of bits to add.
|
followed by optional probability tables for each symbol type,
|
||||||
_Codes_ are FSE compressed,
|
|
||||||
and interleaved with raw additional bits in the same bitstream.
|
|
||||||
|
|
||||||
The Sequences section starts by a header,
|
|
||||||
followed by optional Probability tables for each symbol type,
|
|
||||||
followed by the bitstream.
|
followed by the bitstream.
|
||||||
|
|
||||||
| `Sequences_Section_Header` | [`Literals_Length_Table`] | [`Offset_Table`] | [`Match_Length_Table`] | bitStream |
|
| `Sequences_Section_Header` | [`Literals_Length_Table`] | [`Offset_Table`] | [`Match_Length_Table`] | bitStream |
|
||||||
| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
|
| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
|
||||||
|
|
||||||
To decode the Sequence section, it's required to know its size.
|
To decode the `Sequences_Section`, it's required to know its size.
|
||||||
This size is deducted from `blockSize - literalSectionSize`.
|
This size is deducted from `blockSize - literalSectionSize`.
|
||||||
|
|
||||||
|
|
||||||
@@ -748,8 +756,8 @@ This is a single byte, defining the compression mode of each symbol type.
 
 The last field, `Reserved`, must be all-zeroes.
 
-`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the compression mode of
-literals lengths, offsets and match lengths respectively.
+`Literals_Lengths_Mode`, `Offsets_Mode` and `Match_Lengths_Mode` define the `Compression_Mode` of
+literals lengths, offsets, and match lengths respectively.
 
 They follow the same enumeration :
 
@@ -764,9 +772,14 @@ They follow the same enumeration :
 A distribution table will be present.
 It will be described in [next part](#distribution-tables).
 
-#### Symbols decoding
+#### The codes for literals lengths, match lengths, and offsets.
 
-##### Literals Length codes
+Each symbol is a _code_ in its own context,
+which specifies `Baseline` and `Number_of_Bits` to add.
+_Codes_ are FSE compressed,
+and interleaved with raw additional bits in the same bitstream.
+
+##### Literals length codes
 
 Literals length codes are values ranging from `0` to `35` included.
 They define lengths from 0 to 131071 bytes.
@@ -778,20 +791,20 @@ They define lengths from 0 to 131071 bytes.
 
 | `Literals_Length_Code` | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
 | ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
+| `Baseline` | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
 | `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
 
 | `Literals_Length_Code` | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
 | ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
+| `Baseline` | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
 | `Number_of_Bits` | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
 
 | `Literals_Length_Code` | 32 | 33 | 34 | 35 |
 | ---------------------- | ---- | ---- | ---- | ---- |
-| Baseline | 8192 |16384 |32768 |65536 |
+| `Baseline` | 8192 |16384 |32768 |65536 |
 | `Number_of_Bits` | 13 | 14 | 15 | 16 |
 
-__Default distribution__
+##### Default distribution for literals length codes
 
 When `Compression_Mode` is `Predefined_Mode`,
 a predefined distribution is used for FSE compression.
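For illustration, the `Baseline`/`Number_of_Bits` tables in the hunk above can be applied as in this sketch. The rows for codes 0-15 are elided from this diff excerpt; the sketch assumes they map each code directly to the same length with 0 extra bits, which is what the stated 0-131071 range implies:

```c
/* Literals length tables: codes 16-35 come from the rows shown above;
   codes 0-15 are assumed to map directly to lengths 0-15 (0 extra bits). */
static const unsigned LL_baseline[36] = {
     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
    16, 18, 20, 22, 24, 28, 32, 40, 48, 64, 128, 256, 512, 1024,
    2048, 4096, 8192, 16384, 32768, 65536 };
static const unsigned char LL_bits[36] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 8, 9, 10,
    11, 12, 13, 14, 15, 16 };

/* extraBits are the LL_bits[code] raw bits read from the bitstream */
static unsigned literalsLength(unsigned code, unsigned extraBits)
{
    return LL_baseline[code] + extraBits;
}
```

Note that the largest code (35, baseline 65536, 16 extra bits) reaches exactly 131071, matching the stated maximum.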
@@ -804,7 +817,7 @@ short literalsLength_defaultDistribution[36] =
      -1,-1,-1,-1 };
 ```
 
-##### Match Length codes
+##### Match length codes
 
 Match length codes are values ranging from `0` to `52` included.
 They define lengths from 3 to 131074 bytes.
@@ -816,25 +829,25 @@ They define lengths from 3 to 131074 bytes.
 
 | `Match_Length_Code` | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
 | ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
+| `Baseline` | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
 | `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
 
 | `Match_Length_Code` | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
 | ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 67 | 83 | 99 | 131 | 258 | 514 | 1026 | 2050 |
+| `Baseline` | 67 | 83 | 99 | 131 | 258 | 514 | 1026 | 2050 |
 | `Number_of_Bits` | 4 | 4 | 5 | 7 | 8 | 9 | 10 | 11 |
 
 | `Match_Length_Code` | 48 | 49 | 50 | 51 | 52 |
 | ------------------- | ---- | ---- | ---- | ---- | ---- |
-| Baseline | 4098 | 8194 |16486 |32770 |65538 |
+| `Baseline` | 4098 | 8194 |16486 |32770 |65538 |
 | `Number_of_Bits` | 12 | 13 | 14 | 15 | 16 |
 
-__Default distribution__
+##### Default distribution for match length codes
 
 When `Compression_Mode` is defined as `Predefined_Mode`,
 a predefined distribution is used for FSE compression.
 
-Here is its definition. It uses an accuracy of 6 bits (64 states).
+Below is its definition. It uses an accuracy of 6 bits (64 states).
 ```
 short matchLengths_defaultDistribution[53] =
 { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
@@ -853,26 +866,27 @@ For information, at the time of this writing.
 the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
 
 An offset code is also the number of additional bits to read,
-and can be translated into an `Offset_Value` using the following formulae :
+and can be translated into an `Offset_Value` using the following formulas :
 
 ```
 Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
 if (Offset_Value > 3) offset = Offset_Value - 3;
 ```
-It means that maximum `Offset_Value` is `2^(N+1))-1` and it supports back-reference distance up to 2^(N+1))-4
+It means that maximum `Offset_Value` is `2^(N+1))-1` and it supports back-reference distance up to `2^(N+1))-4`
 but is limited by [maximum back-reference distance](#window_descriptor).
 
-Offset_Value from 1 to 3 are special : they define "repeat codes",
+`Offset_Value` from 1 to 3 are special : they define "repeat codes",
 which means one of the previous offsets will be repeated.
 They are sorted in recency order, with 1 meaning the most recent one.
 See [Repeat offsets](#repeat-offsets) paragraph.
 
-__Default distribution__
+
+##### Default distribution for offset codes
 
 When `Compression_Mode` is defined as `Predefined_Mode`,
 a predefined distribution is used for FSE compression.
 
-Here is its definition. It uses an accuracy of 5 bits (32 states),
+Below is its definition. It uses an accuracy of 5 bits (32 states),
 and supports a maximum `N` of 28, allowing offset values up to 536,870,908 .
 
 If any sequence in the compressed block requires an offset larger than this,
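A hedged sketch of the offset-code translation shown in the hunk above; `extraBits` stands in for the result of `readNBits(offsetCode)`, and the helper names are illustrative:

```c
#include <stdint.h>

/* Offset_Value = (1 << offsetCode) + readNBits(offsetCode) */
static uint64_t offsetValue(unsigned offsetCode, uint64_t extraBits)
{
    return ((uint64_t)1 << offsetCode) + extraBits;
}

/* Offset_Value 1-3 are repeat codes; larger values give
   offset = Offset_Value - 3.  0 here signals "use a repeat code". */
static uint64_t offsetFromValue(uint64_t v)
{
    return (v > 3) ? v - 3 : 0;
}
```

With the maximum code `N`, all-ones extra bits give `(1 << N) + (1 << N) - 1`, i.e. `2^(N+1) - 1`, consistent with the maximum `Offset_Value` stated above.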
@@ -913,7 +927,7 @@ The bitstream starts by reporting on which scale it operates.
 Note that maximum `Accuracy_Log` for literal and match lengths is `9`,
 and for offsets is `8`. Higher values are considered errors.
 
-Then follow each symbol value, from `0` to last present one.
+Then follows each symbol value, from `0` to last present one.
 The number of bits used by each field is variable.
 It depends on :
 
@@ -942,11 +956,11 @@ It depends on :
 
 Symbols probabilities are read one by one, in order.
 
-Probability is obtained from Value decoded by following formulae :
+Probability is obtained from Value decoded by following formula :
 `Proba = value - 1`
 
 It means value `0` becomes negative probability `-1`.
-`-1` is a special probability, which means `less than 1`.
+`-1` is a special probability, which means "less than 1".
 Its effect on distribution table is described in [next paragraph].
 For the purpose of calculating cumulated distribution, it counts as one.
 
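The mapping above can be sketched as follows (a minimal illustration, including the bookkeeping implied by "-1 counts as one" for the cumulated distribution):

```c
/* Proba = value - 1; decoded value 0 encodes the special
   "less than 1" probability -1, which still contributes 1
   to the cumulated distribution. */
static int probaFromValue(unsigned value, unsigned *cumulated)
{
    int proba = (int)value - 1;
    *cumulated += (proba < 0) ? 1u : (unsigned)proba;
    return proba;
}
```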
@@ -979,7 +993,7 @@ The table has a size of `tableSize = 1 << Accuracy_Log`.
 Each cell describes the symbol decoded,
 and instructions to get the next state.
 
-Symbols are scanned in their natural order for `less than 1` probabilities.
+Symbols are scanned in their natural order for "less than 1" probabilities.
 Symbols with this probability are being attributed a single cell,
 starting from the end of the table.
 These symbols define a full state reset, reading `Accuracy_Log` bits.
@@ -1001,7 +1015,7 @@ typically by a "less than 1" probability symbol.
 The result is a list of state values.
 Each state will decode the current symbol.
 
-To get the Number of bits and baseline required for next state,
+To get the `Number_of_Bits` and `Baseline` required for next state,
 it's first necessary to sort all states in their natural order.
 The lower states will need 1 more bit than higher ones.
 
@@ -1025,11 +1039,11 @@ Numbering starts from higher states using less bits.
 | width | 32 | 32 | 32 | 16 | 16 |
 | `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
 | range number | 2 | 4 | 6 | 0 | 1 |
-| baseline | 32 | 64 | 96 | 0 | 16 |
+| `Baseline` | 32 | 64 | 96 | 0 | 16 |
 | range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
 
 Next state is determined from current state
-by reading the required number of bits, and adding the specified baseline.
+by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
 
 
 #### Bitstream
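Under the rules in the hunk above, the worked example (one symbol with probability 5 in a table of `Accuracy_Log` 7, i.e. 128 states) can be reproduced with this sketch. The function name and the derivation of the wide-state count are my own; lower states get double width and one extra bit, and `Baseline` numbering starts from the narrow higher states:

```c
/* Compute per-state Number_of_Bits and Baseline for one symbol with
   probability p in an FSE table of size 1 << accuracyLog.
   nbBits[] and baseline[] must each hold p entries. */
static void fseStateRanges(unsigned p, unsigned accuracyLog,
                           unsigned nbBits[], unsigned baseline[])
{
    unsigned tableSize = 1u << accuracyLog;
    unsigned hiLog = 0;
    while ((1u << hiLog) < p) hiLog++;        /* ceil(log2(p)) */
    unsigned smallW = tableSize >> hiLog;     /* width of higher states */
    unsigned nWide  = (tableSize - p * smallW) / smallW; /* low states, double width */
    for (unsigned i = 0; i < p; i++) {
        if (i < nWide) {                      /* lower states: 1 more bit */
            nbBits[i]   = accuracyLog - hiLog + 1;
            baseline[i] = (p - nWide) * smallW + i * 2 * smallW;
        } else {                              /* higher states: less bits, numbered first */
            nbBits[i]   = accuracyLog - hiLog;
            baseline[i] = (i - nWide) * smallW;
        }
    }
}
```

For `p = 5`, `accuracyLog = 7` this yields widths 32/32/32/16/16, `Number_of_Bits` 5/5/5/4/4 and `Baseline` 32/64/96/0/16, matching the table above.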
@@ -1059,16 +1073,16 @@ Reminder : always keep in mind that all values are read _backward_.
 ##### Decoding a sequence
 
 A state gives a code.
-A code provides a baseline and number of bits to add.
+A code provides `Baseline` and `Number_of_Bits` to add.
 See [Symbol Decoding] section for details on each symbol.
 
-Decoding starts by reading the number of bits required to decode offset.
-It then does the same for match length,
-and then for literals length.
+Decoding starts by reading the `Number_of_Bits` required to decode `Offset`.
+It then does the same for `Match_Length`,
+and then for `Literals_Length`.
 
-Offset / matchLength / litLength define a sequence.
-It starts by inserting the number of literals defined by `litLength`,
-then continue by copying `matchLength` bytes from `currentPos - offset`.
+`Offset`, `Match_Length`, and `Literals_Length` define a sequence.
+It starts by inserting the number of literals defined by `Literals_Length`,
+then continue by copying `Match_Length` bytes from `currentPos - Offset`.
 
 The next operation is to update states.
 Using rules pre-calculated in the decoding tables,
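A minimal sketch of executing one decoded sequence, assuming the literals were already decoded and the output buffer is large enough (names are illustrative). Copying byte by byte matters: when `Offset` is smaller than `Match_Length`, the match source overlaps bytes written by the same sequence:

```c
#include <stddef.h>
#include <stdint.h>

/* Append one sequence to `out` at position `outPos`;
   returns the new output position. */
static size_t executeSequence(uint8_t *out, size_t outPos,
                              const uint8_t *literals, size_t litLength,
                              size_t matchLength, size_t offset)
{
    for (size_t i = 0; i < litLength; i++)   /* insert literals first */
        out[outPos++] = literals[i];
    const uint8_t *match = out + outPos - offset;  /* currentPos - Offset */
    for (size_t i = 0; i < matchLength; i++)       /* overlap-safe copy */
        out[outPos++] = match[i];
    return outPos;
}
```

For example, literals `"ab"` with `Match_Length` 3 and `Offset` 1 regenerate `"abbbb"`: each copied byte repeats the one just written.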
@@ -1080,17 +1094,17 @@ This operation will be repeated `Number_of_Sequences` times.
 At the end, the bitstream shall be entirely consumed,
 otherwise bitstream is considered corrupted.
 
-[Symbol Decoding]:#symbols-decoding
+[Symbol Decoding]:#the-codes-for-literals-lengths-match-lengths-and-offsets
 
 ##### Repeat offsets
 
-As seen in [Offset Codes], the first 3 values define a repeated offset.
-They are sorted in recency order, with 1 meaning "most recent one".
+As seen in [Offset Codes], the first 3 values define a repeated offset and we will call them `Repeated_Offset1`, `Repeated_Offset2`, and `Repeated_Offset3`.
+They are sorted in recency order, with `Repeated_Offset1` meaning "most recent one".
 
 There is an exception though, when current sequence's literals length is `0`.
-In which case, repcodes are "pushed by one",
-so 1 becomes 2, 2 becomes 3,
-and 3 becomes "offset_1 - 1_byte".
+In which case, repeated offsets are "pushed by one",
+so `Repeated_Offset1` becomes `Repeated_Offset2`, `Repeated_Offset2` becomes `Repeated_Offset3`,
+and `Repeated_Offset3` becomes `Repeated_Offset1 - 1_byte`.
 
 On first block, offset history is populated by the following values : 1, 4 and 8 (in order).
 
@@ -1105,8 +1119,8 @@ they do not contribute to offset history.
 New offset take the lead in offset history,
 up to its previous place if it was already present.
 
-It means that when repeat offset 1 (most recent) is used, history is unmodified.
-When repeat offset 2 is used, it's swapped with offset 1.
+It means that when `Repeated_Offset1` (most recent) is used, history is unmodified.
+When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1`.
 
 
 Dictionary format
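The history-update rules in the hunk above can be sketched as follows, with `rep[0]` holding `Repeated_Offset1` (most recent). This is my reading of "new offset takes the lead, up to its previous place": the general branch also covers a reused `Repeated_Offset3`, which rotates to the front:

```c
#include <stddef.h>

/* Update the 3-entry repeat-offset history after a sequence
   used `newOffset`. */
static void updateRepeatedOffsets(size_t rep[3], size_t newOffset)
{
    if (newOffset == rep[0]) return;           /* most recent: history unmodified */
    if (newOffset == rep[1]) {                 /* swap Repeated_Offset1 and 2 */
        size_t t = rep[0]; rep[0] = rep[1]; rep[1] = t;
        return;
    }
    /* otherwise the new offset takes the lead, pushing the others back */
    rep[2] = rep[1];
    rep[1] = rep[0];
    rep[0] = newOffset;
}
```

Starting from the first-block history 1, 4, 8 (most recent first), using offset 4 swaps it to the front, and a brand-new offset then pushes the oldest entry out.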
@@ -1138,8 +1152,8 @@ _Reserved ranges :_
 
 __`Entropy_Tables`__ : following the same format as a [compressed blocks].
 They are stored in following order :
-Huffman tables for literals, FSE table for offset,
-FSE table for matchLenth, and FSE table for litLength.
+Huffman tables for literals, FSE table for offsets,
+FSE table for match lengths, and FSE table for literals lengths.
 It's finally followed by 3 offset values, populating recent offsets,
 stored in order, 4-bytes little-endian each, for a total of 12 bytes.
 