clarifications for the FSE decoding table

requested in #1782
dev
Yann Collet 2019-10-18 17:48:12 -07:00
parent ed65210c9b
commit ff7bd16c0a
1 changed files with 30 additions and 28 deletions

@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version ### Version
0.3.3 (16/08/19) 0.3.4 (16/08/19)
Introduction Introduction
@@ -1107,18 +1107,18 @@ It follows the following build rule :
The table has a size of `Table_Size = 1 << Accuracy_Log`. The table has a size of `Table_Size = 1 << Accuracy_Log`.
Each cell describes the symbol decoded, Each cell describes the symbol decoded,
and instructions to get the next state. and instructions to get the next state (`Number_of_Bits` and `Baseline`).
Symbols are scanned in their natural order for "less than 1" probabilities. Symbols are scanned in their natural order for "less than 1" probabilities.
Symbols with this probability are being attributed a single cell, Symbols with this probability are being attributed a single cell,
starting from the end of the table and retreating. starting from the end of the table and retreating.
These symbols define a full state reset, reading `Accuracy_Log` bits. These symbols define a full state reset, reading `Accuracy_Log` bits.
All remaining symbols are allocated in their natural order. Then, all remaining symbols, sorted in natural order, are allocated cells.
Starting from symbol `0` and table position `0`, Starting from symbol `0` (if it exists), and table position `0`,
each symbol gets allocated as many cells as its probability. each symbol gets allocated as many cells as its probability.
Cell allocation is spreaded, not linear : Cell allocation is spreaded, not linear :
each successor position follow this rule : each successor position follows this rule :
``` ```
position += (tableSize>>1) + (tableSize>>3) + 3; position += (tableSize>>1) + (tableSize>>3) + 3;
@@ -1130,40 +1130,41 @@ A position is skipped if already occupied by a "less than 1" probability symbol.
each position in the table, switching to the next symbol when enough each position in the table, switching to the next symbol when enough
states have been allocated to the current one. states have been allocated to the current one.
The result is a list of state values. The process guarantees that the table is entirely filled.
Each state will decode the current symbol. Each cell corresponds to a state value, which contains the symbol being decoded.
To get the `Number_of_Bits` and `Baseline` required for next state, To add the `Number_of_Bits` and `Baseline` required to retrieve next state,
it's first necessary to sort all states in their natural order. it's first necessary to sort all occurrences of each symbol in state order.
The lower states will need 1 more bit than higher ones. Lower states will need 1 more bit than higher ones.
The process is repeated for each symbol. The process is repeated for each symbol.
__Example__ : __Example__ :
Presuming a symbol has a probability of 5. Presuming a symbol has a probability of 5,
It receives 5 state values. States are sorted in natural order. it receives 5 cells, corresponding to 5 state values.
These state values are then sorted in natural order.
Next power of 2 is 8. Next power of 2 after 5 is 8.
Space of probabilities is divided into 8 equal parts. Space of probabilities must be divided into 8 equal parts.
Presuming the `Accuracy_Log` is 7, it defines 128 states. Presuming the `Accuracy_Log` is 7, it defines a space of 128 states.
Divided by 8, each share is 16 large. Divided by 8, each share is 16 large.
In order to reach 8, 8-5=3 lowest states will count "double", In order to reach 8 shares, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width), doubling their shares (32 in width), hence requiring one more bit.
requiring one more bit in the process.
Baseline is assigned starting from the higher states using fewer bits, Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state, increasing at each state, then resuming at the first state,
each takes its allocated width from Baseline. each state takes its allocated width from Baseline.
| state order | 0 | 1 | 2 | 3 | 4 | | state value | 1 | 39 | 77 | 84 | 122 |
| ---------------- | ----- | ----- | ------ | ---- | ----- | | state order | 0 | 1 | 2 | 3 | 4 |
| width | 32 | 32 | 32 | 16 | 16 | | ---------------- | ----- | ----- | ------ | ---- | ------ |
| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 | | width | 32 | 32 | 32 | 16 | 16 |
| range number | 2 | 4 | 6 | 0 | 1 | | `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
| `Baseline` | 32 | 64 | 96 | 0 | 16 | | range number | 2 | 4 | 6 | 0 | 1 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 | | `Baseline` | 32 | 64 | 96 | 0 | 16 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
The next state is determined from current state During decoding, the next state value is determined from current state value,
by reading the required `Number_of_Bits`, and adding the specified `Baseline`. by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
See [Appendix A] for the results of this process applied to the default distributions. See [Appendix A] for the results of this process applied to the default distributions.
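
The example above can be reproduced with a short sketch. The helper name `next_state_info` and its return shape are hypothetical, chosen for illustration; the logic follows the prose: divide the state space into a power-of-2 number of shares, give a double share to the lowest states, and assign `Baseline` starting from the higher states.

```python
def next_state_info(state_values, accuracy_log):
    """Map each state value of one symbol to (Number_of_Bits, Baseline)."""
    table_size = 1 << accuracy_log
    states = sorted(state_values)
    count = len(states)

    # Smallest power of 2 that is >= count (8 for a probability of 5).
    shares = 1 << (count - 1).bit_length()
    share_width = table_size // shares
    base_bits = share_width.bit_length() - 1  # log2(share_width)

    # The lowest (shares - count) states count "double".
    doubled = shares - count

    info = {}
    baseline = 0
    # Higher states first: one share each, fewer bits.
    for state in states[doubled:]:
        info[state] = (base_bits, baseline)
        baseline += share_width
    # Then resume at the first state: two shares each, one more bit.
    for state in states[:doubled]:
        info[state] = (base_bits + 1, baseline)
        baseline += 2 * share_width
    return info


example = next_state_info([1, 39, 77, 84, 122], 7)
for state in sorted(example):
    bits, baseline = example[state]
    print(state, bits, baseline)
```

With the state values from the table and an `Accuracy_Log` of 7, this yields 5 bits with baselines 32, 64, 96 for states 1, 39, 77, and 4 bits with baselines 0, 16 for states 84, 122.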
@ -1657,6 +1658,7 @@ or at least provide a meaningful error code explaining for which reason it canno
Version changes Version changes
--------------- ---------------
- 0.3.4 : clarifications for FSE decoding table
- 0.3.3 : clarifications for field Block_Size - 0.3.3 : clarifications for field Block_Size
- 0.3.2 : remove additional block size restriction on compressed blocks - 0.3.2 : remove additional block size restriction on compressed blocks
- 0.3.1 : minor clarification regarding offset history update rules - 0.3.1 : minor clarification regarding offset history update rules