Merge pull request #1835 from facebook/format034

clarifications for the FSE decoding table
dev
Yann Collet 2019-10-19 05:24:42 -07:00 committed by GitHub
commit 0f9866add2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 30 additions and 28 deletions

View File

@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version
0.3.3 (16/08/19)
0.3.4 (16/08/19)
Introduction
@ -1107,18 +1107,18 @@ It follows the following build rule :
The table has a size of `Table_Size = 1 << Accuracy_Log`.
Each cell describes the symbol decoded,
and instructions to get the next state.
and instructions to get the next state (`Number_of_Bits` and `Baseline`).
Symbols are scanned in their natural order for "less than 1" probabilities.
Symbols with this probability are being attributed a single cell,
starting from the end of the table and retreating.
These symbols define a full state reset, reading `Accuracy_Log` bits.
All remaining symbols are allocated in their natural order.
Starting from symbol `0` and table position `0`,
Then, all remaining symbols, sorted in natural order, are allocated cells.
Starting from symbol `0` (if it exists), and table position `0`,
each symbol gets allocated as many cells as its probability.
Cell allocation is spreaded, not linear :
each successor position follow this rule :
each successor position follows this rule :
```
position += (tableSize>>1) + (tableSize>>3) + 3;
@ -1130,40 +1130,41 @@ A position is skipped if already occupied by a "less than 1" probability symbol.
each position in the table, switching to the next symbol when enough
states have been allocated to the current one.
The result is a list of state values.
Each state will decode the current symbol.
The process guarantees that the table is entirely filled.
Each cell corresponds to a state value, which contains the symbol being decoded.
To get the `Number_of_Bits` and `Baseline` required for next state,
it's first necessary to sort all states in their natural order.
The lower states will need 1 more bit than higher ones.
To add the `Number_of_Bits` and `Baseline` required to retrieve next state,
it's first necessary to sort all occurrences of each symbol in state order.
Lower states will need 1 more bit than higher ones.
The process is repeated for each symbol.
__Example__ :
Presuming a symbol has a probability of 5.
It receives 5 state values. States are sorted in natural order.
Presuming a symbol has a probability of 5,
it receives 5 cells, corresponding to 5 state values.
These state values are then sorted in natural order.
Next power of 2 is 8.
Space of probabilities is divided into 8 equal parts.
Presuming the `Accuracy_Log` is 7, it defines 128 states.
Next power of 2 after 5 is 8.
Space of probabilities must be divided into 8 equal parts.
Presuming the `Accuracy_Log` is 7, it defines a space of 128 states.
Divided by 8, each share is 16 large.
In order to reach 8, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width),
requiring one more bit in the process.
In order to reach 8 shares, 8-5=3 lowest states will count "double",
doubling their shares (32 in width), hence requiring one more bit.
Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state,
each takes its allocated width from Baseline.
increasing at each state, then resuming at the first state,
each state takes its allocated width from Baseline.
| state order | 0 | 1 | 2 | 3 | 4 |
| ---------------- | ----- | ----- | ------ | ---- | ----- |
| width | 32 | 32 | 32 | 16 | 16 |
| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
| range number | 2 | 4 | 6 | 0 | 1 |
| `Baseline` | 32 | 64 | 96 | 0 | 16 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
| state value | 1 | 39 | 77 | 84 | 122 |
| state order | 0 | 1 | 2 | 3 | 4 |
| ---------------- | ----- | ----- | ------ | ---- | ------ |
| width | 32 | 32 | 32 | 16 | 16 |
| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
| range number | 2 | 4 | 6 | 0 | 1 |
| `Baseline` | 32 | 64 | 96 | 0 | 16 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
The next state is determined from current state
During decoding, the next state value is determined from current state value,
by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
See [Appendix A] for the results of this process applied to the default distributions.
@ -1657,6 +1658,7 @@ or at least provide a meaningful error code explaining for which reason it canno
Version changes
---------------
- 0.3.4 : clarifications for FSE decoding table
- 0.3.3 : clarifications for field Block_Size
- 0.3.2 : remove additional block size restriction on compressed blocks
- 0.3.1 : minor clarification regarding offset history update rules