clarifications for the FSE decoding table

requested in #1782
2019-10-18 17:48:12 -07:00 · ff7bd16c0a
commit ff7bd16c0a
parent ed65210c9b
1 changed file with 30 additions and 28 deletions
--- a/doc/zstd_compression_format.md
+++ b/doc/zstd_compression_format.md
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
 ### Version
-0.3.3 (16/08/19)
+0.3.4 (16/08/19)
 Introduction
@@ -1107,18 +1107,18 @@ It follows the following build rule :
 The table has a size of `Table_Size = 1 << Accuracy_Log`.
 Each cell describes the symbol decoded,
-and instructions to get the next state.
+and instructions to get the next state (`Number_of_Bits` and `Baseline`).
 Symbols are scanned in their natural order for "less than 1" probabilities.
 Symbols with this probability are being attributed a single cell,
 starting from the end of the table and retreating.
 These symbols define a full state reset, reading `Accuracy_Log` bits.
-All remaining symbols are allocated in their natural order.
+Then, all remaining symbols, sorted in natural order, are allocated cells.
-Starting from symbol `0` and table position `0`,
+Starting from symbol `0` (if it exists), and table position `0`,
 each symbol gets allocated as many cells as its probability.
 Cell allocation is spreaded, not linear :
-each successor position follow this rule :
+each successor position follows this rule :
 ```
 position += (tableSize>>1) + (tableSize>>3) + 3;
@@ -1130,40 +1130,41 @@ A position is skipped if already occupied by a "less than 1" probability symbol.
 each position in the table, switching to the next symbol when enough
 states have been allocated to the current one.
-The result is a list of state values.
+The process guarantees that the table is entirely filled.
-Each state will decode the current symbol.
+Each cell corresponds to a state value, which contains the symbol being decoded.
-To get the `Number_of_Bits` and `Baseline` required for next state,
+To add the `Number_of_Bits` and `Baseline` required to retrieve next state,
-it's first necessary to sort all states in their natural order.
+it's first necessary to sort all occurrences of each symbol in state order.
-The lower states will need 1 more bit than higher ones.
+Lower states will need 1 more bit than higher ones.
 The process is repeated for each symbol.
 __Example__ :
-Presuming a symbol has a probability of 5.
+Presuming a symbol has a probability of 5,
-It receives 5 state values. States are sorted in natural order.
+it receives 5 cells, corresponding to 5 state values.
+These state values are then sorted in natural order.
-Next power of 2 is 8.
+Next power of 2 after 5 is 8.
-Space of probabilities is divided into 8 equal parts.
+Space of probabilities must be divided into 8 equal parts.
-Presuming the `Accuracy_Log` is 7, it defines 128 states.
+Presuming the `Accuracy_Log` is 7, it defines a space of 128 states.
 Divided by 8, each share is 16 large.
-In order to reach 8, 8-5=3 lowest states will count "double",
+In order to reach 8 shares, 8-5=3 lowest states will count "double",
-doubling the number of shares (32 in width),
+doubling their shares (32 in width), hence requiring one more bit.
-requiring one more bit in the process.
 Baseline is assigned starting from the higher states using fewer bits,
-and proceeding naturally, then resuming at the first state,
+increasing at each state, then resuming at the first state,
-each takes its allocated width from Baseline.
+each state takes its allocated width from Baseline.
-| state order      |   0   |   1   |    2   |   3  |   4   |
+| state value      |   1   |  39   |   77   |  84  |  122   |
-| ---------------- | ----- | ----- | ------ | ---- | ----- |
+| state order      |   0   |   1   |    2   |   3  |    4   |
-| width            |  32   |  32   |   32   |  16  |  16   |
+| ---------------- | ----- | ----- | ------ | ---- | ------ |
-| `Number_of_Bits` |   5   |   5   |    5   |   4  |   4   |
+| width            |  32   |  32   |   32   |  16  |   16   |
-| range number     |   2   |   4   |    6   |   0  |   1   |
+| `Number_of_Bits` |   5   |   5   |    5   |   4  |    4   |
-| `Baseline`       |  32   |  64   |   96   |   0  |  16   |
+| range number     |   2   |   4   |    6   |   0  |    1   |
-| range            | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
+| `Baseline`       |  32   |  64   |   96   |   0  |   16   |
+| range            | 32-63 | 64-95 | 96-127 | 0-15 | 16-31  |
-The next state is determined from current state
+During decoding, the next state value is determined from current state value,
 by reading the required `Number_of_Bits`, and adding the specified `Baseline`.
 See [Appendix A] for the results of this process applied to the default distributions.
@@ -1657,6 +1658,7 @@ or at least provide a meaningful error code explaining for which reason it canno
 Version changes
 ---------------
+- 0.3.4 : clarifications for FSE decoding table
 - 0.3.3 : clarifications for field Block_Size
 - 0.3.2 : remove additional block size restriction on compressed blocks
 - 0.3.1 : minor clarification regarding offset history update rules
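
Note on the clarified build rule: the steps described in the revised text (one cell per "less than 1" symbol from the end of the table, spread allocation for the remaining symbols, then `Number_of_Bits` and `Baseline` per symbol in state order) can be sketched as follows. This is only an illustrative Python sketch, not the reference implementation. The function name `build_fse_decoding_table`, the convention of marking "less than 1" probabilities with `-1`, and the distribution `[91, 5, 32]` are assumptions made for this example; the distribution was chosen so that the symbol of probability 5 lands on the state values shown in the table (1, 39, 77, 84, 122).

```python
def build_fse_decoding_table(probabilities, accuracy_log):
    """Return a list of (symbol, number_of_bits, baseline), indexed by state.

    `probabilities` holds one normalized count per symbol. In this sketch,
    -1 marks a "less than 1" probability, which occupies a single cell.
    """
    table_size = 1 << accuracy_log
    if sum(1 if p == -1 else p for p in probabilities) != table_size:
        raise ValueError("probabilities must sum to the table size")

    symbols = [None] * table_size

    # "Less than 1" symbols: one cell each, from the end of the table backwards.
    high = table_size - 1
    for sym, p in enumerate(probabilities):
        if p == -1:
            symbols[high] = sym
            high -= 1

    # Remaining symbols in natural order, spread with the fixed step.
    step = (table_size >> 1) + (table_size >> 3) + 3
    mask = table_size - 1
    pos = 0
    for sym, p in enumerate(probabilities):
        for _ in range(max(p, 0)):
            symbols[pos] = sym
            pos = (pos + step) & mask
            while pos > high:  # skip cells held by "less than 1" symbols
                pos = (pos + step) & mask

    # Walk states in increasing order; each symbol counts its own occurrences
    # starting from its probability, so its lower states get one more bit.
    counter = [1 if p == -1 else p for p in probabilities]
    table = []
    for sym in symbols:
        x = counter[sym]
        counter[sym] += 1
        number_of_bits = accuracy_log - (x.bit_length() - 1)
        baseline = (x << number_of_bits) - table_size
        table.append((sym, number_of_bits, baseline))
    return table


# Hypothetical distribution: symbol 1 has probability 5, Accuracy_Log is 7.
table = build_fse_decoding_table([91, 5, 32], accuracy_log=7)
for state, (sym, number_of_bits, baseline) in enumerate(table):
    if sym == 1:
        print(state, number_of_bits, baseline)
```

Under these assumptions, the five printed rows correspond to the columns of the example table: states 1, 39 and 77 read 5 bits with baselines 32, 64 and 96, while states 84 and 122 read 4 bits with baselines 0 and 16. A "less than 1" symbol ends up with `Number_of_Bits == Accuracy_Log` and a baseline of 0, which matches the "full state reset" described in the text.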