Commit Graph

115 Commits (master)

Author SHA1 Message Date
Yann Collet 7a18d709ae updated all names to offBase convention 2021-12-29 17:30:43 -08:00
Yann Collet 681c81f06c abstracted storeSeq() sumtype numeric representation from decodecorpus.c 2021-12-28 11:58:33 -08:00
Yann Collet 1aed962216 introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.

Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.

While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".

One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.

This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.

At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
2021-12-23 22:03:30 -08:00
Yann Collet aeff128331 change seqDef.offset into seqDef.offBase
to better reflect the value stored in this field.
2021-12-23 17:56:08 -08:00
Yann Collet e145b58cfd changed seqDef.matchLength into seqDef.mlBase
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
2021-12-23 13:39:46 -08:00
Yann Collet b77fcac61f change ZSTD_storeSeq() interface to accept matchLength
instead of mlBase.

This removes the need to do `- MINMATCH` at every call site.

The new interface contract is checked with an `assert()`.
2021-12-23 12:03:33 -08:00
Nick Terrell 46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
Nick Terrell a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Nick Terrell 66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet 5c0a3489a5 fix aliasing warning in decodecorpus 2020-12-04 19:21:40 -08:00
Nick Terrell a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
Nick Terrell 8f8bd2d1ac [regression] Update results.csv 2020-08-20 12:41:35 -07:00
Meghna Malhotra cc7c29595d Fixed tests to use correct workspace size 2020-05-01 13:45:48 -07:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell e068bd01df [tests] Fix decodecorpus 2019-09-20 01:09:47 -07:00
Vivek Miglani a3ce0c9d04 Fixing decodecorpus test issue 2019-07-18 14:32:09 -07:00
Ephraim Park 734eff70b8 enable repeat mode on rle 2019-06-26 16:39:00 -07:00
Josh Soref a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Yann Collet 59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required to bring all these dependencies.

Now, dependency is reduced to timefn only.
This required to create a separate timefn from util,
and rewrite benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect accross the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
W. Felix Handte 03e040a966 Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(... 2019-01-28 17:33:01 -05:00
Yann Collet ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet 7b74405150 refactor HUF_compress_internal for clarity
changed workspace parameter convention
to always provide workspaceSize,
so that size can be explicitly checked.

Also, use more enum to make the meaning of some parameters more explicit.
2018-10-26 13:21:37 -07:00
Yann Collet f181799082 fix decodecorpus incorrect frame generation
fix #1379
decodecorpus was generating one extraneous byte when `nbSeq==0`.
This is disallowed by the specification.

The reference decoder was just skipping the extraneous byte.
It is now stricter, and flag such situation as an error.
2018-10-20 18:56:21 -07:00
Nick Terrell e3b5286197 Fix decodecorpus 2018-08-28 13:56:47 -07:00
Nick Terrell 3b56bb1e4c Fix decodecorpus 2018-08-23 17:48:06 -07:00
Yann Collet 2d76defbfe grouped all histogram functions into hist.c
renamed functions with HIST_* prefix
2018-06-13 19:49:31 -04:00
Nick Terrell dab8cfa3c7 Combine definitions of SEC_TO_MICRO 2017-11-30 19:40:53 -08:00
Nick Terrell 9a2f6f477b Use util.h for timing 2017-11-30 14:57:25 -08:00
Yann Collet 7f6a783862 fixed a small error in decodeCorpus
a compressed block must be strictly smaller than its decompressed size.
2017-10-07 15:19:52 -07:00
Nick Terrell bbe77212ef [libzstd] Increase MaxOff 2017-09-25 13:36:18 -07:00
Stella Lau 963558a072 Fix implicit conversion error 2017-09-13 16:01:16 -07:00
Stella Lau 40bf0ced7d Add flag to limit max decompressed size in decodeCorpus 2017-09-13 15:16:56 -07:00
Stella Lau e89065506e Make decodecorpus generate raw compressed blocks 2017-09-12 17:18:45 -07:00
Yann Collet 3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Yann Collet 32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Paul Cruz 7ac4724bd2 removed fnum from DISPLAY statements 2017-06-28 13:00:49 -07:00
Paul Cruz e667d33b0b fixed generation of buggy test, corrected DISPLAY statements for errors 2017-06-28 12:19:37 -07:00
Yann Collet 20eeb243d1 Merge pull request #729 from paulcruz74/corpus
Corpus
2017-06-26 17:47:28 -07:00
Paul Cruz 3a295a91f8 added additional condition so large offsets into the dictionary are not generated past windowSize 2017-06-23 15:54:51 -07:00
Paul Cruz 2085375816 fixed bug detected by the API test 2017-06-23 13:44:24 -07:00
Paul Cruz 8cd134559d type warnings 2017-06-23 12:00:48 -07:00
Paul Cruz 4219acc60a fixed bus error bug 2017-06-23 11:22:29 -07:00
Paul Cruz 2e8cc6f12a added sizeof for clarity 2017-06-22 15:52:33 -07:00
Paul Cruz b325a2e4db changed assignment 2017-06-22 15:36:28 -07:00
Paul Cruz 829eb29033 added cli test for decodecorpus inside tests/Makefile. Also changed calculation of offset 2017-06-22 14:43:44 -07:00
Paul Cruz 98751f69e7 should be updating seed whenever multiple files are generated 2017-06-22 10:23:36 -07:00
Paul Cruz 84cfa07d2d changed format of command to --use-dict=# 2017-06-22 10:04:14 -07:00
Paul Cruz 04094f37e9 fixed offset in this case os that it always goes past src start 2017-06-21 18:47:40 -07:00
Paul Cruz 0950b3159a more meaningful names for count variables 2017-06-21 18:30:27 -07:00
Paul Cruz 0b6eedeace malloc samples instead of static allocation 2017-06-21 18:24:19 -07:00