Commit Graph

212 Commits (2624652a3273817a99fb977975db96e38d440bae)

Author SHA1 Message Date
Nick Terrell 0356e05f07 [zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary
Allow the `dictContentSize` to be any size. The finalized dictionary
content size must be at least as large as the maximum repcode (8). So we
add zero bytes to the dictionary to ensure that we meet that
requirement.

I've removed this restriction because its been causing us headaches when
people complain that dictionary training failed. It fails because there
isn't enough useful content to put in the dictionary. Either because
every sample is exactly the same and less than ZDICT_CONTENTSIZE_MIN bytes,
or there isn't enough content. Instead, we should succeed in creating
the dictionary, and it is up to the user to decide if it is worthwhile.
It is possible that the tables alone provide enough value.

NOTE: This allows us to produce dictionaries with finalized
`dictContentSize < ZDICT_CONTENTSIZE_MIN`. But, they are still valid
zstd dictionaries. We could remove the `ZDICT_CONTENTSIZE_MIN` macro,
but I've decided to leave that for now, so we don't break users.
2021-11-30 18:02:26 -08:00
Yann Collet 9d62957b31
Merge pull request #2800 from animalize/fix_c89
Fix a C89 error in msvc
2021-10-18 14:32:04 -07:00
stanjo74 52598d54e9
Limit train samples (#2809)
* Limit training samples size to 2GB

* simplified DISPLAYLEVEL() macro to use global vqriable instead of local.

* refactored training samples loading

* fixed compiler warning

* addressed comments from the pull request

* addressed @terrelln comments

* missed some fixes

* fixed type mismatch

* Fixed bug passing estimated number of samples rather insted of the loaded number of samples.
Changed unit conversion not to use bit-shifts.

* fixed a declaration after code

* fixed type conversion compile errors

* fixed more type castting

* fixed more type mismatching

* changed sizes type to size_t

* move type casting

* more type cast fixes
2021-10-04 17:47:52 -07:00
Ma Lin ae986fcdb8 Use __assume(0) for unreachable code path in msvc
msvc will optimize away the condition check.
2021-09-27 19:23:57 +08:00
Ma Lin e5ba858270 Don't initialize the first parameter of _BitScanForward* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64
2021-09-25 16:36:53 +08:00
Ma Lin 95f492ea17 Don't initialize the first parameter of _BitScanReverse* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64
2021-09-25 16:36:53 +08:00
Yann Collet 27a8bbe265 new initializer for ll price 2021-09-03 16:07:31 -07:00
Yann Collet f0fc8cb3e1 Disable console notification by default within the library
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
Nick Terrell 09149beaf8 [1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
2021-04-30 15:13:54 -07:00
Nick Terrell a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Yann Collet b9748757b0 fixed minor cast warning 2021-02-05 09:55:54 -08:00
Yann Collet 33b73db33c
Merge pull request #2457 from facebook/cli-dll
zstd CLI can now be linked to libzstd dynamic library
2021-01-07 17:10:13 -08:00
Thomas Waldmann f9802d80a0 fix typos (work done by Andrea Gelmini) 2021-01-07 18:47:23 +01:00
Yann Collet cefdc023f7 The CLI can be linked to libzstd dynamic library
invoking target zstd-dll
2021-01-06 18:00:24 -08:00
Nick Terrell 66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Gregory Szorc dd1a7e41ee Add ifndef guards for _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE
This ensures the symbols aren't redefined, which would result in a compiler
error.

I was getting redefined symbols for _LARGEFILE64_SOURCE when building for
32-bit x86 Linux on an older CentOS release in a CI environment. With this
change, I'm able to compile the single file library in this environment.

Closes #2443.
2020-12-26 10:02:45 -07:00
animalize 1aec77ea89 use ZSTD_CLEVEL_DEFAULT in zdict.c 2020-12-03 12:46:57 +08:00
Dmitriy Titarenko 61f71753d4 Pass dictBufferCapacity to COVER_selectDict()
closes #2371
2020-11-22 23:45:18 +05:00
Yann Collet 314c7df170 minor : change test order
to reduce a warning with `clang` on Windows
2020-10-16 13:26:47 -07:00
Nick Terrell 614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell 575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Carl Woffenden 5d81d44e40 Fixed VS variable shadowing warning (and added test) 2020-07-29 12:33:39 +02:00
yoshihitoh c6548eac8e Rename static vars to avoid redefinition error. 2020-06-29 10:51:50 +09:00
Nick Terrell 081691a3aa
Merge pull request #2217 from terrelln/cover-redundant
[cover] Remove unnecessary mask and dedup hash functions
2020-06-22 17:41:13 -07:00
Nick Terrell 370933fa20
Merge pull request #2209 from Niadb/dev
Explicitly use __cdecl for qsort, to avoid warning when default calling convention is not __cdecl
2020-06-22 17:41:03 -07:00
Nick Terrell 2312b819af [cover] Remove unnecessary mask and dedup hash functions
* Remove the unnecessary mask, since `ZSTD_hash*()` already ensures
  the output is mod 2^h.
* Dedup the hash functions and use `ZSTD_hash*()` directly.
2020-06-22 12:52:13 -07:00
Carl Woffenden 38cdb6a072 Renamed cover and fast cover hash functions/vars 2020-06-22 11:54:24 +02:00
Carl Woffenden 4a9b7d136f Initial implementation (files added, macros fixed)
Hashing functions still to fix.
2020-06-22 10:31:36 +02:00
Niadb 405586d40a
Add files via upload 2020-06-19 03:32:11 -06:00
Nick Terrell 08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Nick Terrell 45c66dd298 [zdict] Stabilize ZDICT_finalizeDictionary() 2020-05-07 10:37:01 -07:00
W. Felix Handte 6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Sen Huang c85d10d0ea Remove mixed declarations 2019-11-08 13:57:26 -05:00
Sen Huang d9c475f3b3 Fix static analyze error, use proper bounds for dictEnd 2019-11-08 13:57:26 -05:00
Sen Huang d06b90692b Move asserts to loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Sen Huang b39149e156 Expose ZSTD_reset_compressedBlockState() to shared API 2019-11-08 13:57:26 -05:00
Sen Huang 6ce335371b Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge 2019-11-08 13:57:26 -05:00
Sen Huang 4a61aaf368 Remove redundant comment 2019-11-08 13:57:26 -05:00
Sen Huang c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang 04fb42b4f3 Integrated refactor into getDictHeaderSize, now passes tests 2019-11-08 13:57:26 -05:00
Sen Huang 4b141b63e0 Revert "Move decompress symbols into zstd_internal.h, remove dependency"
This reverts commit a152b4c67a5266f611db4a2eac4a79003852a795.
2019-11-08 13:57:26 -05:00
Sen Huang 84404cff6e Move decompress symbols into zstd_internal.h, remove dependency 2019-11-08 13:57:26 -05:00
Sen Huang 341e0641ed Checks malloc() for failure, returns 0 if so 2019-11-08 13:57:26 -05:00
Sen Huang 97b7f712f3 Change to heap allocation, remove implicit type conversion 2019-11-08 13:57:25 -05:00
Sen Huang 3c36a7f13a Add ZDICT_getHeaderSize() 2019-11-08 13:57:08 -05:00
moozzyk eda7946a36 Take ZSTD_parameters as a const pointer
Fixes: #1637
2019-10-22 23:21:54 -07:00
Yann Collet 2b0a271ed2 fix eductional decoder
fix #1774
also :
- fix minor compilation warnings
- make sure the `test` is run during CI tests
2019-09-06 14:30:13 -07:00
Nick Terrell 0932de54bc [dictBuilder] Fix deadlock in *COVER error case
The COVER and FASTCOVER dictionary builders can deadlock when
dictionary construction errors, likely because there are too few
samples, or too few distinct dmers. The deadlock only occurs when
there are errors.

Fixes #1746.
2019-08-26 18:19:29 -07:00
Ed Maste b81d7cc6a0 remove extraneous doubled ;s 2019-08-15 21:17:06 -04:00