Commit Graph

191 Commits (47685ac8560b0700a144a3ba7a7bc7a52d6859f9)

Author SHA1 Message Date
Carl Woffenden 5d81d44e40 Fixed VS variable shadowing warning (and added test) 2020-07-29 12:33:39 +02:00
yoshihitoh c6548eac8e Rename static vars to avoid redefinition error. 2020-06-29 10:51:50 +09:00
Nick Terrell 081691a3aa
Merge pull request #2217 from terrelln/cover-redundant
[cover] Remove unnecessary mask and dedup hash functions
2020-06-22 17:41:13 -07:00
Nick Terrell 370933fa20
Merge pull request #2209 from Niadb/dev
Explicitly use __cdecl for qsort, to avoid warning when default calling convention is not __cdecl
2020-06-22 17:41:03 -07:00
Nick Terrell 2312b819af [cover] Remove unnecessary mask and dedup hash functions
* Remove the unnecessary mask, since `ZSTD_hash*()` already ensures
  the output is mod 2^h.
* Dedup the hash functions and use `ZSTD_hash*()` directly.
2020-06-22 12:52:13 -07:00
Carl Woffenden 38cdb6a072 Renamed cover and fast cover hash functions/vars 2020-06-22 11:54:24 +02:00
Carl Woffenden 4a9b7d136f Initial implementation (files added, macros fixed)
Hashing functions still to fix.
2020-06-22 10:31:36 +02:00
Niadb 405586d40a
Add files via upload 2020-06-19 03:32:11 -06:00
Nick Terrell 08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Nick Terrell 45c66dd298 [zdict] Stabilize ZDICT_finalizeDictionary() 2020-05-07 10:37:01 -07:00
W. Felix Handte 6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Sen Huang c85d10d0ea Remove mixed declarations 2019-11-08 13:57:26 -05:00
Sen Huang d9c475f3b3 Fix static analyze error, use proper bounds for dictEnd 2019-11-08 13:57:26 -05:00
Sen Huang d06b90692b Move asserts to loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Sen Huang b39149e156 Expose ZSTD_reset_compressedBlockState() to shared API 2019-11-08 13:57:26 -05:00
Sen Huang 6ce335371b Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge 2019-11-08 13:57:26 -05:00
Sen Huang 4a61aaf368 Remove redundant comment 2019-11-08 13:57:26 -05:00
Sen Huang c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang 04fb42b4f3 Integrated refactor into getDictHeaderSize, now passes tests 2019-11-08 13:57:26 -05:00
Sen Huang 4b141b63e0 Revert "Move decompress symbols into zstd_internal.h, remove dependency"
This reverts commit a152b4c67a5266f611db4a2eac4a79003852a795.
2019-11-08 13:57:26 -05:00
Sen Huang 84404cff6e Move decompress symbols into zstd_internal.h, remove dependency 2019-11-08 13:57:26 -05:00
Sen Huang 341e0641ed Checks malloc() for failure, returns 0 if so 2019-11-08 13:57:26 -05:00
Sen Huang 97b7f712f3 Change to heap allocation, remove implicit type conversion 2019-11-08 13:57:25 -05:00
Sen Huang 3c36a7f13a Add ZDICT_getHeaderSize() 2019-11-08 13:57:08 -05:00
moozzyk eda7946a36 Take ZSTD_parameters as a const pointer
Fixes: #1637
2019-10-22 23:21:54 -07:00
Yann Collet 2b0a271ed2 fix eductional decoder
fix #1774
also :
- fix minor compilation warnings
- make sure the `test` is run during CI tests
2019-09-06 14:30:13 -07:00
Nick Terrell 0932de54bc [dictBuilder] Fix deadlock in *COVER error case
The COVER and FASTCOVER dictionary builders can deadlock when
dictionary construction errors, likely because there are too few
samples, or too few distinct dmers. The deadlock only occurs when
there are errors.

Fixes #1746.
2019-08-26 18:19:29 -07:00
Ed Maste b81d7cc6a0 remove extraneous doubled ;s 2019-08-15 21:17:06 -04:00
Tyler-Tran c55d2e7ba3 Adding shrinking flag for cover and fastcover (#1656)
* Changed ERROR(GENERIC) excluding inits

* editing git ignore

* Edited init functions to size_t returns

* moved declarations earlier

* resolved issues with changes to init functions

* fixed style and an error check

* attempting to add tests that might trigger changes

* added && die to cases expecting to fail

* resolved no die on expected failed command

* fixed accel to be incorrect value

* Adding an automated shrinking option

* Fixing build

* finalizing fixes

* fix?

* Removing added comment in cover.h

* Styling fixes

* Merging with fb dev

* removing megic number for default regression

* Requested revisions

* fixing support for fast cover

* fixing casting errors

* parenthesis fix

* fixing some build nits

* resolving travis ci syntax

* might resolve all compilation issues

* removed unused variable

* remodeling the selectDict function

* fixing bad memory access

* fixing error checks

* fixed erroring check in selectDict

* fixing mixed declarations

* modify mixed declaration

* fixing nits and adding test cases

* Adding requested changes + fixed bug for error checking

* switched double comparison from != to <

* fixed declaration typing

* refactoring COVER_best_finish() and changing shrinkDict

* removing the const's

* modifying ZDICT_optimizeTrainFromBuffer_cover functions

* fixing potential bad memcpy

* fixing the error function for dict size
2019-06-27 16:26:57 -07:00
Tyler-Tran cb47871a0a [dictBuilder] Be more specific than ERROR(generic) (#1616)
* Specify errors at a finer granularity than `ERROR(generic)`.
* Add tests for bad parameters in the dictionary builder.
2019-05-22 18:57:50 -07:00
Josh Soref a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell e649fad7aa [dictBuilder] Fix displayLevel for corpus warning
Pass the displaylevel into the corpus warning, because it is used in
fast cover and cover, so it needs to respect the local level.
2019-04-08 20:00:18 -07:00
Nick Terrell d0f5ba36fb [cover] Improvements for small or homogeneous data
* The algorithm would bail as soon as it found one epoch that
  contained no new segments. Change it so it now has to fail
  >= 10 times in a row (10 for fastcover, 10-100 for cover).
* The algorithm uses the `maxDict` size to decide the epoch size.
  When this size is absurdly large, it causes tiny epochs. Lower
  bound the epoch size at 10x the segment size, and warn the user
  that their training set is too small.

Fixes #1554
2019-03-22 14:14:46 -07:00
Nick Terrell 21616d8a77 [zdict] Improve documentation 2019-02-01 15:19:32 -08:00
Yann Collet ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet ffba142406 fixed file identity detection in 32-bit mode
also :
some library decided to use `index` as a global variable declared in standard header
shadowing the ones used in fastcover.c  :(
2018-12-20 14:30:30 -08:00
Yann Collet 2e7fd6a2cb fixed remaining searchLength invocations 2018-11-20 15:13:27 -08:00
Bartosz Szreder 5c5c476338 Prevent deadlock on malloc() failure. 2018-11-08 10:29:31 +01:00
Yann Collet 3ca6261223 fixed static analyzer warnings
note : for some reason,
scan-build version on my laptop found problems within fastcover.c
that scan-build on travisCI does not flag.

They are, as usual, false positive :
the analyzer does not understand that a table (`offset`) is correctly filled before usage.
2018-10-02 15:59:11 -07:00
Nick Terrell f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet 50b216146f
Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Jennifer Liu 21721b75a3 Change default f to 20 2018-09-04 17:15:14 -07:00
Jennifer Liu 944c9986e0 Update comment on default steps of cover and fastcover 2018-08-30 15:37:29 -07:00
Jennifer Liu 16db0337b1 Always use splitPoint=1.0 for non-optimize cover and fastcover 2018-08-30 14:59:22 -07:00
modbw d14edf259f
Fixed memory leak detected by cppcheck
cppcheck (which is run regularly in our CI environment)  detected a possible memory leak.
2018-08-28 07:25:05 +02:00
Yann Collet 6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet 1515f0bb0d fixed more issues detected by recent version of scan-build
test run on Linux
2018-08-16 15:20:25 -07:00
Yann Collet 6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00