76 Commits

Author SHA1 Message Date
Nick Terrell
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Quentin Carbonneaux
552efcac2d relocate large arrays from the stack to ldmState_t 2021-02-10 16:16:54 +01:00
Quentin Carbonneaux
e2ad174d73 fix some compiler warnings 2021-02-08 20:19:16 +01:00
Quentin Carbonneaux
874a590e5c deal safely with short inputs in ZSTD_ldm_generateSequences
The fuzzer CI found this bug.
2021-02-04 11:15:24 +01:00
Quentin Carbonneaux
9f327c02fd new core ldm algorithm 2021-02-03 22:24:07 +01:00
Quentin Carbonneaux
aee3dc877f fix a variable name to reflect its nature 2021-01-22 02:24:19 -08:00
Quentin Carbonneaux
d6e3de77dc fix warning and remove one more occurrence of makeEntryAndInsertByTag 2021-01-20 01:39:16 -08:00
Quentin Carbonneaux
e0d5eca8fa fix forgotten numTagBits in getTagMask 2021-01-20 00:54:20 -08:00
Quentin Carbonneaux
1e65711ca5 a couple performance improvement changes for ldm 2021-01-20 00:54:20 -08:00
Thomas Waldmann
92a2b5ccc9 fixup: lits means literals 2021-01-07 23:30:42 +01:00
Thomas Waldmann
f9802d80a0 fix typos (work done by Andrea Gelmini) 2021-01-07 18:47:23 +01:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
senhuang42
f57c7e6bbf Add base adjustment correction 2020-10-07 13:56:25 -04:00
senhuang42
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
senhuang42
35d9f488f5 Modify codepath to use opt parser exclusively if the compression level is high enough 2020-10-07 13:56:24 -04:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Yann Collet
fdc56baa42
fix 22294 (#2151) 2020-05-18 21:05:10 -07:00
Nick Terrell
b2092c6dc4 [ldm] Reset loadedDictEnd when the context is reset 2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
W. Felix Handte
19a0955ec9 Add ZSTD_cwksp_alloc_size() to Help Calculate Needed Workspace Size 2019-10-10 13:40:16 -04:00
Nick Terrell
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size (#1678)
* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Yann Collet
be9e561da4 changed ZSTD_c_compressionStrategy into ZSTD_c_strategy
also : fixed paramgrill, and limit conditions
2018-12-06 15:00:52 -08:00
Yann Collet
41c7d0b1e1 changed hashEveryLog into hashRateLog 2018-11-21 14:36:57 -08:00
Yann Collet
e874dacc08 changed searchLength into minMatch
refactored all relevant API and calls
for consistency.
2018-11-20 14:56:07 -08:00
Nick Terrell
b9693d3a49 [lib] Add rsyncable mode
- Add rsyncable mode to multithreaded mode
- Factor out LDM's hash function for reuse
2018-11-14 16:59:57 -08:00
W. Felix Handte
50cc1cf4d5 Remove CParams Arg from ZSTD_ldm_blockCompress 2018-09-28 17:12:53 -07:00
W. Felix Handte
dcdf437fed Also Remove CParams from Table Filling Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte
6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00