Commit Graph

624 Commits (3aec385a1006be4d44ad3361eb90a8f34b4678a0)

Author SHA1 Message Date
Carl Woffenden 6213b7b3b4 Minor repetition 2019-08-27 16:57:23 +02:00
Carl Woffenden 59052d5fd8 Typo 2019-08-27 16:55:03 +02:00
Carl Woffenden ec12721538 Added clarification 2019-08-27 15:53:26 +02:00
Carl Woffenden 6712a644fa Added reasoning 2019-08-27 15:51:14 +02:00
Carl Woffenden 4f2a8b752a Typo 2019-08-27 15:38:34 +02:00
Carl Woffenden a57de4ac89 Added test script; tidied and documented
The test script combines the sources then builds and runs an example. A further example is built if the Emscripten compiler is available on the system. Documentation covers building.
2019-08-27 15:36:06 +02:00
Carl Woffenden 7c6fa81579 Added Emscripten example, removed Buck, minor tidy
Work-in-progress. Added simple Emscripten WebGL example that adds 25kB when built with Zstd. Removed Buck (will replace). Minor correctness.
2019-08-26 21:28:19 +02:00
Carl Woffenden ea8f6d2a07 Able to test combine script; minor tidy 2019-08-26 07:48:57 +02:00
Carl Woffenden d760e35ebc Preparing to run tests
Combine script more robust and can output to a specified file. Initial buck files added (work in progress).
2019-08-25 22:49:01 +02:00
Carl Woffenden 36a59336da Minor fix for files with spaces. Typo. 2019-08-23 23:09:13 +02:00
Carl Woffenden 0a49353a46 Added generator script and simple test
The script will combine decompressor sources into a single file. The example shows this in use.
2019-08-23 18:43:29 +02:00
Felix Handte 2314906b68
Merge pull request #1699 from felixhandte/seekable-gitignore
Add New Seekable Compression Example to .gitignore
2019-07-24 19:07:55 -04:00
Yann Collet 0d38ee3c30
Merge pull request #1690 from piguin/dev
fix compiling errors with clang-8
2019-07-24 15:37:05 -07:00
W. Felix Handte 15da57820d Add New Seekable Compression Example to .gitignore 2019-07-24 18:22:20 -04:00
Sean Purcell 671d533ea7 Fix seekable decompression in-memory api 2019-07-21 23:22:25 -04:00
Qin Li 04a9d6b828 fix compiling errors with clang-8
Compiling with clang-8 fails with the following errors:

largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
        U64 const dTime_ns = result.nanoSecPerRun;
                  ~~~~~~~~   ~~~~~~~^~~~~~~~~~~~~

zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 1 means that cover parameters were correct
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 0 in case of malformed parameters
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2019-07-18 19:41:00 -07:00
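The first diagnostic above (-Wfloat-conversion) is typically resolved by making the narrowing explicit. A minimal sketch, assuming a `U64` typedef equivalent to `uint64_t`; the helper name `to_whole_ns` is hypothetical and this is not necessarily the change made in the commit:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;   /* assumption: stands in for zstd's U64 typedef */

/* Hypothetical helper: converts a measured duration to whole nanoseconds.
 * The explicit cast states that truncation toward zero is intended,
 * so -Wfloat-conversion no longer fires on the conversion. */
U64 to_whole_ns(double nanoSecPerRun)
{
    assert(nanoSecPerRun >= 0.0);   /* a negative double has no valid unsigned value */
    return (U64)nanoSecPerRun;
}
```

With this helper, the flagged assignment could be written as `U64 const dTime_ns = to_whole_ns(result.nanoSecPerRun);`.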
Josh Soref a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Yann Collet 59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required bringing in all these dependencies.

Now, dependency is reduced to timefn only.
This required creating a separate timefn from util,
and rewriting benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect across the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
Peter (Stig) Edwards 4a9e0502e6
-Wformat-security not needed with -Wformat=2 2019-02-01 09:29:08 +00:00
Peter (Stig) Edwards 2b7120ec71
-Wformat-security not needed with -Wformat=2 2019-02-01 09:28:41 +00:00
Dmitry V. Levin 8b2210411a contrib/pzstd/Makefile: fix build of tests
Apparently, Options.o cannot be linked in without $(PROGDIR)/util.o
2018-12-28 19:02:22 +00:00
Yann Collet ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among the few issues this patch solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller uses a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formatting,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
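The %u issue described in this commit message can be sketched as follows. This is an illustration only, assuming `U32` is a typedef of `uint32_t`; the function name `format_level` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t U32;   /* assumption: stands in for zstd's U32 typedef */

/* %u expects an `unsigned`. On targets where uint32_t is `unsigned long`
 * (the 32-bit MIPS case mentioned above), passing a U32 directly is a
 * type mismatch, so the value is cast at the call site. */
int format_level(char* buf, size_t size, U32 level)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "level %u", (unsigned)level);
}
```

The cast is lossless wherever `unsigned` is at least 32 bits wide, which holds on the platforms discussed in the message.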
Yann Collet 34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00
Yann Collet 9c3265a53f
Merge pull request #1417 from facebook/advancedAPI
Advanced API
2018-12-10 18:48:15 -08:00
Yann Collet 3583d19c4e changed parameter names from ZSTD_p_* to ZSTD_c_*
for naming consistency
2018-12-05 17:26:02 -08:00
Lzu Tao beb13bd87e Move contrib/meson to build/meson 2018-12-01 23:18:59 +07:00
Lzu Tao c0e71cae55 Add enable_lz4 build option and fix lzma dependency 2018-12-01 23:18:59 +07:00
Lzu Tao 5c4965c351 Add pedantic flag 2018-12-01 23:18:59 +07:00
Lzu Tao 6f3f1a8d3a No install zstd_manual.html 2018-12-01 23:18:59 +07:00
Lzu Tao f660825d9f Install missed zstdgrep and zstdless 2018-12-01 23:18:59 +07:00
Lzu Tao 3f27e2a072 Install zstdmt.1 manpage [skip ci] 2018-12-01 23:18:59 +07:00
Lzu Tao d3134a3ed3 Rename meson variables 2018-12-01 23:18:59 +07:00
Lzu Tao 1985e427c7 Add manpage install warning [skip ci]
We link new manpages with gz compressed format of the target manpage.
I have not tested it on Windows. So just place a warning here.
2018-12-01 23:18:59 +07:00
Lzu Tao 9c862c6a53 Fix manpage symlinks [skip ci] 2018-12-01 23:18:59 +07:00
Lzu Tao d79df2a370 Apply new InstallSymlink script 2018-12-01 23:18:59 +07:00
Lzu Tao ef2e761937 Helper script to install symlink in meson 2018-12-01 23:18:59 +07:00
Lzu Tao 3175188407 No need these helpers 2018-12-01 23:18:59 +07:00
Lzu Tao 337f914dc8 Fix lib soversion and no install cover.h header 2018-12-01 23:18:59 +07:00
Lzu Tao c9f0144302 Fix meson tests build 2018-12-01 23:18:59 +07:00
Lzu Tao 5a36a57cf5 Bump to 1.3.8 and fix run_command function
The run_command is run from an unspecified directory. Therefore we cannot assume
which directory it is running our command.
2018-12-01 23:18:59 +07:00
Lzu Tao 8a160680d1 Update legacy support to 5 2018-12-01 23:18:59 +07:00
Lzu Tao f727808731 Minor fix for meson build
Use files function instead of constructing path with meson.current_source_dir()
2018-12-01 23:18:59 +07:00
Lzu Tao 9a721e5216 Update meson build system
NOTE: This commit only tested on Linux (Ubuntu 18.04). Windows
build may not work as expected.

* Use meson >= 0.47.0 cause we use install_man function
* Add three helper Python script:
  * CopyFile.py: To copy file
  * CreateSymlink.py: To make symlink (both Windows and Unix)
  * GetZstdLibraryVersion.py: Parse lib/zstd.h to get zstd version
  These help emulating equivalent functions in CMake and Makefile.
* Use subdir from meson to split meson.build
  * Add contrib build
  * Fix other build
* Add new build options
  * build_programs: Enable programs build
  * build_contrib: Enable contrib build
  * build_tests: Enable tests build
  * use_static_runtime: Link to static run-time libraries on MSVC
  * zlib_support: Enable zlib support
  * lzma_support: Enable lzma support
2018-11-28 01:08:34 +07:00
Lzu Tao 9bd8f6a00c Rename and update build instruction in README file to README.md 2018-11-28 01:08:34 +07:00
Lzu Tao 2abd5139a5 Add meson build guide 2018-11-28 01:08:34 +07:00
Yann Collet 5adbad4059 Merge branch 'dev' into advancedAPI 2018-11-14 13:00:37 -08:00
Yann Collet b83d1e7714 removed some `static const` variables
and replaced by traditional macro constants.

Unfortunately, C doesn't consider `static const` to mean "constant"
2018-11-13 16:56:32 -08:00
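To illustrate the point about `static const`: in standard C, a `static const` object is not an integer constant expression, so it cannot size a file-scope array or serve as a `case` label, whereas a macro or an enum constant can. A minimal sketch with hypothetical names (`BLOCK_SIZE`, `MAX_BLOCKS`), not taken from the zstd sources:

```c
#include <assert.h>
#include <stddef.h>

/* `static const size_t blockSize = 4096;` would be rejected in both uses
 * below by a strictly conforming C compiler; these two forms are accepted. */
#define BLOCK_SIZE 4096            /* hypothetical macro constant */
enum { MAX_BLOCKS = 16 };          /* enum constants also qualify */

static char g_blocks[MAX_BLOCKS][BLOCK_SIZE];   /* file-scope array sizes */

char* block_at(size_t index)
{
    assert(index < (size_t)MAX_BLOCKS);
    return g_blocks[index];
}

int is_full_block(size_t n)
{
    switch (n) {
    case BLOCK_SIZE: return 1;     /* case label needs a constant expression */
    default:         return 0;
    }
}
```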
Yann Collet b830ccca5c changed benchfn api
to use structure for function parameters
as it is much clearer than a long list of parameters,
since each parameter can now be named.
2018-11-13 13:12:50 -08:00
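The idea of passing a structure instead of a long parameter list can be sketched as below. All names here (`bench_params_t`, `run_bench`, `sum_bytes`) are hypothetical and do not reflect the actual benchfn API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature of the function being measured. */
typedef size_t (*bench_fn_t)(const void* src, size_t srcSize);

/* Hypothetical parameter block: every argument carries a name. */
typedef struct {
    bench_fn_t  fn;        /* function to run */
    const void* src;       /* input buffer */
    size_t      srcSize;   /* input size in bytes */
    unsigned    nbRuns;    /* number of repetitions */
} bench_params_t;

/* Example workload: sums the bytes of the input. */
size_t sum_bytes(const void* src, size_t srcSize)
{
    const unsigned char* p = (const unsigned char*)src;
    size_t total = 0;
    size_t i;
    for (i = 0; i < srcSize; i++) total += p[i];
    return total;
}

/* Runs the workload nbRuns times and returns the accumulated result. */
size_t run_bench(bench_params_t params)
{
    size_t total = 0;
    unsigned run;
    assert(params.fn != NULL);
    for (run = 0; run < params.nbRuns; run++)
        total += params.fn(params.src, params.srcSize);
    return total;
}
```

A caller can then use C99 designated initializers, for example `bench_params_t p = { .fn = sum_bytes, .src = data, .srcSize = size, .nbRuns = 3 };`, which keeps each argument labeled at the call site.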
Yann Collet d38063f8ae separated bench module into benchfn and benchzstd
it shall be possible to use benchfn
without any dependency on zstd.
2018-11-13 11:01:59 -08:00
Yann Collet 483759a3de Improves decompression speed when using cold dictionary
by triggering the prefetching decoder path
(which used to be dedicated to long-range offsets only).

Figures on my laptop :
no content prefetch : ~300 MB/s (for reference)
full content prefetch : ~325 MB/s (before this patch)
new prefetch path : ~375 MB/s (after this patch)

The benchmark speed is already significant,
but another side-effect is that this version
prefetches less data into memory,
since it only prefetches what's needed, instead of the full dictionary.

This is supposed to help highly active environments
such as active databases,
that can't be properly measured in benchmark environment (too clean).

Also :
fixed the largeNbDict test program
which was working improperly when setting nbBlocks > nbFiles.
2018-11-08 17:00:23 -08:00
Rohit Jain 705e0b18ab Making changes to make it compile on my laptop 2018-10-11 15:51:57 -07:00
Yann Collet 123fac6b6d fix pzstd compatibility with mingw
some details changed with introduction of gcc7
2018-09-21 17:36:00 -07:00
Yann Collet 00ce26725b
Merge pull request #1324 from ko-zu/fixclangcode
Fix largeNbDicts bench for clangbuild
2018-09-17 14:10:17 -07:00
Nick Terrell 8f27e8cf3d
Merge pull request #1322 from azat-archive/seekable-fixes-pull
Fixes read write past end of input buffer.
2018-09-17 11:04:51 -07:00
ko-zu b053bec2f4 Fix largeNbDicts bench for clangbuild
Remove unsigned to size_t promotion to fix implicit down conversion errors in clangbuild target.
2018-09-17 13:09:08 +09:00
Azat Khuzhin d707692e05
seekable_decompression: support offset greater than UNIT_MAX 2018-09-16 18:05:32 +03:00
Azat Khuzhin b52867a97f
zstdseek_decompress: fix decompression with data left in input buffer 2018-09-16 18:05:32 +03:00
Yann Collet c49ccbc8e7 largeNbDicts : can select a nb of blocks
will automatically truncate or repeat input as needed,
to create the requested nb of blocks.
default: nb of files, eventually increased appropriately if blockSize is set
2018-09-12 11:31:28 -07:00
Yann Collet 50b216146f
Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Yann Collet c57a856d64 fixed minor static analyzer warning 2018-09-05 14:33:51 -07:00
Yann Collet 1d487d587f updated documentation 2018-09-04 14:57:45 -07:00
Yann Collet 11b8b8c100 silenced false-positive scan-build warning 2018-08-31 10:01:06 -07:00
Yann Collet 0ff67511e6 fixed link order for old compilers 2018-08-30 16:43:28 -07:00
Yann Collet f76253bb70 minor : createDictionaryBuffer() can create dictionaries of different sizes 2018-08-30 16:24:44 -07:00
Yann Collet 39c55a118f fixed minor compatibility issues with older compilers 2018-08-30 16:00:57 -07:00
Yann Collet 39ef91a599 -std=c99 for largeNbDicts 2018-08-30 14:59:23 -07:00
Yann Collet 4086b2871b largeNbDicts compatible with multiple source files
splitting is disabled by default, but can be re-enabled using usual command -B#
update commands to look like zstd ones
2018-08-30 14:38:49 -07:00
Yann Collet a5a77965d3 make all includes contrib/largeNbDicts 2018-08-29 16:17:22 -07:00
Yann Collet d89fa814c1 added a README
for documentation
2018-08-28 18:19:19 -07:00
Yann Collet 6444c50035 increases randomness of ddict ptrs 2018-08-28 18:13:46 -07:00
Yann Collet 6c398df241 level, block size and nb dicts can be set on command line 2018-08-28 18:05:31 -07:00
Yann Collet 0c66a44d1b first working test program
measures :
- compression ratio with / without dictionary
- create one dictionary per block
- memory budget for dictionaries
- decompression speed, using one different dictionary per block

current limitations :
- only one file
- 4K blocks only
- automatic dictionary built with 4K size

dictionary can be selected on command line, with -D
2018-08-28 15:47:07 -07:00
Yann Collet 274b60e6e6 largeNbDicts can compress and compare dict vs noDict 2018-08-27 17:08:44 -07:00
Yann Collet 6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo splitting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCover

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet 36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet 42a02ab745 fixed minor warnings issued by scan-build 2018-08-15 14:36:02 -07:00
Jennifer Liu 0acb0abd1e Add non-optimize FASTCOVER (#1260)
* Add non-optimize FASTCOVER

* Minor fix

* Pass param as value instead of pointer
2018-08-01 11:06:16 -07:00
Jennifer Liu 4e29bc2469 Use CDict instead of CCtx in analyzeEntropy 2018-07-31 10:36:45 -07:00
Jennifer Liu 31229e527b Increment frequency for every dmer occurrence within same sample instead of at most once per sample 2018-07-30 12:54:22 -07:00
Jennifer Liu 51b109c1b5 Delete old benchmarking result 2018-07-27 17:31:33 -07:00
Jennifer Liu 53ef22a4bc Undo deleting clean in make 2018-07-27 16:56:50 -07:00
Jennifer Liu 96d84ee235 Revert test.sh 2018-07-27 16:54:05 -07:00
Jennifer Liu 61262f6c0d Save segmentFreqs in ctx instead of malloc and memset in SelectSegment 2018-07-27 16:51:38 -07:00
Jennifer Liu 49b398e93f Use same param after optimizing cover and fastCover and record k and d for benchmarking 2018-07-27 13:39:19 -07:00
Jennifer Liu 759c543312 Rerun cover and fastCover with optimized values 2018-07-26 19:03:01 -07:00
Jennifer Liu 3d7941ce41 Benchmark different f values 2018-07-26 16:24:13 -07:00
Jennifer Liu 3b163e0b5b Add array to keep track of frequency within active segment, fix malloc bug, update benchmarking result 2018-07-26 13:53:13 -07:00
Jennifer Liu 2333ecb173 Allow d=6 2018-07-25 18:10:09 -07:00
Jennifer Liu 1e85f314d8 Benchmark fast cover optimize vs k=200 2018-07-25 17:53:38 -07:00
Jennifer Liu d1fc507ef9 Initial benchmarking result for fastCover 2018-07-25 17:05:54 -07:00
Jennifer Liu f5407e398a Make hash value const 2018-07-25 16:54:08 -07:00
Jennifer Liu 7f3f70f766 Add Fast Cover Dictionary Builder 2018-07-25 16:34:07 -07:00
Nick Terrell 77068a8447
Merge pull request #1246 from jennifermliu/benchmark
Benchmark dictionary builders
2018-07-20 18:09:31 -07:00
Jennifer Liu b6c5d4982c Minor fix 2018-07-20 17:41:22 -07:00
Jennifer Liu 71e767ac09 Refactoring and benchmark without dictionary 2018-07-20 17:03:47 -07:00
Jennifer Liu 470c8d42f4 Benchmark dictionary builders 2018-07-20 11:32:39 -07:00
Nick Terrell 4d1ad5cdb2
Merge pull request #1238 from jennifermliu/random
Add random dictionary builder
2018-07-19 13:52:15 -07:00
Jennifer Liu 0c5eaef248 Update Makefile 2018-07-19 13:44:27 -07:00
Jennifer Liu 5bb46a898e Rename cleanup 2018-07-18 12:15:49 -07:00