Commit Graph

91 Commits (af23d39eb86820b3bb7c879288b7d7b858e59f66)

Author SHA1 Message Date
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet 6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Jennifer Liu f5228f2c44 Refactoring 2018-07-31 13:58:54 -07:00
Jennifer Liu 4e29bc2469 Use CDict instead of CCtx in analyzeEntropy 2018-07-31 10:36:45 -07:00
Yann Collet fa41bcc2c2 grouped debug functions into debug.h
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.

There is now a single implementation, shared by both.

Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
2018-06-13 15:43:09 -04:00
Nick Terrell 569e2abccd Allow negative compression levels in training
* Set `dictCLevel` in `zstdcli.c`.
* Only set to default level if the compression level `== 0`, not `<= 0`.
2018-04-09 12:12:03 -07:00
Yann Collet 752bae4a48 added warning message
when pathological dataset is detected
(note : cover_optimize needs -v to display the warning)
2018-01-11 11:29:28 -08:00
Yann Collet e8093dde09 fixed #304
Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
2018-01-11 11:16:32 -08:00
Yann Collet 218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet 77c137b3ae minor comment refactor 2017-09-14 15:12:57 -07:00
Yann Collet 3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Nick Terrell 376f435914 [dictBuilder] Set default compression level to 3 2017-08-24 16:21:05 -07:00
Yann Collet 32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Yann Collet 2bd6440be0 pinned down error code enum values
Note : all error codes are changed by this new version,
but it's expected to be the last change for existing codes.

Codes are now grouped by category, and receive a manually attributed value.
The objective is to guarantee that
error code values will not change in the future
when introducing new codes.
Intentionnal empty spaces and ranges are defined
in order to keep room for potential new codes.
2017-07-13 17:12:16 -07:00
Yann Collet 590937df20 Merge pull request #739 from facebook/refPrefix
ZSTD_refPrefix
2017-06-29 04:36:03 -07:00
Yann Collet 7d3816183f exposed ZSTD_MAGIC_DICTIONARY in zstd.h
makes it easier to explain ZSTD_dictMode
2017-06-27 13:50:34 -07:00
Nick Terrell 5b7fd7c422 [zdict] Make COVER the default algorithm 2017-06-26 21:09:22 -07:00
Yann Collet fa3671eac7 changed ZSTD_BLOCKSIZE_ABSOLUTEMAX into ZSTD_BLOCKSIZE_MAX
Also :
change ZSTD_getBlockSizeMax() into ZSTD_getBlockSize()
created ZSTD_BLOCKSIZELOG_MAX
2017-05-19 10:51:30 -07:00
Nick Terrell 5152fb2cb2 Convert all tabs to spaces 2017-03-29 18:51:58 -07:00
Yann Collet 4cf0093571 restored bonus rule 2017-03-26 14:51:00 -07:00
Yann Collet 69017bf253 Merge branch 'dev' into LegacyDictBuilder 2017-03-26 14:39:13 -07:00
Yann Collet 582760818f minor refactor
add const
changed if for easier to add new conditions
2017-03-26 03:04:56 -07:00
Yann Collet 858f72eeb8 fixed dictBuilder issue
dictionary loading would fail during entropy analysis
2017-03-26 02:50:00 -07:00
Yann Collet ecee9f2ef8 fixed conversion warnings 2017-03-26 00:59:14 -07:00
Yann Collet 4c41d37fcc changed test for new syntax
--dictID= and --maxdict=
2017-03-24 18:36:56 -07:00
Yann Collet d41f707e88 minor improvement : remove duplicates with 1 char prefix difference 2017-03-24 17:56:45 -07:00
Yann Collet 96aa3019b2 changed advanced commands --maxdict= and --dictID=
now works with the `=` variant, which is the recommended one.
Old variant `--dictID #` still works, for compatibility with existing scripts.
Long term objective is to remove the old variant..
2017-03-24 16:04:29 -07:00
Yann Collet 9da3b215ec Ensure all limits derived from same constants
Now uses ZDICT_DICTSIZE_MIN and ZDICT_CONTENTSIZE_MIN
from zdict.h.

Also : reduced values to 256 and 128 respectively
2017-03-24 15:02:09 -07:00
Yann Collet f332ece468 dictBuilder fails to create dictionary on certain input
Properly expressed with an error code (see zstd_errors.h)
and a cli return code != 0
2017-03-23 16:24:02 -07:00
Sean Purcell 042ba122ae Change g_displayLevel to int and fix DISPLAYUPDATE flush 2017-03-23 11:21:59 -07:00
Yann Collet aca113f4f5 fixed ZSTD_sizeof_?Dict() 2016-12-23 22:25:03 +01:00
Nick Terrell bcbe77e994 ZDICT_finalizeDictionary() flipped comparison
`ZDICT_finalizeDictionary()` had a flipped comparison.
I also allowed `dictBufferCapacity == dictContentSize`.
It might be the case that the user wants to fill the dictionary
completely up, and then let zstd take exactly the space it needs
for the entropy tables.
2016-12-22 18:01:14 -08:00
Yann Collet d76d1a9ef0 added ZDICT_finalizeDictionary() 2016-12-22 20:18:43 +01:00
Yann Collet 0819abe3c1 added ZSTD_createDDict_byReference() body 2016-12-21 19:25:15 +01:00
Yann Collet 1496c3dc47 Fix : size estimation when some samples are very large 2016-12-18 11:58:23 +01:00
Yann Collet d46ecb58a5 added dll compilation tests 2016-12-17 16:28:12 +01:00
Yann Collet 0a5a5fb7fd Fix #418 : printing selected segments in zdict debug mode can segfault with certain pathological patterns 2016-11-02 13:57:55 -07:00
Yann Collet 52c1bf93fe improved dicitonary segment merge 2016-10-18 16:34:58 -07:00
Yann Collet 2b361cf2f1 minor opt 2016-10-14 16:09:07 -07:00
Yann Collet df6797447f update dictionary builder warning comments 2016-09-27 15:14:32 +02:00
Yann Collet 47094ea66b added comment on filePos 2016-09-26 18:03:33 +02:00
Yann Collet 97b378a6f8 Streaming : dictionary compression on multiple files / segments can correctly provide srcSize into header (when provided) using pledgedSrcSize. 2016-09-21 17:20:19 +02:00
Yann Collet d56dbc02d3 removed g_displayLevel 2016-09-02 17:28:41 -07:00
Yann Collet 855766d73d clarified dictionary in format description 2016-09-02 17:04:49 -07:00
Yann Collet d725427a3c g_time => local displayTime 2016-09-02 15:32:39 -07:00
Yann Collet 4ded9e591c added boilerplate 2016-08-30 11:06:28 -07:00
Yann Collet 3b15f1f10f minor refactor 2016-08-30 09:58:50 -07:00
Yann Collet da3fbcb302 Added ZDICT_getDictID() 2016-08-19 14:23:58 +02:00
Yann Collet 49d105cfcf better warning and error messages in case of dictionary training failure (#292) 2016-08-18 15:02:11 +02:00