zstd/contrib/experimental_dict_builders/fastCover
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo splitting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part (which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCover

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| Makefile | Undo deleting clean in make | 2018-07-27 16:56:50 -07:00 |
| README.md | Add non-optimize FASTCOVER (#1260) | 2018-08-01 11:06:16 -07:00 |
| fastCover.c | Merge fastCover into DictBuilder (#1274) | 2018-08-23 12:06:20 -07:00 |
| fastCover.h | Add non-optimize FASTCOVER (#1260) | 2018-08-01 11:06:16 -07:00 |
| main.c | Add non-optimize FASTCOVER (#1260) | 2018-08-01 11:06:16 -07:00 |
| test.sh | Add non-optimize FASTCOVER (#1260) | 2018-08-01 11:06:16 -07:00 |

README.md

FastCover Dictionary Builder

Permitted Arguments:

* Input File/Directory (in=fileName): required; file or directory used to build the dictionary; directories are processed recursively; multiple files or directories can be given, each preceded by "in="
* Output Dictionary (out=dictName): if not provided, defaults to fastCoverDict
* Dictionary ID (dictID=#): nonnegative number; if not provided, defaults to 0
* Maximum Dictionary Size (maxdict=#): positive number, in bytes; if not provided, defaults to 110KB
* Size of Selected Segment (k=#): positive number, in bytes; if not provided, defaults to 200
* Size of Dmer (d=#): either 6 or 8; if not provided, defaults to 8
* Number of steps (steps=#): positive number; if not provided, defaults to 32
* Percentage of samples used for training (split=#): positive number; if not provided, defaults to 100
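The argument rules above can be sketched as a small `key=value` parser in C. This is only an illustration under stated assumptions: `fc_args_t`, `fc_args_init`, `fc_value`, `fc_number`, `fc_args_parse` and the `FC_MAX_INPUTS` cap are hypothetical names invented here, not the code in main.c, and "110KB" is assumed to mean 110 * 1024 bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define FC_MAX_INPUTS 16  /* arbitrary cap chosen for this sketch */

/* Hypothetical holder for the README's arguments; not the tool's own struct. */
typedef struct {
    const char* in[FC_MAX_INPUTS];  /* every in= value, in the order given */
    unsigned nbIn;
    const char* out;
    unsigned dictID, maxdict, k, d, steps, split;
} fc_args_t;

/* Load the defaults listed in the README. */
static void fc_args_init(fc_args_t* a)
{
    assert(a != NULL);
    memset(a, 0, sizeof(*a));   /* also sets dictID = 0 and nbIn = 0 */
    a->out = "fastCoverDict";
    a->maxdict = 110 * 1024;    /* assumes "110KB" means 110 * 1024 bytes */
    a->k = 200;
    a->d = 8;
    a->steps = 32;
    a->split = 100;
}

/* Return the text after key when arg starts with key, else NULL. */
static const char* fc_value(const char* arg, const char* key)
{
    size_t const len = strlen(key);
    return strncmp(arg, key, len) == 0 ? arg + len : NULL;
}

/* Parse an unsigned decimal number. Returns 0 on success, 1 on bad input;
 * *dst is only written on success. */
static int fc_number(const char* v, unsigned* dst)
{
    char* end;
    unsigned long n;
    if (*v < '0' || *v > '9') return 1;  /* rejects empty strings and signs */
    n = strtoul(v, &end, 10);
    if (*end != '\0' || n > UINT_MAX) return 1;
    *dst = (unsigned)n;
    return 0;
}

/* Apply one key=value token. Returns 0 on success, 1 if unknown or invalid. */
static int fc_args_parse(fc_args_t* a, const char* arg)
{
    const char* v;
    unsigned n;
    unsigned* dst = NULL;
    if ((v = fc_value(arg, "in=")) != NULL) {
        if (a->nbIn == FC_MAX_INPUTS) return 1;
        a->in[a->nbIn++] = v;            /* repeated in= arguments accumulate */
        return 0;
    }
    if ((v = fc_value(arg, "out=")) != NULL) { a->out = v; return 0; }
    if ((v = fc_value(arg, "dictID=")) != NULL) return fc_number(v, &a->dictID);
    if ((v = fc_value(arg, "d=")) != NULL) {
        if (fc_number(v, &n) || (n != 6 && n != 8)) return 1;  /* d is 6 or 8 */
        a->d = n;
        return 0;
    }
    /* The remaining arguments must all be positive numbers. */
    if ((v = fc_value(arg, "maxdict=")) != NULL) dst = &a->maxdict;
    else if ((v = fc_value(arg, "k=")) != NULL) dst = &a->k;
    else if ((v = fc_value(arg, "steps=")) != NULL) dst = &a->steps;
    else if ((v = fc_value(arg, "split=")) != NULL) dst = &a->split;
    if (dst == NULL || fc_number(v, &n) || n == 0) return 1;
    *dst = n;
    return 0;
}
```

Calling `fc_args_init` once and then `fc_args_parse` on each token leaves every argument that was not supplied at its documented default, and an invalid token such as `d=7` is rejected without changing the stored value.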

Running Test:

make test

Usage:

To build a FASTCOVER dictionary with the provided arguments, run make ARG= followed by the arguments.

If k or d is not provided, the optimize version of FASTCOVER is run.
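The rule for choosing between the two versions can be written as a one-line decision. The sketch below is an assumption-laden illustration: `fc_given_t`, `fc_run_t` and `fc_choose_run` are hypothetical names, and how the tool itself records whether k= or d= appeared on the command line is not stated in this README, so explicit presence flags are used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical presence flags: nonzero when the argument was supplied. */
typedef struct {
    int hasK;  /* k=# appeared on the command line */
    int hasD;  /* d=# appeared on the command line */
} fc_given_t;

/* Which version of FASTCOVER to run (names invented for this sketch). */
typedef enum { FC_RUN_FIXED, FC_RUN_OPTIMIZE } fc_run_t;

/* README rule: if k or d is missing, run the optimize version. */
static fc_run_t fc_choose_run(const fc_given_t* g)
{
    assert(g != NULL);
    return (g->hasK && g->hasD) ? FC_RUN_FIXED : FC_RUN_OPTIMIZE;
}
```

Under this reading, only a command line that supplies both k= and d= runs the non-optimize version; supplying just one of them still selects the optimize version.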

Examples:

make ARG="in=../../../lib/dictBuilder out=dict100 dictID=520"

make ARG="in=../../../lib/dictBuilder in=../../../lib/compress"