55 Commits

Author SHA1 Message Date
Frank Denis
f540dc1b7e cache_hash: hash function change
This makes the `cache_hash` hash function easier to replace.

BLAKE3 would be a natural fit for hashing large files, but:
- second preimage resistance is not necessary for the cache_hash use cases
- our BLAKE3 implementation is currently very slow

Switch to SipHash128, which gives us an immediate speed boost.
2020-08-22 15:10:33 -04:00
Frank Denis
fc55cd458a Hash functions now accept an option set
- This avoids having multiple `init()` functions for every combination
of optional parameters
- The API is consistent across all hash functions
- New options can be added later without breaking existing applications.
  For example, this is going to come in handy if we implement parallelization
  for BLAKE2 and BLAKE3.
- We don't have a mix of snake_case and camelCase functions any more, at
least in the public crypto API

Support for BLAKE2 salt and personalization (more commonly called context)
parameters have been implemented by the way to illustrate this.
2020-08-21 00:51:14 +02:00
Frank Denis
6f9ea9eaef Breaking: sort std/crypto functions into categories
Instead of having all primitives and constructions share the same namespace,
they are now organized by category and function family.

Types within the same category are expected to share the exact same API.
2020-08-20 23:02:05 +02:00
Andrew Kelley
4a69b11e74 add license header to all std lib files
add SPDX license identifier
copyright ownership is zig contributors
2020-08-20 16:07:04 -04:00
Andrew Kelley
e2c741f1e7 std.cache_hash: additionally use file size to detect modifications
I have observed on Linux writing and reading the same file many times
without the mtime changing, despite the file system having nanosecond
granularity (and about 1 millisecond worth of nanoseconds passing between
modifications). I am calling this a Linux Kernel Bug and adding file
size to the cache hash manifest as a mitigation. As evidence, macOS does
not exhibit this behavior.

This means it is possible, on Linux, for a file to be added to the cache
hash, and, if it is updated with the same file size, same inode, within
about 1 millisecond, the cache system will give us a false positive,
saying it is unmodified. I don't see any way to improve this situation
without fixing the bug in the Linux kernel.

closes #6082
2020-08-18 12:44:00 -07:00
Andrew Kelley
c0517bf1f6 std.cache_hash: temporary workaround for mtime precision on linux
See #6082
2020-08-18 01:30:57 -07:00
Andrew Kelley
ce8b9c0c5c std.cache_hash: don't trust mtime granularity to be better than 1ms
I empirically observed mtime not changing when rapidly writing the same
file name within the same millisecond of wall clock time, despite the
mtime field having nanosecond precision.

I believe this fixes the CI test failures.
2020-08-17 21:26:33 -07:00
Andrew Kelley
a916f63940 std.cache_hash: fix bug parsing inode
This resulted in false negatives cache misses.
2020-08-17 18:49:33 -07:00
Vexu
e85fe13e44
run zig fmt on std lib and self hosted 2020-07-11 20:41:19 +03:00
joachimschmidt557
0ae1157e45 std.mem.dupe is deprecated, move all references in std
Replaced all occurences of std.mem.dupe in stdlib with
Allocator.dupe/std.mem.dupeZ -> Allocator.dupeZ
2020-07-04 21:40:06 +03:00
Andrew Kelley
a83aab5209 fix std lib tests for WASI 2020-05-25 19:46:28 -04:00
Andrew Kelley
cda102be02 improvements to self-hosted cache hash system
* change miscellaneous things to more idiomatic zig style
 * change the digest length to 24 bytes instead of 48. This is
   still 70  more bits than UUIDs. For an analysis of probability of
   collisions, see:
   https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
 * fix the API having the possibility of mismatched allocators
 * fix some error paths to behave properly
 * modify the guarantees about when file contents are loaded for input files
 * pwrite instead of seek + write
 * implement isProblematicTimestamp
 * fix tests with regards to a working isProblematicTimestamp function.
   this requires sleeping until the current timestamp becomes
   unproblematic.
 * introduce std.fs.File.INode, a cross platform type abstraction
   so that cache hash implementation does not need to reach into std.os.
2020-05-25 19:29:03 -04:00
LeRoyce Pearson
72716ecc3a Fix improper initialization of CacheHashFiles 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
c3c332c9ec Add max_file_size argument 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
5a1c6a3627 Set manifest's maximum size to Andrew's recommendation 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
be69e8e871 Remove non-null assertion in CacheHash.release()
People using the API as intended would never trigger this assertion
anyway, but if someone has a non standard use case, I see no reason
to make the program panic.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
e6a37ed941 Change null pointer test to addFilePost test 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
42007307be Make if statement more idiomatic 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
2c59f95e87 Don't use iterate when opening manifest directory 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
1ffff1bb18 Add test case for fix in previous commit 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
0fa89dc51d Fix read from null pointer in CacheHash.hit
It occured when the manifest file was manually edited to include an extra
file. Now it will simply copy the file name in the manifest file
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
f13c67bcfe Return an index from CacheHash.addFile
This makes it possible for the user to retrieve the contents of the
file without running into data races.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
4d62d97076 Update code using deprecated ArrayList APIs 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
b429f4607c Make addFilePost* functions' documentation more clear 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
e7657f2938 Make CacheHash.release return an error
If a user doesn't care that the manifest failed to be written, they can
simply ignore it. The program will still work; that particular cache
item will simply not be cached.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
967b9825a7 Add "no file inputs" test
It checks whether the cache will respond correctly to inputs that don't
initially depend on filesystem state. In that case, we have to check
for the existence of a manifest file, instead of relying on reading the
list of entries to tell us if the cache is invalid.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
4254d389d3 Add test checking file changes invalidate cache 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
b67a9f2281 Switch to using testing.expect* in tests 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
d457919ff5 Make CacheHash cleanup consistent (always call release)
Instead of releasing the manifest file when an error occurs, it is
only released when when `CacheHash.release` is called. This maps better
to what a zig user expects when they do `defer cache_hash.release()`.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
05edfe983c Add addFilePost and addFilePostFetch functions 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
d770dae1b8 Add documentation to CacheHash API 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
67d6432d10 Check for problematic timestamps 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
73d2747084 Open file with exclusive lock 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
204aa7dc1a Remove unnecessary contents field from File
It was causing a segfault on `mipsel` architecture, not sure why other
architectures weren't affected.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
c88ece3679 Remove error union from CacheHash.final 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
dfb53beb52 Check if inode matches inode from manifest 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
7917f25b0a Update cache_hash to zig master 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
16c5499098 Remove up files created in test at end of test 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
fffd59e6c4 Remove file handle from CacheHash
A file handle is not the same thing as an inode index number.
Eventually the inode will be checked as well, but there needs to be
a way to get the inode in `std` first.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
21d7430696 Replace ArrayList in write_manifest with an array 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
061c1fd9ab Use readAllAlloc
Now that the memory leak mentioned in #4656 has been fixed.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
4f709d224a Make hash digest same size as in the c API 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
8e4b80522f Return base64 digest instead of using an out variable 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
e75a6e5144 Rename cache_file -> addFile 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
50cbf1f3aa Add slice and array support to add method 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
fde188aadc Make type specific add functions
Basically, move type specific code into their own functions instead
of making `add` a giant function responsible for everything.
2020-05-25 13:48:43 -04:00
LeRoyce Pearson
4173dbdce9 Rename cache functions to add 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
27bf1f781b Store fs.Dir instead of path to dir 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
55a3925ab7 Rename CacheHashFile -> File 2020-05-25 13:48:43 -04:00
LeRoyce Pearson
c8062321b3 Use fs.File 2020-05-25 13:48:43 -04:00