This makes the `cache_hash` hash function easier to replace.
BLAKE3 would be a natural fit for hashing large files, but:
- second preimage resistance is not necessary for the cache_hash use cases
- our BLAKE3 implementation is currently very slow
Switch to SipHash128, which gives us an immediate speed boost.
- This avoids having multiple `init()` functions for every combination
of optional parameters
- The API is consistent across all hash functions
- New options can be added later without breaking existing applications.
For example, this is going to come in handy if we implement parallelization
for BLAKE2 and BLAKE3.
- We don't have a mix of snake_case and camelCase functions any more, at
least in the public crypto API
Support for BLAKE2 salt and personalization (more commonly called context)
parameters have been implemented by the way to illustrate this.
Instead of having all primitives and constructions share the same namespace,
they are now organized by category and function family.
Types within the same category are expected to share the exact same API.
I have observed on Linux writing and reading the same file many times
without the mtime changing, despite the file system having nanosecond
granularity (and about 1 millisecond worth of nanoseconds passing between
modifications). I am calling this a Linux Kernel Bug and adding file
size to the cache hash manifest as a mitigation. As evidence, macOS does
not exhibit this behavior.
This means it is possible, on Linux, for a file to be added to the cache
hash, and, if it is updated with the same file size, same inode, within
about 1 millisecond, the cache system will give us a false positive,
saying it is unmodified. I don't see any way to improve this situation
without fixing the bug in the Linux kernel.
closes#6082
I empirically observed mtime not changing when rapidly writing the same
file name within the same millisecond of wall clock time, despite the
mtime field having nanosecond precision.
I believe this fixes the CI test failures.
* change miscellaneous things to more idiomatic zig style
* change the digest length to 24 bytes instead of 48. This is
still 70 more bits than UUIDs. For an analysis of probability of
collisions, see:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions
* fix the API having the possibility of mismatched allocators
* fix some error paths to behave properly
* modify the guarantees about when file contents are loaded for input files
* pwrite instead of seek + write
* implement isProblematicTimestamp
* fix tests with regards to a working isProblematicTimestamp function.
this requires sleeping until the current timestamp becomes
unproblematic.
* introduce std.fs.File.INode, a cross platform type abstraction
so that cache hash implementation does not need to reach into std.os.
People using the API as intended would never trigger this assertion
anyway, but if someone has a non standard use case, I see no reason
to make the program panic.
If a user doesn't care that the manifest failed to be written, they can
simply ignore it. The program will still work; that particular cache
item will simply not be cached.
It checks whether the cache will respond correctly to inputs that don't
initially depend on filesystem state. In that case, we have to check
for the existence of a manifest file, instead of relying on reading the
list of entries to tell us if the cache is invalid.
Instead of releasing the manifest file when an error occurs, it is
only released when when `CacheHash.release` is called. This maps better
to what a zig user expects when they do `defer cache_hash.release()`.
A file handle is not the same thing as an inode index number.
Eventually the inode will be checked as well, but there needs to be
a way to get the inode in `std` first.