The main changes are:
Unrolling the inner rounds of salsa20_wordtobyte which doubles the speed.
Passing the slice explicitly instead of returning the array saves a copy (can optimize out in future with copy elision) and gives ~10% improvement.
Inlining the outer loop gives ~15-20% improvement but it costs an extra 4Kb of code space. I think the tradeoff is worthwhile here.
The other inline loops are small and can be done by the compiler if it is worthwhile.
The rotate function replacement doesn't alter the performance from the former.
The modified throughput test I've used to benchmark is as follows. Interestingly we need to allocate memory instead of using a fixed buffer else Zig optimizes the whole thing out.
https://github.com/ziglang/zig/pull/1369#issuecomment-416456628
* Handle unions differently in std.fmt
Print the active tag's value in tagged unions. Untagged unions considered unsafe to print and treated like a pointer or an array.
Caused by struct printing behavior. Enums are different enough from structs and unions that the field iteration behavior doesn't do what we want even if @memberName didn't error on enums.
...rather than trying to find the executable on the file system.
Also use a more robust PIE offset calculation based on the
available metadata.
And for the last function, use the data that tells the end
rather than assuming 4K.
Also they print in a consistent way with Linux stack traces.
* error.BadFd is not a valid error code. it would always be a bug to
get this error code.
* merge error.Io with existing error.InputOutput
* merge error.PathNotFound with existing error.FileNotFound.
Not all OS's support both.
* add os.File.openReadC
* add error.BadPathName for windows file operations with invalid
characters
* add os.toPosixPath to help stack allocate a null terminating byte
* add some TODOs for other functions to investigate removing the
allocator requirement
* optimize some implementations to use the alternate functions when
a null byte is already available
* add a missing error.SkipZigTest
* os.selfExePath uses a non-allocating API
* os.selfExeDirPath uses a non-allocating API
* os.path.real uses a non-allocating API
* add os.path.realAlloc and os.path.realC
* convert many windows syscalls to use the W versions (See #534)
This is identical to `mem.set(u8, slice, 0)` except that it will never
be optimized out by the compiler. Intended usage is for clearing
secret data.
The resulting assembly has been manually verified in --release-* modes.
It would be valuable to test the 'never be optimized out' claim in tests
but this is harder than initially expected due to how much Zig appears
to know locally. May be doable with @intToPtr, @ptrToInt to get around
known data dependencies but I could not work it out right now.
If fmt was called on with a [*]u8 or [*]const u8 argument, but the fmt string did not specify 's' to treat it as a string, it produced a compile error due to accessing index 1 of a 0 length slice.