To prevent cache misses, token ids go in their own array, and the
start/end offsets go in a different one.
perf measurement before:
2,667,914 cache-misses:u
2,139,139,935 instructions:u
894,167,331 cycles:u
perf measurement after:
1,757,723 cache-misses:u
2,069,932,298 instructions:u
858,105,570 cycles:u
The DocComment AST node now only points to the first doc comment token.
API users are expected to iterate over the following tokens directly.
After this commit there are no more linked lists in use in the
self-hosted AST API.
Performance impact is negligible. Memory usage slightly reduced.
* Extract Call ast node tag out of SuffixOp; parameters go in memory
after Call.
* Demote AsmInput and AsmOutput from AST nodes to structs inside the
Asm node.
* The following ast nodes get their sub-node lists directly following
them in memory:
- ErrorSetDecl
- Switch
- BuiltinCall
* ast.Node.Asm gets slices for inputs, outputs, clobbers instead of
singly linked lists
Performance changes:
throughput: 72.7 MiB/s => 74.0 MiB/s
maxrss: 72 KB => 69 KB (nice)
block statements are now directly following the Block AST node rather
than a singly linked list. This had negligible impact on performance:
throughput: 72.3 MiB/s => 72.7 MiB/s
however it greatly improves the API since the statements are laid out in
a flat array in memory.
These SuffixOp nodes have their own ast.Node tags now:
* ArrayInitializer
* ArrayInitializerDot
* StructInitializer
* StructInitializerDot
Their sub-expression lists are general-purpose-allocator allocated
and then copied into the arena after completion of parsing.
throughput: 72.9 MiB/s => 74.4 MiB/s
maxrss: 68 KB => 72 KB
The API is also nicer since the sub expression lists are now flat arrays
instead of singly linked lists.
Instead of being its own node, it's a struct inside FnProto.
Instead of FnProto having a SinglyLinkedList of ParamDecl nodes,
ParamDecls are appended directly in memory after the FnProto.
throughput: 72.2 MiB/s => 72.9 MiB/s
maxrss: 70 KB => 68 KB
Importantly, the API is improved as well since the data is arranged
linearly in memory.
This makes fields and decl ast nodes part of the Root and ContainerDecl
AST nodes.
Surprisingly, it's a performance regression from using a singly-linked
list for these nodes:
throughput: 76.5 MiB/s => 69.4 MiB/s
However it has much better memory usage:
maxrss: 392 KB => 77 KB
It's also better API for consumers of the parser, since it is a flat
list in memory.
std.ast uses a singly linked list for lists of things. This is a
breaking change to the self-hosted parser API.
std.ast.Tree has been separated into a private "Parser" type which
represents in-progress parsing, and std.ast.Tree which has only
"output" data. This means cleaner, but breaking, API for parse results.
Specifically, `tokens` and `errors` are no longer SegmentedList but a
slice.
The way to iterate over AST nodes has necessarily changed since lists of
nodes are now singly linked lists rather than SegmentedList.
From these changes, I observe the following on the
self-hosted-parser benchmark from ziglang/gotta-go-fast:
throughput: 45.6 MiB/s => 55.6 MiB/s
maxrss: 359 KB => 342 KB
This commit breaks the build; more updates are necessary to fix API
usage of the self-hosted parser.
However there does not appear to be an x86 encoding for calling an
immediate address. So there's no point of setting this up. We should
just emit an indirect call to the got addr.