Instead of using the stdlib logf function for computing logarithms, we
use a faster polynomial-based approximation.
We use the xoshiro PRNG instead of the Mersenne Twister. xoshiro is
simpler and faster.
We generate samples by batches so that compilers can vectorize the
generation loops using SIMD instructions when possible.
The instrumentation code in the instrumented runtime was replaced
with new APIs to gather runtime statistics and output them in a new format
(Common Trace Format).
This commit also exposes new functions in the Gc module to pause or resume
instrumentation during a program execution (Gc.eventlog_pause and
Gc.eventlog_resume).
Build steps for this commit (from a valid core build for the previous commit):
make coreall
make coreboot
(Note: `make core` would not work as we remove a primitive.)
The Gc.Memprof module provides a low-level API, that will hopefully be
paired with user libraries that provide high-level instrumentation
choices.
A natural question is: how are the higher-level API going to expose
their choice of instrumentation to their users? With the current
Memprof.start API (before this patch), they would have to either
provide their own `start` function wrapping Memprof.start, or provide
a tuple of callbacks for to their users to pass to Memprof.start
themselves.
val start : params -> unit
(* or *)
val callback : params ->
((allocation -> foo option) * (allocation -> bar option) * ... )
With an explicit record, it is easier for libraries to expose an
instrumentation choice (possibility parametrized over
user-provided settings):
val tracker : params -> (foo, bar) Gc.Memprof.tracker
In addition, providing a record instead of optional parameters makes
it much easier to provide "default settings" (helper functions) that
instantiates the types `'minor` and `'ḿajor`, see for example
`simple_tracker` in this patch (which stores the same information for
the minor and major heap, and does not observe promotion), or to later
define checking predicates that can verify that a given choice of
callbacks is sensible (for example: providing a major-dealloc callback
but no promotion callback (dropping all tracked value on promotion) is
probably not a good idea).
Bootstrap: to avoid requiring an awkward bootstrap, this commit keeps
the (now unused) function caml_memprof_start_byt unchanged -- it is
used in the bootstrap binaries, so removing it would break the
build. The intention is to remove it in the following commit.
They are somewhat difficult to handle for native allocations, and it is not clear how useful they are. Moreover, they are easy to add back since [Gc.Memprof.allocation] is a private record.
The user can register several callbacks, which are called for various
events during the block's lifetime. We need to maintain a data
structure for tracked blocks in the runtime. When using threads,
callbacks can be called concurrently in a reentrant way, so the
functions manipulating this data structure need to be reentrant.
Introduce caml_process_pending_actions and
caml_process_pending_actions_exn: a variant of the former which does
not raise but returns a value that has to be checked against
Is_exception_value.
I keep the current conventions from caml_callback{,_exn}: For a
resource-safe interface, we mostly care about the _exn variants, but
every time there is a public _exn function I provide a function that
raises directly for convenience.
They are introduced and documented in caml/signals.h.
Private functions are converted to their _exn variant on the way as
needed: for internal functions of the runtime, it is desirable to go
towards a complete elimination of functions that raise implicitly.
Get rid of the distant logic of caml_raise_in_async_callback. Instead,
caml_process_pending_events takes care itself of its something_to_do
"resource". This avoids calling the former function in places
unrelated to asynchronous callbacks.
This make us able to get rid of to xxx_to_do variables in `final.c`
and `memprof.c`. The variable is reset to 0 when entering
`caml_check_urgent_gc`, which is now the main entry point for
asynchronous callbacks. In case a callback raises an exception, we
need to set it back to 1 to make sure no callback is missed.
This makes sure that:
- Callbacks are never called when another is running
- The postponed queue is purged when setting memprof parameters
We now use a FIFO implemented as a circular buffer for remembering of
postponed blocks.
The finalizers and all the other asynchronous callbacks (including
signal handlers, memprof callbacks and finalizers) are now called in a
common function, [caml_async_callbacks]. It is called in
[caml_check_urgent_gc] and wherever [caml_process_pending_signals] was
called.
This makes it possible to simplify the [caml_gc_dispatch] logic by
removing the loop it contains, since it no longer calls finalizers.
The workaround used for ignoring samples in the minor heap in native
mode now makes allocation very slow (or non-terminating) when the
sampling rate is not small enough. This will be fixed when sampling in
the minor heap in native mode will be implemented.
Allocations ignored by this version
- Marshalling
- In the minor heap by natively-compiled OCaml code
Allocations potentially sampled
- In the major heap
- In the minor heap by C code and OCaml code in bytecode mode