Instead, we use a thread-local variable [callback_status] which
contains the index of the corresponding entry when a callback is
running. We can do this since there can only be one running callback
at the same time in a given thread.
This lifts the restriction forbidding the call of Thread.exit from a
memprof callback.
This is done by using a local entry array for each thread, containing
tracked blocks whose allocation callback has not yet been called.
This allows some simplification in the code running callbacks for
young allocations. Indeed, since the entry array is local to one
thread, we know for sure that it cannot be modified during a callback,
and therefore we no longer need to remember the indices of the
corresponding new entries.
The instrumentation code in the instrumented runtime was replaced
with new APIs to gather runtime statistics and output them in a new format
(Common Trace Format).
This commit also exposes new functions in the Gc module to pause or resume
instrumentation during a program execution (Gc.eventlog_pause and
Gc.eventlog_resume).
The Gc.Memprof module provides a low-level API, that will hopefully be
paired with user libraries that provide high-level instrumentation
choices.
A natural question is: how are the higher-level API going to expose
their choice of instrumentation to their users? With the current
Memprof.start API (before this patch), they would have to either
provide their own `start` function wrapping Memprof.start, or provide
a tuple of callbacks for to their users to pass to Memprof.start
themselves.
val start : params -> unit
(* or *)
val callback : params ->
((allocation -> foo option) * (allocation -> bar option) * ... )
With an explicit record, it is easier for libraries to expose an
instrumentation choice (possibility parametrized over
user-provided settings):
val tracker : params -> (foo, bar) Gc.Memprof.tracker
In addition, providing a record instead of optional parameters makes
it much easier to provide "default settings" (helper functions) that
instantiates the types `'minor` and `'ḿajor`, see for example
`simple_tracker` in this patch (which stores the same information for
the minor and major heap, and does not observe promotion), or to later
define checking predicates that can verify that a given choice of
callbacks is sensible (for example: providing a major-dealloc callback
but no promotion callback (dropping all tracked value on promotion) is
probably not a good idea).
Bootstrap: to avoid requiring an awkward bootstrap, this commit keeps
the (now unused) function caml_memprof_start_byt unchanged -- it is
used in the bootstrap binaries, so removing it would break the
build. The intention is to remove it in the following commit.
They are somewhat difficult to handle for native allocations, and it is not clear how useful they are. Moreover, they are easy to add back since [Gc.Memprof.allocation] is a private record.
The user can register several callbacks, which are called for various
events during the block's lifetime. We need to maintain a data
structure for tracked blocks in the runtime. When using threads,
callbacks can be called concurrently in a reentrant way, so the
functions manipulating this data structure need to be reentrant.
This makes sure that:
- Callbacks are never called when another is running
- The postponed queue is purged when setting memprof parameters
We now use a FIFO implemented as a circular buffer for remembering of
postponed blocks.
The workaround used for ignoring samples in the minor heap in native
mode now makes allocation very slow (or non-terminating) when the
sampling rate is not small enough. This will be fixed when sampling in
the minor heap in native mode will be implemented.
Allocations ignored by this version
- Marshalling
- In the minor heap by natively-compiled OCaml code
Allocations potentially sampled
- In the major heap
- In the minor heap by C code and OCaml code in bytecode mode
If an allocation fails, the decrement of young_ptr should be undone
before the GC is entered. This happened correctly on bytecode but not
on native code.
This commit (squash of pull request #8619) fixes it for all the
platforms supported by ocamlopt.
amd64: add alternate entry points caml_call_gc{1,2,3} for code size
optimisation.
powerpc: introduce one GC call point per allocation size per function.
Each call point corrects the allocation pointer r31 before calling
caml_call_gc.
i386, arm, arm64, s390x: update the allocation pointer after the
conditional branch to the GC, not before.
arm64: simplify the code generator: Ialloc can assume that less than
0x1_0000 bytes are allocated, since the max allocation size for the
minor heap is less than that.
This is a partial cherry-pick of commit 8ceec on multicore.
- inline Pervasives in Stdlib and re-add Pervasives as a deprecated
module that aliases all elements of Stdlib except the stdlib modules.
- remove special case for Stdlib.Pervasives in printtyp.ml
In a format following that of Gc.print_stat. I chose to print only the "quick_stat"
values rather than call gc_ctrl.c::heap_stats because it's lighter, and the extra
information is typically not very useful at program exit.
Also adds documentation for the 0x400 flag (in man and Gc module)
This replaces the previous undocumented 0x400 that only displayed the
total (minwords + majwords - prowords) and with a different format,
since keeping both wouldn't provide more information.