- Mutex.lock raises Sys_error if the mutex is already locked by the
calling thread.
- Mutex.unlock raises Sys_error if the mutex is unlocked or locked
by another thread.
Add the corresponding tests.
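
For illustration only, here is a minimal C sketch of how checks of this kind can be obtained on a POSIX system, assuming an error-checking pthread mutex. The helper name `checked_mutex_init` is hypothetical and this is not claimed to be the code of this commit.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not part of the OCaml runtime): initializes an
   error-checking mutex. Returns 0 on success, an error code otherwise. */
int checked_mutex_init(pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (rc == 0) rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}
```

With such a mutex, `pthread_mutex_lock` returns EDEADLK when the calling thread already holds the mutex, and `pthread_mutex_unlock` returns EPERM when the mutex is unlocked or held by another thread. A wrapper can turn these codes into an error for the caller.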
Co-authored-by: David Allsopp <david.allsopp@metastack.com>
The user can register several callbacks, which are called for various
events during the block's lifetime. We need to maintain a data
structure for tracked blocks in the runtime. When using threads,
callbacks can be called concurrently in a reentrant way, so the
functions manipulating this data structure need to be reentrant.
Introduce caml_process_pending_actions and
caml_process_pending_actions_exn. The latter is a variant of the
former which does not raise but returns a value that has to be
checked with Is_exception_result.
I keep the current conventions from caml_callback{,_exn}: For a
resource-safe interface, we mostly care about the _exn variants, but
every time there is a public _exn function I provide a function that
raises directly for convenience.
They are introduced and documented in caml/signals.h.
Private functions are converted to their _exn variant on the way as
needed: for internal functions of the runtime, it is desirable to go
towards a complete elimination of functions that raise implicitly.
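
The convention can be sketched in a self-contained way as follows. Everything prefixed with `toy_` or `Toy_` is a hypothetical stand-in chosen for this example, not the runtime's definitions: the encoding of exception results, the payload 0x40 and the "raise" hook are all assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy encoding (assumption): an exception result is a payload whose two low
   bits are free, tagged with bit 1. */
typedef intptr_t toy_value;
#define TOY_UNIT ((toy_value) 1)
#define Toy_make_exception(v) ((v) | 2)
#define Toy_is_exception(v) (((v) & 3) == 2)
#define Toy_extract_exception(v) ((v) & ~(toy_value) 3)

/* Models "raising": records the exception instead of unwinding the stack. */
toy_value toy_last_raised = 0;
void toy_raise(toy_value exn) { toy_last_raised = exn; }

/* _exn variant: never raises, returns TOY_UNIT or an encoded exception. */
toy_value toy_process_pending_exn(int should_fail)
{
  if (should_fail) return Toy_make_exception((toy_value) 0x40);
  return TOY_UNIT;
}

/* Convenience variant that raises directly, built on the _exn variant. */
void toy_process_pending(int should_fail)
{
  toy_value res = toy_process_pending_exn(should_fail);
  if (Toy_is_exception(res)) toy_raise(Toy_extract_exception(res));
}

/* Resource-safe caller: the buffer is freed whether or not an exception
   was reported. Returns 1 if an exception was reported, 0 if not,
   -1 if the buffer could not be allocated. */
int toy_with_buffer(int should_fail)
{
  char *buf = malloc(16);
  toy_value res;
  if (buf == NULL) return -1;
  res = toy_process_pending_exn(should_fail);
  free(buf);
  return Toy_is_exception(res) ? 1 : 0;
}
```

The point of the _exn variant is visible in `toy_with_buffer`: the caller decides when to propagate the exception, so cleanup code cannot be skipped by an implicit raise.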
Get rid of the distant logic of caml_raise_in_async_callback. Instead,
caml_process_pending_events itself takes care of its something_to_do
"resource". This avoids calling the former function in places
unrelated to asynchronous callbacks.
In 8691, caml_check_urgent_gc was merged with the function that runs
asynchronous callbacks. The rationale was that caml_check_urgent_gc
already runs finalisers, and so could have run any asynchronous
callbacks.
We agreed on a different strategy: we know that users could not rely
on asynchronous callbacks being called at this point, so we take the
opportunity to make it callback-safe, as was done for allocation
functions.
The new check_urgent_gc no longer calls finalisers (nor any
callbacks), and instead two internal functions are introduced:
* caml_do_urgent_gc_and_callbacks : function to perform actions
unconditionally.
* caml_check_urgent_gc_and_callbacks : function that checks for
something to do, and then executes all actions (GC and callbacks).
Since we cannot access the backtrace position in cmmgen.ml anymore,
Cmm.raise_kind is removed. Instead, we use Lambda.raise_kind. When
assembly code is generated, we reset the backtrace position to 0 in the
case of a regular raise. Importantly, the semantics remain the same.
This makes it possible to get rid of the xxx_to_do variables in `final.c`
and `memprof.c`. The variable is reset to 0 when entering
`caml_check_urgent_gc`, which is now the main entry point for
asynchronous callbacks. In case a callback raises an exception, we
need to set it back to 1 to make sure no callback is missed.
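
A minimal sketch of this flag discipline, under the assumption of a simple fixed-size queue of callbacks. All `toy_` names are hypothetical, and a callback returning non-zero stands for a callback raising an exception.

```c
#include <stddef.h>

typedef int (*toy_callback)(void);   /* non-zero return models "raised" */

int toy_something_to_do = 0;
toy_callback toy_pending[8];
size_t toy_next = 0, toy_count = 0;
int toy_ok_runs = 0;

/* Two sample callbacks used to exercise the sketch. */
int toy_ok(void) { toy_ok_runs++; return 0; }
int toy_fail(void) { return 1; }

/* Registers a callback and marks that there is something to do. */
int toy_enqueue(toy_callback cb)
{
  if (toy_count == 8) return -1;
  toy_pending[toy_count++] = cb;
  toy_something_to_do = 1;
  return 0;
}

/* Runs pending callbacks. Returns non-zero if one of them "raised". */
int toy_process_pending(void)
{
  if (!toy_something_to_do) return 0;
  toy_something_to_do = 0;               /* reset on entry */
  while (toy_next < toy_count) {
    toy_callback cb = toy_pending[toy_next++];
    if (cb() != 0) {
      toy_something_to_do = 1;           /* later callbacks may remain */
      return 1;
    }
  }
  toy_next = toy_count = 0;              /* queue fully drained */
  return 0;
}
```

If `toy_fail` and then `toy_ok` are enqueued, the first call to `toy_process_pending` stops at the failure and leaves the flag set, so a second call still runs `toy_ok`.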
Finalizers and all the other asynchronous callbacks (signal handlers
and memprof callbacks) are now called in a
common function, [caml_async_callbacks]. It is called in
[caml_check_urgent_gc] and wherever [caml_process_pending_signals] was
called.
This makes it possible to simplify the [caml_gc_dispatch] logic by
removing the loop it contains, since it no longer calls finalizers.
Allocations ignored by this version
- Marshalling
- In the minor heap by natively-compiled OCaml code
Allocations potentially sampled
- In the major heap
- In the minor heap by C code and OCaml code in bytecode mode
Thread.yield invoked a trivial blocking section, which basically woke
up a competitor and then raced with them to get the ocaml lock back, invoking
nanosleep() to help guarantee that the yielder would lose the race. However,
until the yielder woke up again and attempted to take the ocaml lock, it
wouldn't be marked as a waiter.
As a result, if two threads A and B yielded to each other in a tight loop, A's
first yield would work well, but then B would execute 10000+ iterations of the
loop before A could mark itself as a waiter and be yielded to. This works even
worse if A and B are pinned to the same CPU, in which case A can't be marked as
a waiter until the kernel preempts B, which can take tens or hundreds of
milliseconds!
So we reimplement yield; instead of dropping the lock and taking it again (with
a wait in the middle), atomically wake a competitor and mark the yielding thread
as a waiter. (We essentially inline a failed masterlock_acquire into
masterlock_release, specialized for the case where we know another waiter exists
and we want them to run instead.)
Now, threads yielding to each other very consistently succeed--in that same
tight loop, we see a change of control on every iteration (with some very rare
exceptions, most likely from other uncommon blocking region invocations.)
This also means we don't have to worry about the vagaries of kernel scheduling
and whether or not a yielding or a yielded-to thread gets to run first; we
consistently let a competing thread run whenever we yield, which is what the API
claims to do.
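
The idea can be sketched with a toy master lock built from a pthread mutex and a condition variable. This is a simplified model under stated assumptions, not the runtime's code: all `toy_` names are hypothetical, error handling is omitted, and a spurious wakeup could in principle let the yielder win.

```c
#include <pthread.h>
#include <sched.h>

typedef struct {
  pthread_mutex_t lock;      /* protects busy and waiters */
  pthread_cond_t is_free;    /* signalled when busy becomes 0 */
  int busy;                  /* 1 while some thread holds the master lock */
  int waiters;               /* threads currently waiting for it */
} toy_masterlock;

void toy_init(toy_masterlock *m)
{
  pthread_mutex_init(&m->lock, NULL);
  pthread_cond_init(&m->is_free, NULL);
  m->busy = 0;
  m->waiters = 0;
}

void toy_acquire(toy_masterlock *m)
{
  pthread_mutex_lock(&m->lock);
  while (m->busy) {
    m->waiters++;
    pthread_cond_wait(&m->is_free, &m->lock);
    m->waiters--;
  }
  m->busy = 1;
  pthread_mutex_unlock(&m->lock);
}

void toy_release(toy_masterlock *m)
{
  pthread_mutex_lock(&m->lock);
  m->busy = 0;
  pthread_mutex_unlock(&m->lock);
  pthread_cond_signal(&m->is_free);
}

/* Yield: release, wake a competitor and register as a waiter in one
   critical section, then wait at least once before taking the lock back. */
void toy_yield(toy_masterlock *m)
{
  pthread_mutex_lock(&m->lock);
  if (m->waiters == 0) {               /* nobody to yield to: keep the lock */
    pthread_mutex_unlock(&m->lock);
    return;
  }
  m->busy = 0;
  m->waiters++;
  pthread_cond_signal(&m->is_free);
  do {
    pthread_cond_wait(&m->is_free, &m->lock);
  } while (m->busy);
  m->busy = 1;
  m->waiters--;
  pthread_mutex_unlock(&m->lock);
}

toy_masterlock toy_m;
int toy_competitor_ran = 0;

void *toy_competitor(void *arg)
{
  (void) arg;
  toy_acquire(&toy_m);                 /* blocks while the main thread holds the lock */
  toy_competitor_ran = 1;
  toy_release(&toy_m);
  return NULL;
}

/* Returns 1 if a single yield handed control to the competitor. */
int toy_demo(void)
{
  pthread_t t;
  int waiting = 0, ran;
  toy_init(&toy_m);
  toy_acquire(&toy_m);
  if (pthread_create(&t, NULL, toy_competitor, NULL) != 0) return -1;
  while (!waiting) {                   /* wait until the competitor is a waiter */
    pthread_mutex_lock(&toy_m.lock);
    waiting = toy_m.waiters > 0;
    pthread_mutex_unlock(&toy_m.lock);
    if (!waiting) sched_yield();
  }
  toy_yield(&toy_m);
  ran = toy_competitor_ran;
  toy_release(&toy_m);
  pthread_join(t, NULL);
  return ran;
}
```

Because the yielder increments `waiters` before it stops running, the thread it wakes can immediately yield back to it, without depending on how the kernel schedules the two threads.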
The `top_of_stack` field of the `th` descriptor for the new thread was initialized too late, causing `caml_top_of_stack` to be NULL when the thread starts running. The fix is to initialize `th->top_of_stack` earlier.
Running Clang 6.0 and GCC 8 with full warnings on suggests a few simple improvements and clean-ups to the C code of OCaml. This commit implements them.
* Remove old-style, unprototyped function declarations
It's `int f(void)`, not `int f()`. [-Wstrict-prototypes]
* Be more explicit about conversions involving `float` and `double`
byterun/bigarray.c, byterun/ints.c:
add explicit casts to clarify the intent
renamed float field of conversion union from `d` to `f`.
byterun/compact.c, byterun/gc_ctrl.c:
some local variables were of type `float` while all FP computations
here are done in double precision;
turned these variables into `double`.
[-Wdouble-promotion -Wfloat-conversion]
* Add explicit initialization of struct field `compare_ext`
[-Wmissing-field-initializers]
* Declare more functions "noreturn"
[-Wmissing-noreturn]
* Make CAMLassert compliant with ISO C
In `e1 ? e2 : e3`, expressions `e2` and `e3` must have the same type.
`e2` of type `void` and `e3` of type `int`, as in the original code,
is a GNU extension.
* Remove or conditionalize unused macros
Some macros were defined and never used.
Some other macros were always defined but conditionally used.
[-Wunused-macros]
* Replace some uses of `int` by more appropriate types like `intnat`
On a 64-bit platform, `int` is only 32 bits and may not represent correctly
the length of a string or the size of an OCaml heap block.
This commit replaces a number of uses of `int` by other types that
are 64-bit wide on 64-bit architectures, such as `intnat` or `uintnat`
or `size_t` or `mlsize_t`.
Sometimes an `intnat` was used as an `int` and is intended as a Boolean
(0 or 1); then it was replaced by an `int`.
There are many remaining cases where we assign a 64-bit quantity to a
32-bit `int` variable. Either I believe these cases are safe
(e.g. the 64-bit quantity is the difference between two pointers
within an I/O buffer, something that always fits in 32 bits), or
the code change was not obvious and too risky.
[-Wshorten-64-to-32]
* Put `inline` before return type
`static inline void f(void)` is cleaner than `static void inline f(void)`.
[-Wold-style-declaration]
* Unused assignment to unused parameter
Looks very useless. [-Wunused-but-set-parameter]
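
A few of the patterns above, condensed into a small self-contained sketch. The `toy_` and `TOY_` names are hypothetical and are not the runtime's definitions; in particular the failure hook only counts failures so that the macro can be exercised without aborting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical failure hook: records the failure instead of aborting. */
int toy_failed_asserts = 0;
void toy_failed_assert(const char *expr, const char *file, int line)
{
  (void) expr; (void) file; (void) line;
  toy_failed_asserts++;
}

/* ISO C: both branches of ?: have type void. */
#define TOY_ASSERT(x) \
  ((x) ? (void) 0 : toy_failed_assert(#x, __FILE__, __LINE__))

/* Prototype with (void), not an empty parameter list. */
int toy_answer(void) { return 42; }

/* `inline` before the return type; a length is a size_t, not an int. */
static inline size_t toy_length(const char *s) { return strlen(s); }

/* Explicit cast to make the double-to-float conversion visible. */
float toy_to_single(double d) { return (float) d; }
```

With these definitions `TOY_ASSERT(0)` calls the hook once, and none of the warnings listed above should fire on this fragment.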
$ cat /tmp/b.ml
let () = Thread.join (Thread.create ignore ())
let () = for _ = 0 to 100000; do () done
$ ocamlopt -I +threads unix.cmxa threads.cmxa /tmp/b.ml -o b
$ time ./b # before this commit
real 0m0.053s
user 0m0.000s
sys 0m0.000s
$ time ./b # after this commit
real 0m0.003s
user 0m0.000s
sys 0m0.000s
A few more wrappers were added (caml_stat_alloc_noexc, caml_stat_resize_noexc,
caml_stat_calloc_noexc) that do not throw an exception in case of errors and
offer a compatible substitute to the corresponding stdlib functions.
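
The relationship between the two families of functions can be sketched as follows. The `toy_` names are hypothetical stand-ins built directly on the C library, and "raising" is modelled by a counter; the runtime's actual wrappers may do more than this.

```c
#include <stdint.h>
#include <stdlib.h>

/* Models raising Out_of_memory: counts instead of unwinding. */
int toy_out_of_memory_raised = 0;
void toy_raise_out_of_memory(void) { toy_out_of_memory_raised++; }

/* _noexc variants: report failure by returning NULL, like the C library. */
void *toy_alloc_noexc(size_t sz) { return malloc(sz); }
void *toy_resize_noexc(void *p, size_t sz) { return realloc(p, sz); }
void *toy_calloc_noexc(size_t n, size_t sz) { return calloc(n, sz); }

/* Raising variant, defined on top of the _noexc one. */
void *toy_alloc(size_t sz)
{
  void *p = toy_alloc_noexc(sz);
  if (p == NULL) toy_raise_out_of_memory();
  return p;
}
```

Code that must clean up after a failed allocation can call the _noexc variant and test for NULL itself, while other code keeps the simpler raising variant.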
Before, mutexes and condition variables were allocated with caml_alloc_custom and a cost factor of 1/N, with N a few hundred or a few thousand. Hence a GC was triggered approximately every N allocations, which is bad. The motivation for this allocation cost was to cover the possibility that mutexes and condvars consume rare kernel resources. This appears not to be the case on Linux or on Windows, and is unlikely to be the case in any robust implementation of POSIX threads. Hence this fix sets the cost factor to 0 in allocations of mutexes and condvars.
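
The arithmetic behind "every N allocations" can be illustrated with a toy model. It assumes that each allocation charges `mem` units against a budget of `max` units and that a collection is triggered whenever the budget is used up; this is a simplification made for the example, and the `toy_` names are hypothetical.

```c
/* Toy model (assumption): counts how many collections are triggered by
   n_allocs allocations, each charging mem units against a budget of max. */
unsigned long toy_count_gcs(unsigned long n_allocs,
                            unsigned long mem, unsigned long max)
{
  unsigned long acc = 0, gcs = 0, i;
  if (max == 0) max = 1;               /* avoid a degenerate budget */
  for (i = 0; i < n_allocs; i++) {
    acc += mem;
    if (acc >= max) {                  /* budget used up: trigger a GC */
      gcs++;
      acc = 0;
    }
  }
  return gcs;
}
```

In this model, 10000 allocations with a cost factor of 1/1000 trigger 10 collections, while a cost factor of 0 triggers none.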
The mutex can be destroyed for the first time when finalizing the I/O buffer.
If the buffer contains unflushed data, it is kept in the list of buffers.
Then Unix.fork() causes caml_thread_reinitialize() to reset all buffers in
this list, destroying the mutex a second time.
* Don't use the compatibility macros, either in the C stub code or in the testsuite.
* Make sure compiler sources do not use deprecated C identifiers.
This is achieved by ensuring that the CAML_NAME_SPACE macro is defined
every time a C source file is compiled, rather than being defined only
in a few places. Defining this macro guarantees that the compatibility.h
header (where these deprecated identifiers are defined) will not be
included.
byterun/compatibility.h defines:
#define stat_alloc caml_stat_alloc
#define stat_free caml_stat_free
#define stat_resize caml_stat_resize
Having the "caml_" prefix seems cleaner to me. It also avoids some
issues for cross-compilation, but I don't remember exactly which ones.
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@13314 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02