amd64: remove caml_call_gc{1,2,3} and simplify caml_alloc{1,2,3,N}
by tail-calling caml_call_gc.
i386: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
these functions do not need to preserve ebx.
arm: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
arm64: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
power: partial revert of #8619.
avoid restarting allocation sequence after failure.
s390: partial revert of #8619.
avoid restarting allocation seqeunce after failure.
If an allocation fails, the decrement of young_ptr should be undone
before the GC is entered. This happened correctly on bytecode but not
on native code.
This commit (squash of pull request #8619) fixes it for all the
platforms supported by ocamlopt.
amd64: add alternate entry points caml_call_gc{1,2,3} for code size
optimisation.
powerpc: introduce one GC call point per allocation size per function.
Each call point corrects the allocation pointer r31 before calling
caml_call_gc.
i386, arm, arm64, s390x: update the allocation pointer after the
conditional branch to the GC, not before.
arm64: simplify the code generator: Ialloc can assume that less than
0x1_0000 bytes are allocated, since the max allocation size for the
minor heap is less than that.
This is a partial cherry-pick of commit 8ceec on multicore.