In PIC mode, Itailcall_imm should jump to the PLT of the called function.
Also: use %r7 rather than %r1 to pass the function pointer argument to
caml_c_call. caml_c_call may be in a different shared object than the
caller. In that case the call goes through a PLT stub, and the ELF ABI
allows PLT stub code to destroy %r0 and %r1.
Move the cold path (the one that calls the GC when alloc_ptr < alloc_limit)
as much as possible to the end of the function.
Use la and lay to produce shorter code.
Following the previous commit, %r12 becomes usable as a normal register.
However, it must be saved in caml_call_gc.
Independently: change Proc.loc_external_arguments to account for the
160 reserved bytes at the bottom of the stack. Then, caml_c_call and
the code emitted for Iextcall(false) no longer need to account for
those reserved bytes.
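
For illustration only, here is a simplified sketch of the idea behind the
loc_external_arguments change. It is not the actual code in Proc: the types
arg_kind and loc, the register name strings, and the returned stack size are
assumptions made for the example. It only shows stack-passed arguments
starting at offset 160 instead of 0, so that callers need no extra
adjustment.

```ocaml
(* Hypothetical, simplified model; not the real Proc.loc_external_arguments. *)
type arg_kind = Int | Float

(* Stack n means "n bytes above the stack pointer at the call". *)
type loc = Reg of string | Stack of int

(* Area at the bottom of the stack reserved by the ELF ABI. *)
let reserved_bytes = 160

(* Registers assumed to carry the first integer and float arguments. *)
let int_regs = ["%r2"; "%r3"; "%r4"; "%r5"; "%r6"]
let float_regs = ["%f0"; "%f2"; "%f4"; "%f6"]

(* Returns the location of each argument and the total stack space needed,
   reserved bytes included. Overflow arguments take 8-byte slots. *)
let loc_external_arguments args =
  let step (iregs, fregs, ofs, acc) kind =
    match kind, iregs, fregs with
    | Int, r :: rest, _ -> (rest, fregs, ofs, Reg r :: acc)
    | Float, _, r :: rest -> (iregs, rest, ofs, Reg r :: acc)
    | _ -> (iregs, fregs, ofs + 8, Stack ofs :: acc)
  in
  let (_, _, ofs, acc) =
    List.fold_left step (int_regs, float_regs, reserved_bytes, []) args
  in
  (List.rev acc, ofs)
```

With this convention, the first argument that does not fit in a register
lands at offset 160, and the size returned already covers the reserved area.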