- Rewrite the `is_immediate` methods in $ARCH/selection.ml in the style of
other selection methods: operations that need platform-dependent handling
are explicitly listed, all others fall through `super#is_immediate`.
- The `is_immediate` method from selectgen.ml knows how to handle shifts
(and no other operation). Remove the `select_shift_op` method,
now unnecessary.
- ARM: remove special cases for multiply and multiply-high, no longer
necessary.
- RISC-V: in emit.mlp, remove implementation of checkbound immediate,
which is no longer generated.
Replace the single `is_immediate n` method, which was supposed to apply
to all arithmetic instructions, with two methods:
`is_immediate op n` : tests whether `n` is in the range of supported
immediate arguments for integer operation `op`
`is_immediate_test cmp n` : tests whether `n` is in the range of supported
immediate arguments for integer comparison `cmp`
This makes it easier to handle operations without immediate operands
(e.g. multiply or multiply-high on many platforms) and operations with
specific ranges of immediate operands (e.g. N-bit unsigned versus
N-bit signed). Before, these operations had to be treated as special
cases in the platform-specific `select_operation` method.
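
A minimal sketch of the two-method interface, for illustration only. The types below are simplified stand-ins for the compiler's operation and comparison types, `selector_example` is a hypothetical port, and the 12-bit immediate ranges are assumptions, not those of any real target:

```ocaml
(* Simplified stand-ins for the compiler's integer operations/comparisons. *)
type integer_operation = Iadd | Isub | Imul | Imulh | Iand | Ilsl | Ilsr | Iasr
type integer_comparison = Isigned | Iunsigned

(* Generic selector: only shifts are known to accept immediates. *)
class selector_generic = object
  method is_immediate (op : integer_operation) (n : int) =
    match op with
    | Ilsl | Ilsr | Iasr -> n >= 0 && n < Sys.word_size
    | _ -> false
  method is_immediate_test (_cmp : integer_comparison) (_n : int) = false
end

(* Hypothetical port: list the operations that need platform-dependent
   handling, and let everything else fall through to the generic method. *)
class selector_example = object
  inherit selector_generic as super
  method! is_immediate op n =
    match op with
    | Iadd | Isub -> n >= -2048 && n <= 2047   (* assumed 12-bit signed *)
    | Iand -> n >= 0 && n <= 0xFFF             (* assumed 12-bit unsigned *)
    | _ -> super#is_immediate op n             (* shifts; no mul/mulh *)
  method! is_immediate_test cmp n =
    match cmp with
    | Isigned -> n >= -2048 && n <= 2047
    | Iunsigned -> n >= 0 && n <= 0xFFF
end
```

With this shape, multiply and multiply-high need no special case: they reach the generic method, which rejects immediates for them.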
* Prologue size does not depend on stack_offset (power, arm64)
Define `initial_stack_offset` of a function, independently
of stack_offset, and use it to compute both frame_size and
prologue_size.
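
A rough sketch of the idea, not the actual emitter code. The record fields, the 8-byte slot size, the 16-byte alignment and the instruction counts are all hypothetical; the point is only that `prologue_size` is computed from the function description alone, while `frame_size` additionally takes the current `stack_offset`:

```ocaml
(* Hypothetical, reduced description of a function. *)
type fundecl = { num_stack_slots : int; contains_calls : bool }

(* Round n up to a multiple of a (a must be a power of two). *)
let align n a = (n + a - 1) land (-a)

(* Stack space needed on entry, independent of the current stack_offset. *)
let initial_stack_offset f =
  8 * f.num_stack_slots + (if f.contains_calls then 8 else 0)

(* The frame size may vary during the function body as stack_offset changes. *)
let frame_size f ~stack_offset =
  align (stack_offset + initial_stack_offset f) 16

(* The prologue size (in instructions, counts assumed) is fixed per function. *)
let prologue_size f =
  (if align (initial_stack_offset f) 16 > 0 then 1 else 0)
  + (if f.contains_calls then 2 else 0)
```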
Introduce the type Cmm.exttype to precisely describe arguments to
external C functions, especially unboxed numerical arguments.
Annotate Cmm.Cextcall with the types of the arguments (Cmm.exttype list).
An empty list means "all arguments have default type XInt".
Annotate Mach.Iextcall with the type of the result (Cmm.machtype)
and the types of the arguments (Cmm.exttype list).
Change (slightly) the API for describing calling conventions in Proc:
- loc_external_arguments now takes a Cmm.exttype list,
in order to know more precisely the types of the arguments.
- loc_arguments, loc_parameters, loc_results, loc_external_results
now take a Cmm.machtype instead of an array of pseudoregisters.
(Only the types of the pseudoregisters mattered anyway.)
Update the implementations of module Proc accordingly, in every port.
Introduce a new overridable method in Selectgen, insert_move_extcall_arg,
to produce the code that moves an argument of an external C function
to the locations returned by Proc.loc_external_arguments.
Revise the selection of external calls accordingly
(method emit_extcall_args in Selectgen).
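
A small sketch of the "empty list means all XInt" convention, using a stand-in type. Only XInt is named above; the other constructors and the helper `arg_types` are assumptions made for illustration:

```ocaml
(* Stand-in for Cmm.exttype; constructors other than XInt are assumed. *)
type exttype = XInt | XInt32 | XInt64 | XFloat

(* Expand the annotation carried by an external call into one type per
   argument.  An empty annotation means every argument has type XInt. *)
let arg_types (ty_args : exttype list) (nargs : int) : exttype list =
  match ty_args with
  | [] -> List.init nargs (fun _ -> XInt)
  | _ ->
    if List.length ty_args <> nargs then
      invalid_arg "arg_types: arity mismatch";
    ty_args
```

A function such as loc_external_arguments can then dispatch on each argument's exttype rather than inspecting pseudoregisters.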
amd64: remove caml_call_gc{1,2,3} and simplify caml_alloc{1,2,3,N}
by tail-calling caml_call_gc.
i386: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
these functions do not need to preserve ebx.
arm: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
arm64: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
power: partial revert of #8619.
avoid restarting allocation sequence after failure.
s390: partial revert of #8619.
avoid restarting allocation sequence after failure.
Moves the alloc_dbginfo type to Debuginfo, to avoid a circular
dependency on architectures that use Branch_relaxation.
This commit generates frame tables with allocation sizes on all
architectures, but does not yet update the allocation code for
non-amd64 backends.
Separate the description of the IR from the transformations
performed on it by moving type declarations from linearize.ml
into their own file, called linear.ml.
The domain state fields are always aligned at 8-byte offsets. This
ensures that even on a 32-bit architecture, where pointers are 32 bits and
doubles are 64 bits, the offset calculation remains the same as on 64-bit
architectures.
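
As an illustration of the offset rule, here is a sketch in which every field occupies one 8-byte slot whatever its natural size. The `field` record and the field names used in the usage note are hypothetical:

```ocaml
(* Hypothetical description of a domain state field. *)
type field = { name : string; size_in_bytes : int }

(* Every field gets its own 8-byte slot, on 32-bit and 64-bit targets alike. *)
let slot_size = 8

(* Byte offset of each field: index * 8, independent of the field's size. *)
let offsets (fields : field list) : (string * int) list =
  List.mapi
    (fun i f ->
       assert (f.size_in_bytes <= slot_size);
       (f.name, i * slot_size))
    fields
```

For example, a 4-byte pointer followed by an 8-byte double and another 4-byte pointer land at offsets 0, 8 and 16 on a 32-bit target, exactly as they would on a 64-bit one.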
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.
This patch is extracted from PR1482 (@lthls).
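
A simplified model of the effect on code emission. The instruction type is reduced to a few constructors, the payload of `Ladjust_trap_depth` is a plain int here, and the 16-byte trap frame size and the emitted mnemonics are assumptions:

```ocaml
(* Reduced stand-in for linearized instructions. *)
type instr =
  | Lpushtrap
  | Lpoptrap
  | Ladjust_trap_depth of int   (* change in number of trap frames *)
  | Lop of string

let trap_frame_size = 16        (* assumed size of one trap frame *)

(* Returns the emitted text and the final virtual stack offset. *)
let emit (instrs : instr list) : string * int =
  let out = Buffer.create 64 in
  let stack_offset = ref 0 in
  List.iter
    (function
      | Lpushtrap ->
        Buffer.add_string out "push_trap\n";
        stack_offset := !stack_offset + trap_frame_size
      | Lpoptrap ->
        Buffer.add_string out "pop_trap\n";
        stack_offset := !stack_offset - trap_frame_size
      | Ladjust_trap_depth delta ->
        (* Bookkeeping only: no instruction is emitted. *)
        stack_offset := !stack_offset + (delta * trap_frame_size)
      | Lop s ->
        Buffer.add_string out (s ^ "\n"))
    instrs;
  (Buffer.contents out, !stack_offset)
```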
If an allocation fails, the decrement of young_ptr should be undone
before the GC is entered. This happened correctly on bytecode but not
on native code.
This commit (squash of pull request #8619) fixes it for all the
platforms supported by ocamlopt.
amd64: add alternate entry points caml_call_gc{1,2,3} for code size
optimisation.
powerpc: introduce one GC call point per allocation size per function.
Each call point corrects the allocation pointer r31 before calling
caml_call_gc.
i386, arm, arm64, s390x: update the allocation pointer after the
conditional branch to the GC, not before.
arm64: simplify the code generator: Ialloc can assume that less than
0x1_0000 bytes are allocated, since the max allocation size for the
minor heap is less than that.
This is a partial cherry-pick of commit 8ceec on multicore.
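
The core of the fix can be modelled as follows. This is a toy simulation, not runtime code: the `minor_heap` record, `minor_gc` and the retry loop are hypothetical, and the GC is reduced to recording the `young_ptr` it observes and emptying the heap:

```ocaml
(* Toy minor heap: allocation moves young_ptr downwards from young_end. *)
type minor_heap = {
  young_start : int;
  young_end : int;
  mutable young_ptr : int;
  mutable seen_by_gc : int list;  (* young_ptr values observed on GC entry *)
}

(* Stand-in for the GC: record what it sees, then empty the minor heap. *)
let minor_gc h =
  h.seen_by_gc <- h.young_ptr :: h.seen_by_gc;
  h.young_ptr <- h.young_end

let rec alloc h size =
  if size > h.young_end - h.young_start then
    invalid_arg "alloc: request larger than the minor heap";
  h.young_ptr <- h.young_ptr - size;
  if h.young_ptr < h.young_start then begin
    (* Allocation failed: undo the decrement before entering the GC,
       so the GC sees a young_ptr that delimits only valid blocks. *)
    h.young_ptr <- h.young_ptr + size;
    minor_gc h;
    alloc h size
  end else
    h.young_ptr
```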
This is a follow-up to commit 7077b60 that fixed a lack of 8-alignment for the frame table on AMD64, as reported in #7591.
A similar issue was reported in #7887 for ARM64 and is fixed here.
For good measure, explicit alignment was added to PPC64 as well, although there was probably no issue there.
Closes: #7887.
The address was loaded from the TOC into register r0. This generated bad code in the "big TOC" case, as r0 was used as index register. The fix is to use another temporary register instead of r0.
Add "arch_power" builtin to ocamltest.
Add test case.
This commit removes support for gprof-based profiling (the -p option to ocamlopt). It follows a discussion on the core developers' list, which indicated that removing gprof support was a reasonable thing to do. The rationale is that there are better easy-to-use profilers out there now, such as perf for Linux and Instruments on macOS; and the gprof support has always been patchy across targets. We save a whole build of the runtime and simplify some other parts of the codebase by removing it.
Following on from GPR#851 and GPR#873, this pull request further enhances debugging information in Cmm terms. This was driven both by manually examining the debugger's behaviour and also by a report received from a user regarding substandard DWARF location information.
The frametable contains absolute pointers into the code, which require relocation in shared libraries and also in position-independent executables (PIE).
Before this commit, the frametable was put in the readonly data section (rodata), which is part of the text segment. In shared libraries and PIEs, relocations in the text segment are undesirable (they make the text segment writable, at least temporarily) and are flagged as warnings or errors by various tools (Debian's lintian package checker; the --warn-shared-textrel option of GNU ld; etc).
This commit puts the frametable in the (read-write) data section (.data), like in the AMD64 port for example. In PowerPC 64-bit mode, this is enough to produce .so files and PIE executables that contain no relocations in the text segment.
In PowerPC 32-bit mode there remain relocations in the text segment, but that was expected because the code we generate is not position-independent (PIC).
since the semantics changed. There is no need to check Clflags.debug
anymore: Raise_withtrace means that traces must be computed (if the
runtime boolean is true).
Since GPR#247 (stack backtraces aware of inlining) was merged, frame tables contain two kinds of addresses of labels: code labels (as before) and data labels (new, pointing to sub-frames).
On ARM in Thumb mode, the two kinds of pointers must be distinguished, because pointers to Thumb code have the low bit set, and the assembler needs to know whether a label denotes code or data to set the low bit or not.
This commit fixes this problem by splitting the "efa_label" action of record Emitaux.emit_frame_actions into two actions, "efa_code_label" and "efa_data_label". On all ports except ARM, the two actions are identical. On ARM, the actions add the appropriate ".type" declaration.
Tested on ARM-32 and x86-64 only. CI will test the other platforms.
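
A reduced sketch of the split. The real record in Emitaux has more fields; here it is cut down to the two label actions, output goes to a Buffer so it can be inspected, and the label syntax and the exact `.type` arguments (`%function`, `%object`) are assumptions about what an ARM port would emit:

```ocaml
(* Reduced stand-in for Emitaux.emit_frame_actions. *)
type emit_frame_actions = {
  efa_code_label : int -> unit;
  efa_data_label : int -> unit;
}

let label_name lbl = Printf.sprintf ".L%d" lbl

(* Most ports: the two actions are identical. *)
let generic_actions buf =
  let word lbl = Printf.bprintf buf "\t.word\t%s\n" (label_name lbl) in
  { efa_code_label = word; efa_data_label = word }

(* ARM: declare the label's kind first, so the assembler knows whether
   to set the Thumb low bit when it resolves the address. *)
let arm_actions buf =
  let typed kind lbl =
    Printf.bprintf buf "\t.type\t%s, %%%s\n" (label_name lbl) kind;
    Printf.bprintf buf "\t.word\t%s\n" (label_name lbl)
  in
  { efa_code_label = typed "function"; efa_data_label = typed "object" }
```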