There's no need for allocation on amd64 to clobber the %rax register. It's only used in one case (-compact out-of-line allocation of >3 words), and only used there to do a single subtraction. That subtraction can be done by the caller at no code size penalty, freeing up %rax.
Inside amd64.S functions, %r11 can be used instead of %rax as a temporary. %r11 is destroyed by PLT stub code, so on ELF platforms it costs nothing to use.
amd64: remove caml_call_gc{1,2,3} and simplify caml_alloc{1,2,3,N}
by tail-calling caml_call_gc.
i386: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
these functions do not need to preserve ebx.
arm: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
arm64: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
partial revert of #8619.
power: partial revert of #8619.
avoid restarting allocation sequence after failure.
s390: partial revert of #8619.
avoid restarting allocation sequence after failure.
Moves the alloc_dbginfo type to Debuginfo, to avoid a circular
dependency on architectures that use Branch_relaxation.
This commit generates frame tables with allocation sizes on all
architectures, but does not yet update the allocation code for
non-amd64 backends.
This code is adapted from jhjourdan's 2c93ca1e711. Comballoc is
extended to keep track of allocation sizes and debug info for each
allocation, and the frame table format is modified to store them.
The native code GC-entry logic is changed to match bytecode, by
calling the garbage collector at most once per allocation.
amd64 only, for now.
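
A minimal sketch of the bookkeeping described above, assuming a simplified
model: when two allocations are combined, the total size grows but the
per-allocation sizes and debug info are kept side by side. The types
alloc_item and alloc and the functions single and combine are hypothetical
names for illustration, not the compiler's actual definitions.

```ocaml
(* Hypothetical, simplified model; not the compiler's actual types. *)
type alloc_item = { words : int; dbg : string }

type alloc = {
  total_words : int;        (* size of the combined allocation *)
  items : alloc_item list;  (* one entry per original allocation, in order *)
}

(* A lone allocation of [words] words with debug info [dbg]. *)
let single ~words ~dbg = { total_words = words; items = [ { words; dbg } ] }

(* Merge two adjacent allocations, keeping the per-allocation details. *)
let combine a b =
  { total_words = a.total_words + b.total_words; items = a.items @ b.items }

let () =
  let c =
    combine (single ~words:3 ~dbg:"a.ml:1") (single ~words:2 ~dbg:"a.ml:2")
  in
  List.iter (fun i -> Printf.printf "%d words at %s\n" i.words i.dbg) c.items
```

Keeping the list in allocation order is a design choice of this sketch; it
lets a consumer recover each block's size from the single combined entry.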
Locations of inlined frames are now represented as contiguous
sequences rather than linked lists.
The frame tables now refer to debug info by 32-bit offset rather
than word-sized pointer.
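
As an illustration of the 32-bit offset idea, here is a sketch under the
assumption that the offset is stored relative to the position of the field
that holds it, and that OCaml ints are 63 bits wide. The helpers
store_offset and resolve are hypothetical and model the table as a Bytes
buffer; they are not the runtime's code.

```ocaml
(* Toy model: the table is a byte buffer and "addresses" are positions. *)

(* Write, at [field_pos], the signed 32-bit distance to [target_pos]. *)
let store_offset buf ~field_pos ~target_pos =
  let delta = target_pos - field_pos in
  (* The distance must fit in 32 bits (assumes a 64-bit OCaml). *)
  assert (delta >= Int32.to_int Int32.min_int
          && delta <= Int32.to_int Int32.max_int);
  Bytes.set_int32_le buf field_pos (Int32.of_int delta)

(* Recover the target position from the offset stored at [field_pos]. *)
let resolve buf ~field_pos =
  field_pos + Int32.to_int (Bytes.get_int32_le buf field_pos)

let () =
  let buf = Bytes.make 64 '\000' in
  store_offset buf ~field_pos:8 ~target_pos:40;
  Printf.printf "resolved position: %d\n" (resolve buf ~field_pos:8)
```

Each reference costs 4 bytes instead of a full word, and a relative offset
needs no relocation when the table is loaded at a different address.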
Separate the description of the IR from the transformations
performed on it by moving type declarations from linearize.ml
into their own file, called linear.ml.
Since we cannot access backtrace position in cmmgen.ml anymore,
Cmm.raise_kind is removed. Instead, we use Lambda.raise_kind. When
assembly code is generated, we reset the backtrace position to 0 in the
case of a regular raise. Importantly, the semantics remain the same.
Add --enable-function-sections option to configure. With this option,
the compiler will emit each function in a separate named text section,
on supported targets. This enables function reordering using a linker
script. With this option, the compiler also emits caml_hot__code_begin
and caml_hot__code_end sections. This allows a linker script to
move function sections outside of the segments they belong to,
without breaking caml_code_segments.
Ladjust_trap_depth replaces dummy Lpushtrap generated in linearize of
Iexit to notify assembler generation about updates to the
stack. Ladjust_trap_depth is used to keep the virtual stack pointer in
sync and emit dwarf information, without emitting any assembly
instructions. It therefore avoids generating dead code.
This patch is extracted from PR1482 (@lthls).
Use unsigned comparisons (jb/ja) in amd64 and i386 emitters of Lcondbranch3,
instead of the previous mixture of unsigned and signed comparisons (jb/jg).
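
The difference between the two schemes can be modelled as below. This is a
sketch, assuming the three-way branch compares its argument against 1 and
jumps to one of three labels; the type target and both functions are
hypothetical and only simulate the condition codes on 64-bit values.

```ocaml
type target = L0 | L1 | L2 | Fallthrough

(* New scheme: jb / je / ja, all treating the operand as unsigned. *)
let branch3_unsigned (x : int64) =
  let c = Int64.unsigned_compare x 1L in
  if c < 0 then L0 else if c = 0 then L1 else L2

(* Previous scheme: jb (unsigned), je, then jg (signed). *)
let branch3_mixed (x : int64) =
  if Int64.unsigned_compare x 1L < 0 then L0
  else if Int64.equal x 1L then L1
  else if Int64.compare x 1L > 0 then L2
  else Fallthrough
```

In this model both schemes agree on the expected inputs 0, 1 and 2. They
differ only for values with the sign bit set, where the mixed scheme takes
none of the three branches.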
If an allocation fails, the decrement of young_ptr should be undone
before the GC is entered. This happened correctly on bytecode but not
on native code.
This commit (squash of pull request #8619) fixes it for all the
platforms supported by ocamlopt.
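
The intended ordering can be sketched with a toy minor heap that grows
downwards. The record heap and the functions minor_gc and alloc are
illustrative names only; a real collector does far more than reset a
pointer.

```ocaml
(* Toy model of a downward-growing minor heap; addresses are plain ints. *)
type heap = {
  young_start : int;                (* lowest usable address *)
  young_end : int;                  (* one past the highest address *)
  mutable young_ptr : int;          (* current allocation pointer *)
  mutable seen_by_gc : int list;    (* young_ptr values observed on GC entry *)
}

(* Stand-in for a minor collection: record what it sees, then empty the heap. *)
let minor_gc h =
  h.seen_by_gc <- h.young_ptr :: h.seen_by_gc;
  h.young_ptr <- h.young_end

let alloc h bytes =
  h.young_ptr <- h.young_ptr - bytes;
  if h.young_ptr < h.young_start then begin
    (* Allocation failed: undo the decrement before entering the GC. *)
    h.young_ptr <- h.young_ptr + bytes;
    minor_gc h;
    (* Allocate in the emptied heap. *)
    h.young_ptr <- h.young_ptr - bytes
  end;
  h.young_ptr
```

Without the undo step, the collector in this model would observe a pointer
below young_start, that is, a block that was never actually allocated.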
amd64: add alternate entry points caml_call_gc{1,2,3} for code size
optimisation.
powerpc: introduce one GC call point per allocation size per function.
Each call point corrects the allocation pointer r31 before calling
caml_call_gc.
i386, arm, arm64, s390x: update the allocation pointer after the
conditional branch to the GC, not before.
arm64: simplify the code generator: Ialloc can assume that less than
0x1_0000 bytes are allocated, since the max allocation size for the
minor heap is less than that.
This is a partial cherry-pick of commit 8ceec on multicore.
* Various file moves in the middle end: this is the first stage of improving separation between the middle end and backend.
* Creation of file_formats/ directory (with associated file moves) to hold the definitions of compilation artifact formats.
* Creation of lambda/ directory (with associated file moves) to hold Lambda language definition files, transformation passes and construction passes from Typedtree.
* Disable (hopefully temporarily) dynlink, debugger and ocamldoc for the dune build.
This commit removes support for gprof-based profiling (the -p option to ocamlopt). It follows a discussion on the core developers' list, which indicated that removing gprof support was a reasonable thing to do. The rationale is that there are better easy-to-use profilers out there now, such as perf for Linux and Instruments on macOS; and the gprof support has always been patchy across targets. We save a whole build of the runtime and simplify some other parts of the codebase by removing it.
Mark PLT-clobbered registers as destroyed across the Ialloc instruction.
Currently only x86-64 is affected, in PIC mode only, and only with the glibc dynamic loader.
Misalignment was due to the "D.long (const 0)" emitted just before the frametable, which sets the data pointer to 4 mod 8. Looks like someone cut-and-pasted from i386 to amd64 without thinking.
This commit fixes the bug twice (because belt and suspenders and all that) in two obvious ways:
- the data terminator D.long becomes D.qword
- explicit 8-alignment is requested before emitting the frame table.
(Mental note: why is the frame table in the data segment and not in a readonly data segment?)
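
The arithmetic behind both fixes, as a small sketch. It assumes the data
preceding the terminator ends on an 8-byte boundary; align_up and
frametable_offset are hypothetical helpers, not emitter functions.

```ocaml
(* Sizes in bytes of the two candidate terminators. *)
let long_size = 4   (* what D.long emits *)
let qword_size = 8  (* what D.qword emits *)

(* Round [offset] up to a multiple of [n]; [n] must be a power of two. *)
let align_up offset n = (offset + n - 1) land (lnot (n - 1))

(* Offset at which the frame table starts, given where the data ends. *)
let frametable_offset ~data_end ~terminator_size ~align =
  let after_terminator = data_end + terminator_size in
  if align then align_up after_terminator 8 else after_terminator

let () =
  let show name off = Printf.printf "%s: offset mod 8 = %d\n" name (off mod 8) in
  show "D.long, no align"
    (frametable_offset ~data_end:16 ~terminator_size:long_size ~align:false);
  show "D.qword, no align"
    (frametable_offset ~data_end:16 ~terminator_size:qword_size ~align:false);
  show "D.long, aligned"
    (frametable_offset ~data_end:16 ~terminator_size:long_size ~align:true)
```

Either fix alone yields an 8-aligned frame table in this model. The explicit
alignment also covers the case where the preceding data is not itself
8-aligned.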
since the semantics changed. There is no need to check Clflags.debug
anymore: Raise_withtrace means that traces must be computed (if the
runtime boolean is true).