Commit Graph

258 Commits (master)

Author SHA1 Message Date
jacobly0 8a46d76bf9 Fix mergeable section flags and use .rodata.cst16 where appropriate (#9981)
On x86-64 ELF, the `.rodata.cst8` section was incorrectly used.
2020-10-18 13:57:53 +02:00
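For context, here is a sketch (not the compiler's code) of how an emitter might choose an ELF mergeable read-only section by literal size.

- The helper name `section_for_literal` and the restriction to 4-, 8- and 16-byte literals are assumptions.
- In the GNU as directive, the `"aM"` flags mark the section allocatable and mergeable.
- The last operand is the entity size the linker uses when merging identical constants. A 16-byte literal therefore has entity size 16 and belongs in `.rodata.cst16`.

```ocaml
(* Hypothetical helper: section directive for a literal of [size] bytes.
   "aM" = allocatable + mergeable; the last operand is the entity size. *)
let section_for_literal (size : int) : string =
  match size with
  | 4 | 8 | 16 ->
      Printf.sprintf ".section .rodata.cst%d,\"aM\",@progbits,%d" size size
  | _ -> ".section .rodata"
```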
Nicolás Ojeda Bär 43883ae4bc Remove labels after calls, checkbound, and GC points 2020-10-08 20:28:15 +02:00
Nicolás Ojeda Bär 3869f71e98 Remove Cblockheader 2020-10-08 20:28:15 +02:00
Nicolás Ojeda Bär 540996d21e Remove Spacetime 2020-10-08 20:28:12 +02:00
Xavier Leroy 8e246c41c2 Revised detection of arithmetic instructions with immediate operands, continued
- Rewrite the `is_immediate` methods in $ARCH/selection.ml in the style of
  other selection methods: operations that need platform-dependent handling
  are explicitly listed, and all others fall through to `super#is_immediate`.

- The `is_immediate` method from selectgen.ml knows how to handle shifts
  (and no other operation).  Remove the `select_shift_op` method,
  now unnecessary.

- ARM: remove special cases for multiply and multiply-high, no longer
  necessary.

- RISC-V: in emit.mlp, remove implementation of checkbound immediate,
  which is no longer generated.
2020-09-21 14:49:16 +02:00
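Below is a much-simplified sketch of the override style described above. The class names and the reduced operation type are hypothetical.

- The generic selector accepts immediates only for shifts.
- The platform subclass lists the operations it handles specially and defers everything else to `super#is_immediate`.
- The 64-bit shift range and the signed 32-bit range for add/sub are assumptions, chosen to resemble x86-64.

```ocaml
(* Reduced, hypothetical operation type. *)
type integer_operation = Iadd | Isub | Imul | Ilsl | Ilsr | Iasr

(* Generic selector: only shifts take an immediate (assumes 64-bit words). *)
class generic_selector =
  object
    method is_immediate (op : integer_operation) (n : int) =
      match op with
      | Ilsl | Ilsr | Iasr -> n >= 0 && n < 64
      | Iadd | Isub | Imul -> false
  end

(* Platform selector: list the special cases, fall through to super. *)
class amd64_like_selector =
  object
    inherit generic_selector as super

    method! is_immediate op n =
      match op with
      | Iadd | Isub -> n >= -0x8000_0000 && n <= 0x7FFF_FFFF
      | _ -> super#is_immediate op n
  end
```

Keeping the generic cases in one place means a new platform only spells out where it differs.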
Xavier Leroy 86fbea7fc3 Back-ends for 64-bit platforms do not need to be compilable on a 32-bit host
This commit simplifies a few integer constants that were obfuscated so
as to pass compilation on a 32-bit host, as "make check_all_arches"
would do if run on a 32-bit host. However, "make check_all_arches"
does not run on 32-bit hosts, unlike what is claimed in comments.

More generally, 32-bit hosts are no longer used for developing OCaml and
will not be used for cross-compilation.  So, let's not complicate the
back-ends unnecessarily.
2020-09-16 11:52:26 +02:00
Xavier Leroy 65544ffd1f Revised detection of arithmetic instructions with immediate operands
Replace a single `is_immediate n` method that is supposed to apply
to all arithmetic instructions by two methods:

`is_immediate op n` : tests whether `n` is in the range of supported
   immediate arguments for integer operation `op`
`is_immediate_test cmp n` : tests whether `n` is in the range of supported
   immediate arguments for integer comparison `cmp`

This makes it easier to handle operations without immediate operands
(e.g. multiply or multiply-high on many platforms) and operations with
specific ranges of immediate operands (e.g. N-bit unsigned versus
N-bit signed).  Before, these operations had to be treated as special
cases in the platform-specific `select_operation` method.
2020-09-16 11:52:19 +02:00
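The two predicates can be sketched as plain functions for a hypothetical target with these rules:

- add takes a 12-bit signed immediate;
- logical AND takes a 12-bit unsigned immediate;
- multiply and multiply-high take none.

The types and ranges are illustrative assumptions, not those of any real OCaml back-end.

```ocaml
(* Is [n] representable as a [bits]-bit signed / unsigned immediate? *)
let is_signed_imm bits n =
  let half = 1 lsl (bits - 1) in
  n >= -half && n < half

let is_unsigned_imm bits n = n >= 0 && n < 1 lsl bits

(* Simplified, hypothetical operation and comparison types. *)
type integer_operation = Iadd | Iand | Imul | Imulh
type integer_comparison = Isigned | Iunsigned

(* Per-operation ranges; no immediates at all for multiplies. *)
let is_immediate op n =
  match op with
  | Iadd -> is_signed_imm 12 n
  | Iand -> is_unsigned_imm 12 n
  | Imul | Imulh -> false

(* Signed comparisons take signed immediates, unsigned ones unsigned. *)
let is_immediate_test cmp n =
  match cmp with
  | Isigned -> is_signed_imm 12 n
  | Iunsigned -> is_unsigned_imm 12 n
```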
Xavier Leroy 9fcb295b98 Revised passing of arguments to external C functions
Introduce the type Cmm.exttype to precisely describe arguments to
external C functions, especially unboxed numerical arguments.

Annotate Cmm.Cextcall with the types of the arguments (Cmm.exttype list).
An empty list means "all arguments have default type XInt".

Annotate Mach.Iextcall with the type of the result (Cmm.machtype)
and the types of the arguments (Cmm.exttype list).

Change (slightly) the API for describing calling conventions in Proc:
- loc_external_arguments now takes a Cmm.exttype list,
  in order to know more precisely the types of the arguments.
- loc_arguments, loc_parameters, loc_results, loc_external_results
  now take a Cmm.machtype instead of an array of pseudoregisters.
  (Only the types of the pseudoregisters mattered anyway.)

Update the implementations of module Proc accordingly, in every port.

Introduce a new overridable method in Selectgen, insert_move_extcall_arg,
to produce the code that moves an argument of an external C function
to the locations returned by Proc.loc_external_arguments.

Revise the selection of external calls accordingly
(method emit_extcall_args in Selectgen).
2020-07-24 17:39:22 +02:00
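Here is a rough sketch of how a typed argument list can drive the placement of external-call arguments.

- The `exttype` constructors shown are assumed for illustration.
- The register pools are the first six integer registers and the first eight SSE registers of the System V x86-64 convention.
- The model ignores details a real convention must handle, such as stack alignment.
- The real `Proc.loc_external_arguments` has a different result type.
- `expand_types` encodes the rule that an empty list means every argument has type XInt.

```ocaml
(* Simplified stand-in for the argument types of an external call. *)
type exttype = XInt | XInt32 | XInt64 | XFloat

type location = Int_reg of string | Float_reg of string | Stack of int

(* Assumed register pools: System V x86-64 integer and SSE argument regs. *)
let int_regs = [| "rdi"; "rsi"; "rdx"; "rcx"; "r8"; "r9" |]
let float_regs =
  [| "xmm0"; "xmm1"; "xmm2"; "xmm3"; "xmm4"; "xmm5"; "xmm6"; "xmm7" |]

(* An empty type list means "every argument has the default type XInt". *)
let expand_types ~arity = function
  | [] -> List.init arity (fun _ -> XInt)
  | tys -> tys

(* Integer-like and float arguments use separate register pools; once a
   pool is exhausted, arguments go to consecutive 8-byte stack slots. *)
let loc_external_arguments (tys : exttype list) : location list =
  let step (ni, nf, ofs, acc) ty =
    match ty with
    | XFloat when nf < Array.length float_regs ->
        (ni, nf + 1, ofs, Float_reg float_regs.(nf) :: acc)
    | (XInt | XInt32 | XInt64) when ni < Array.length int_regs ->
        (ni + 1, nf, ofs, Int_reg int_regs.(ni) :: acc)
    | _ -> (ni, nf, ofs + 8, Stack ofs :: acc)
  in
  let _, _, _, acc = List.fold_left step (0, 0, 0, []) tys in
  List.rev acc
```

Knowing each argument's type is what lets an unboxed float go to an SSE register instead of an integer one.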
Stephen Dolan 2d92955749 Remove Const_pointer (#9578)
Since #9316 was merged, Cconst_pointer is compiled in exactly the same way as Cconst_int. This commit removes the now-redundant Cconst_pointer and Cconst_natpointer.
2020-05-19 15:31:08 +02:00
Nicolás Ojeda Bär ec6690fb53 x86 asm: handle unit names with special characters (#9465) 2020-04-19 11:17:00 +02:00
Stephen Dolan 4d4a056bc7 Micro-optimise allocations on amd64 to save a register (#9280)
There's no need for allocation on amd64 to clobber the %rax register. It's only used in one case (-compact out-of-line allocation of >3 words), and only used there to do a single subtraction. That subtraction can be done by the caller at no code size penalty, freeing up %rax.

Inside amd64.S functions, %r11 can be used instead of %rax as a temporary. %r11 is destroyed by PLT stub code, so on ELF platforms it costs nothing to use.
2020-03-09 19:52:36 +01:00
Greta Yorsh 6daaf62904 Do not emit references to dead labels (spacetime) (#9097) 2019-11-26 12:06:19 +00:00
Stephen Dolan 7fe360401b Per-architecture support for allocation size info in frame tables.
amd64: remove caml_call_gc{1,2,3} and simplify caml_alloc{1,2,3,N}
       by tail-calling caml_call_gc.

i386:  simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
       these functions do not need to preserve ebx.

arm:   simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
       partial revert of #8619.

arm64: simplify caml_alloc{1,2,3,N} by tail-calling caml_call_gc.
       partial revert of #8619.

power: partial revert of #8619.
       avoid restarting allocation sequence after failure.

s390:  partial revert of #8619.
       avoid restarting allocation sequence after failure.
2019-10-23 09:24:13 +01:00
Stephen Dolan 768dcce48f Use allocation-size info on more than just amd64.
Moves the alloc_dbginfo type to Debuginfo, to avoid a circular
dependency on architectures that use Branch_relaxation.

This commit generates frame tables with allocation sizes on all
architectures, but does not yet update the allocation code for
non-amd64 backends.
2019-10-22 11:47:31 +01:00
Stephen Dolan 787e2d05a7 Apply suggestions from code review, and make depend.
Co-Authored-By: Damien Doligez <damien.doligez@gmail.com>
2019-10-22 11:47:31 +01:00
Stephen Dolan 34f97941ec Retain debug information about allocation sizes, for statmemprof.
This code is adapted from jhjourdan's 2c93ca1e711. Comballoc is
extended to keep track of allocation sizes and debug info for each
allocation, and the frame table format is modified to store them.

The native code GC-entry logic is changed to match bytecode, by
calling the garbage collector at most once per allocation.

amd64 only, for now.
2019-10-22 11:47:31 +01:00
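Here is a toy model of the bookkeeping described above: several consecutive allocations are served by a single pointer bump, while each allocation's size and debug location are kept for the frame table. The record type, the 8-byte word and the one-word header per block are assumptions, and the actual frame-table encoding is not shown.

```ocaml
(* Hypothetical description of one source-level allocation;
   [words] excludes the one-word header. *)
type alloc = { words : int; dbg : string }

let word_size = 8

(* Combine consecutive allocations into one bump of [total_bytes],
   keeping each allocation's size and location for the frame table. *)
let combine (allocs : alloc list) : int * (int * string) list =
  let total_bytes =
    List.fold_left (fun acc a -> acc + ((a.words + 1) * word_size)) 0 allocs
  in
  (total_bytes, List.map (fun a -> (a.words, a.dbg)) allocs)
```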
Stephen Dolan b0ad600b88 Use a more compact representation of debug information.
Locations of inlined frames are now represented as contiguous
sequences rather than linked lists.

The frame tables now refer to debug info by 32-bit offset rather
than word-sized pointer.
2019-10-22 11:46:35 +01:00
Stephen Dolan 71f3ec4091 Clear destination registers before sqrt instruction on amd64 (#9041)
This avoids a partial register stall.
2019-10-15 19:04:20 +02:00
Stephen Dolan 0852266a07 Improve code-generation for 32-to-64-bit zero-extension on amd64. 2019-10-14 10:45:15 +01:00
Tom Kelly 62d6917fd5 amd64: Emit 32-bit registers for Iconst_int when possible (this reuses (better) code proposed in PR1490; credit to xclerc/mshinwell) 2019-10-03 16:52:50 +02:00
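The idea relies on a property of x86-64: writing a 32-bit register zero-extends the result into the full 64-bit register. This gives three encodings for a constant:

- any constant in [0, 2^32) can be loaded with the shorter `movl`;
- negative constants that fit a signed 32-bit immediate can use `movq` with a sign-extended immediate;
- everything else needs the 10-byte `movabsq`.

The helper below is a hypothetical sketch in AT&T syntax. It uses OCaml's 63-bit `int` for simplicity, whereas a real emitter would work on `nativeint`.

```ocaml
(* Hypothetical helper: pick an encoding for loading constant [n] into a
   register whose 32-bit name is [reg32] and 64-bit name is [reg64]. *)
let emit_const_int ~reg32 ~reg64 (n : int) : string =
  if n >= 0 && n <= 0xFFFF_FFFF then
    (* A 32-bit write zero-extends into the full 64-bit register. *)
    Printf.sprintf "movl $%d, %%%s" n reg32
  else if n >= -0x8000_0000 && n < 0 then
    (* Sign-extended 32-bit immediate. *)
    Printf.sprintf "movq $%d, %%%s" n reg64
  else
    (* Full 64-bit immediate. *)
    Printf.sprintf "movabsq $%d, %%%s" n reg64
```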
Greta Yorsh 1c128fdf25 Make contains_calls into a reference instance variable 2019-09-12 12:58:54 +01:00
Greta Yorsh cae89d4e1b Pass num_stack_slots as argument 2019-09-11 18:48:20 +01:00
Greta Yorsh aeebb62e9b Move contains_calls and num_stack_slots from Proc to Mach.fundecl 2019-09-09 11:33:03 +01:00
Greta Yorsh 0b6b544fcb Split Linearize into two modules
Separate the description of the IR from the transformations
performed on it by moving type declarations from linearize.ml
into their own file, called linear.ml.
2019-09-04 11:55:11 +01:00
Thomas Refis 4a22aeccb5 warning 60: enable on local modules 2019-08-28 13:24:10 +01:00
KC Sivaramakrishnan a395c4cf71 Fix CFI offsets in amd64 2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan ee2bcfe1ad Optimising poptrap in i386 2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan de5ef602fd Rename exn_handler to exception_pointer 2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan c06038a0ee Move backtrace support global variables to domain state.
Since we cannot access backtrace position in cmmgen.ml anymore,
Cmm.raise_kind is removed. Instead, we use Lambda.raise_kind. When
assembly code is generated, we reset the backtrace position to 0 in the
case of regular raise. Importantly, the semantics remains the same.
2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan ddf400b1e9 Destroy r11 at Itrywith 2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan 45b1e18f59 young_ptr and young_limit are now in domain state 2019-08-23 09:50:05 +05:30
KC Sivaramakrishnan fc6f028492 Introduce domain state and steal exception pointer 2019-08-23 09:50:05 +05:30
Greta Yorsh c79387bb64 Add .caml to function section names
Emit .text.caml.function_name instead of .text.function_name,
and update runtime assembly function names accordingly.
2019-07-15 10:25:26 +01:00
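Here is a sketch of the directive this naming produces. It uses a hypothetical helper and the GNU as ELF syntax for an allocatable, executable section (`"ax"`, `@progbits`).

```ocaml
(* Hypothetical helper: section directive for function [fn] when each
   function is emitted in its own section. "ax" = allocatable + executable. *)
let function_section (fn : string) : string =
  Printf.sprintf ".section .text.caml.%s,\"ax\",@progbits" fn
```

C compilers using `-ffunction-sections` name their sections `.text.<name>`, and C identifiers cannot contain a dot. The `.caml.` infix presumably lets a linker-script pattern such as `.text.caml.*` select only OCaml functions.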
Greta Yorsh 351edb49bb Add compile-time option -function-sections 2019-07-15 10:25:26 +01:00
Greta Yorsh 27a92a9445 Emit each function in a separate section (amd64,i386,arm,arm64)
Add --enable-function-sections option to configure. With this option,
the compiler will emit each function in a separate named text section,
on supported targets. This enables function reordering using a linker
script. With this option, the compiler also emits caml_hot__code_begin
and caml_hot__code_end sections. This allows a linker script to
move function sections outside of the segments they belong to,
without breaking caml_code_segments.
2019-07-15 10:25:26 +01:00
Greta Yorsh d8b6a1713b Add pseudo-instruction `Ladjust_trap_depth` (#2322)
Ladjust_trap_depth replaces the dummy Lpushtrap that linearize generated
for Iexit to notify assembly generation of updates to the stack.
Ladjust_trap_depth is used to keep the virtual stack pointer in sync and
to emit DWARF information without emitting any assembly instructions.
It therefore avoids generating dead code.

This patch is extracted from PR1482 by @lthls.
2019-06-24 14:18:37 +01:00
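Here is a minimal model of why a non-emitting pseudo-instruction helps. The emitter tracks a virtual stack offset, for example for CFI/DWARF offsets, and `Ladjust_trap_depth` updates that offset without producing any code. The instruction type, the pseudo-assembly strings and the 16-byte trap-frame size are assumptions of this sketch.

```ocaml
(* Hypothetical, simplified linear instructions. *)
type instr =
  | Lpushtrap
  | Lpoptrap
  | Ladjust_trap_depth of int  (* change in trap depth, in trap frames *)
  | Lop of string              (* any ordinary instruction *)

let trap_frame_size = 16  (* assumed size of one trap frame, in bytes *)

(* Returns the emitted lines and the final virtual stack offset. *)
let emit (instrs : instr list) : string list * int =
  let stack_offset = ref 0 in
  let out = ref [] in
  let line s = out := s :: !out in
  List.iter
    (function
      | Lpushtrap ->
          stack_offset := !stack_offset + trap_frame_size;
          line "push-trap"
      | Lpoptrap ->
          stack_offset := !stack_offset - trap_frame_size;
          line "pop-trap"
      | Ladjust_trap_depth delta ->
          (* No code emitted: only the emitter's bookkeeping changes. *)
          stack_offset := !stack_offset + (delta * trap_frame_size)
      | Lop s -> line s)
    instrs;
  (List.rev !out, !stack_offset)
```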
Nicolás Ojeda Bär 23f6a7364b amd64: align data segment to word boundary 2019-06-24 09:35:07 +02:00
Greta Yorsh e7aef3aa6f Fix amd64 and i386 emitters of Lcondbranch3 (#8677)
Use unsigned comparisons (jb/ja) in amd64 and i386 emitters of Lcondbranch3,
instead of the previous mixture of unsigned and signed comparisons (jb/jg).
2019-06-03 16:30:34 +02:00
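Here is a small model of this class of bug. As a simplification, it assumes the argument is compared against 1 and the three branches test below, equal and above; `Int64.unsigned_compare` (OCaml 4.08 and later) stands in for the unsigned jumps.

- With the old mix, a value that is large when viewed as unsigned but negative as signed takes none of the branches.
- With all-unsigned tests, the same value consistently takes the "above" branch.

```ocaml
(* Unsigned comparisons on 64-bit values, standing in for jb / ja. *)
let below a b = Int64.unsigned_compare a b < 0
let above a b = Int64.unsigned_compare a b > 0

(* Old mix: unsigned jb for "< 1", je for "= 1", signed jg for "> 1". *)
let branch3_mixed x =
  if below x 1L then "lt"
  else if Int64.equal x 1L then "eq"
  else if Int64.compare x 1L > 0 then "gt"
  else "fallthrough"

(* Fixed: jb / je / ja, all unsigned. *)
let branch3_unsigned x =
  if below x 1L then "lt"
  else if Int64.equal x 1L then "eq"
  else if above x 1L then "gt"
  else "fallthrough"
```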
Stephen Dolan c24e5b5c8a Ensure that Gc.minor_words remains accurate after a GC (#8619)
If an allocation fails, the decrement of young_ptr should be undone
before the GC is entered. This happened correctly on bytecode but not
on native code.

This commit (squash of pull request #8619) fixes it for all the
platforms supported by ocamlopt.

amd64: add alternate entry points caml_call_gc{1,2,3} for code size
optimisation.

powerpc: introduce one GC call point per allocation size per function.
Each call point corrects the allocation pointer r31 before calling
caml_call_gc.

i386, arm, arm64, s390x: update the allocation pointer after the
conditional branch to the GC, not before.

arm64: simplify the code generator: Ialloc can assume that less than
0x1_0000 bytes are allocated, since the max allocation size for the
minor heap is less than that.

This is a partial cherry-pick of commit 8ceec on multicore.
2019-05-04 10:01:23 +02:00
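Here is a toy model of the invariant this commit restores. It assumes a minor heap that grows downwards and a word count derived from the allocation pointer; the record fields and function names are hypothetical. Undoing the decrement before the collection means the failed attempt is not counted, so the retried allocation's words are counted exactly once.

```ocaml
(* Toy minor heap: allocation moves young_ptr down from young_end
   towards young_limit. All names are hypothetical. *)
type heap = {
  young_end : int;
  young_limit : int;
  mutable young_ptr : int;
  mutable words_before : int;  (* words allocated in earlier minor cycles *)
}

let word_size = 8

(* Words allocated so far, derived from the allocation pointer. *)
let minor_words h = h.words_before + ((h.young_end - h.young_ptr) / word_size)

(* Stand-in for a minor collection: record the cycle, then empty the heap. *)
let minor_gc h =
  h.words_before <- minor_words h;
  h.young_ptr <- h.young_end

(* Allocate [bytes]; on failure, undo the decrement before collecting,
   then retry once. Assumes [bytes] fits in an empty minor heap. *)
let alloc h bytes =
  h.young_ptr <- h.young_ptr - bytes;
  if h.young_ptr < h.young_limit then begin
    h.young_ptr <- h.young_ptr + bytes;  (* undo the failed decrement *)
    minor_gc h;
    h.young_ptr <- h.young_ptr - bytes
  end;
  h.young_ptr
```

For example, starting from `{ young_end = 64; young_limit = 0; young_ptr = 64; words_before = 0 }`, the two calls `alloc h 48` and `alloc h 24` leave `minor_words h` at 9 (6 + 3 words). Without the undo step, the overshoot would be counted as well, giving 18.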
Mark Shinwell 72ea849d2a Move some middle-end files around (#2281)
* Various file moves in the middle end: this is the first stage of improving separation between the middle end and backend.
* Creation of file_formats/ directory (with associated file moves) to hold the definitions of compilation artifact formats.
* Creation of lambda/ directory (with associated file moves) to hold Lambda language definition files, transformation passes and construction passes from Typedtree.
* Disable (hopefully temporarily) dynlink, debugger and ocamldoc for the dune build.
2019-04-01 17:18:47 +01:00
Mark Shinwell 4334b2de87 Position [Lprologue] correctly (#2292) 2019-03-29 11:47:53 +00:00
Mark Shinwell 2cc1ea26b9 Remove gprof support (#2314)
This commit removes support for gprof-based profiling (the -p option to ocamlopt).  It follows a discussion on the core developers' list, which indicated that removing gprof support was a reasonable thing to do. The rationale is that there are better easy-to-use profilers out there now, such as perf for Linux and Instruments on macOS; and the gprof support has always been patchy across targets. We save a whole build of the runtime and simplify some other parts of the codebase by removing it.
2019-03-16 19:56:53 +01:00
Mark Shinwell 618e5dbfbd More debugging information in Cmm terms (#2308)
Following on from GPR#851 and GPR#873, this pull request further enhances debugging information in Cmm terms. This was driven both by manually examining the debugger's behaviour and also by a report received from a user regarding substandard DWARF location information.
2019-03-13 15:40:04 +00:00
Mark Shinwell 24e12ad9e1 Propagate environments further in Selectgen 2019-03-08 13:06:31 +00:00
Vincent Laviron 1dba5329a2 Linearize: for Trywith, remove the jump/call to the handler (#2237) 2019-03-07 10:37:22 +00:00
Daniel Bünzli a7afd89003 s/string_of_int/Int.to_string/g 2018-11-07 13:52:02 +01:00
Mark Shinwell 770e662e96 Add [Proc.destroyed_at_reloadretaddr] (#2065) 2018-10-15 12:53:27 +01:00
Mark Shinwell dacb2240a4 DWARF register numberings (#2080) 2018-10-04 11:30:52 +01:00
Mark Shinwell dae65dacda Rename Mach.Ialloc record field from _words_ to _bytes_ and fix logic in a couple of places (#2074) 2018-10-02 16:00:03 +01:00
Mark Shinwell 2a072d8036 Add Lprologue (#2055) 2018-09-24 10:03:26 +01:00