The definition was suggested by Daniel Bunzli. It considers
that ".", "..", ".foo" all have an empty extension.
This commit also fixes chop_extension to align with this definition
and adds remove_extension which behaves as chop_extension but
does not fail when the extension is empty.
There used to be a Misc.chop_extension_if_any in the compiler code base.
The commit also replaces it with the new Filename.remove_extension.
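
For illustration, the definition above can be exercised as follows. This is
a minimal sketch that assumes a standard library recent enough to provide
Filename.extension and Filename.remove_extension; the helper chop_or_none
is hypothetical and only shows the difference in failure behavior.

```ocaml
(* Hypothetical helper: turn the failure of chop_extension into an option. *)
let chop_or_none name =
  match Filename.chop_extension name with
  | stem -> Some stem
  | exception Invalid_argument _ -> None

let () =
  List.iter
    (fun name ->
       Printf.printf "%S: extension %S, without extension %S\n"
         name (Filename.extension name) (Filename.remove_extension name))
    [ "."; ".."; ".foo"; "foo.ml"; "a.tar.gz" ]
```

Here remove_extension returns ".foo" unchanged, whereas chop_or_none ".foo"
yields None because chop_extension fails on an empty extension.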
Adds the required_globals information to bytecode compilation units.
This patch also bootstraps ocamlc. The cmo format is changed by this
commit, so there is no way around bootstrapping here. Note that ocamldep
and ocamllex do not rely on the cmo format, so they are not present in
this commit.
Changes in tests:
* Update test/transprim/comparison_table.ml.reference:
The (opaque (global List!)) expression is not present anymore
* Update tests/no-alias-deps/aliases.cmo.reference
The output of objinfo changed
In order to remove some redundancy, the Pparse modules used a dirty
Obj.magic trick to manipulate either structure or signature values
(ASTs parsed from source files). This unsafe approach means that
programming mistakes may result in broken type soundness, and indeed
I myself had to track a very puzzling Segfault during the development
of my Menhir backend (due to a copy-paste error I was passing
Parse.implementation instead of Parse.interface somewhere). Wondering
why your parser generator seems to generate segfaulting parsers is
Not Fun.
This change means that the external interface of Pparse has changed
a bit. There is no way to fulfill the type of Pparse.file in
a type-safe way
    val file : formatter -> tool_name:string -> string ->
      (Lexing.lexbuf -> 'a) -> string -> 'a
as it promises to be able to process any ast type 'a depending on the
magic number (the last string argument). The new type-safe interface is
    val file : formatter -> tool_name:string -> string ->
      (Lexing.lexbuf -> 'a) -> 'a ast_kind -> 'a
where ['a ast_kind] is a GADT type:
    type 'a ast_kind =
    | Structure : Parsetree.structure ast_kind
    | Signature : Parsetree.signature ast_kind
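
A self-contained sketch of why the GADT index makes this type-safe. The
types structure and signature below are stand-ins for the Parsetree ones,
and magic_of_kind, parse_as and the magic strings are hypothetical names
chosen for illustration, not the actual Pparse code.

```ocaml
(* Stand-ins for Parsetree.structure and Parsetree.signature. *)
type structure = Str of string list
type signature = Sig of string list

type 'a ast_kind =
  | Structure : structure ast_kind
  | Signature : signature ast_kind

(* The magic number is derived from the kind instead of being passed
   separately, so the two can no longer disagree. *)
let magic_of_kind : type a. a ast_kind -> string = function
  | Structure -> "IMPL-MAGIC"
  | Signature -> "INTF-MAGIC"

(* Toy "parser": the result type is dictated by the kind, without Obj.magic. *)
let parse_as : type a. a ast_kind -> string -> a = fun kind src ->
  let items = String.split_on_char ';' src in
  match kind with
  | Structure -> Str items
  | Signature -> Sig items
```

With this shape, asking for a Signature can only ever produce a signature
value; mixing up the two kinds becomes a type error rather than a segfault.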
This addresses PR#6475.
In 4.02 the behavior of ocamlc/ocamlopt with regard to these options
was as follows:
* options and arguments are parsed left-to-right in the exact order
  in which they are passed, with compilation taking into account
  only the options leftwards from it;
* "foo.c" is compiled to "foo.o" in current directory;
* when "-c" is not specified:
  * "foo.ml" is compiled to "foo.cmo"/"foo.cmxo"
    in current directory;
  * after all files have been compiled, if any .ml files are passed,
    all provided files are linked as:
    * when "-o" is not specified: "a.out" in current directory;
    * when "-o out" is specified: "out".
* when "-c" is specified:
  * "foo.ml" is compiled to:
    * when "-o" is not specified: "foo.cmo"/"foo.cmxo"
      in current directory;
    * when "-o out" is specified: "out.cmo"/"out.cmxo";
      and then compilation proceeds as if the last "-o" option
      has disappeared.
  * no final link is performed.
The behavior where the build product of the C sources always ended up
in the current directory was problematic: it required buildsystem
hacks to move the file in its proper place and ultimately was racy,
as multiple files with the same basename in different directories
may well end up overwriting each other with e.g. ocamlbuild.
On top of that, the behavior was quite confusing, since it is not
only stateful and dependent on argument order, but also the mere act
of compilation changed state.
The commit 1d8e590c has attempted to rectify that by looking at
the "-o" option when compiling C files, too. After that commit,
the behavior of ocamlc/ocamlopt was as follows (only the handling
of C files was changed, but the entire chart is provided for
posterity):
* options and arguments are parsed left-to-right in the exact order
  in which they are passed, with compilation taking into account
  only the options leftwards from it;
* "foo.c" is compiled to:
  * when "-o" is not specified: "foo.o" in current directory;
  * when "-o out" is specified: "out".
* when "-c" is not specified:
  * "foo.ml" is compiled to "foo.cmo"/"foo.cmxo"
    in current directory;
  * after all files have been compiled, if any .ml files are passed,
    all provided files are linked as:
    * when "-o" is not specified: "a.out" in current directory;
    * when "-o out" is specified: "out".
* when "-c" is specified:
  * "foo.ml" is compiled to:
    * when "-o" is not specified: "foo.cmo"/"foo.cmxo"
      in current directory;
    * when "-o out" is specified: "out.cmo"/"out.cmxo";
      and then compilation proceeds as if the last "-o" option
      has disappeared.
  * no final link is performed.
There is a non-obvious bug here. Specifically, what happens if more
than one C source file is passed together with a "-o" option? Also,
what happens if a C source file is passed together with a "-o" option
and then a final link is performed? The answer is that
the intermediate build product gets silently overwritten, with quite
opaque errors as a result.
There is some code (and even buildsystems) in the wild that is relying
on the fact that the -o option does not affect compilation of C source
files, e.g. by running commands such as (from ocamlnet):
    ocamlc -custom -o t tend.c t.ml
It might seem that the solution would be to make the behavior of
the compiler drivers for C files match that for the OCaml files;
specifically, pretend that the "-o" option has disappeared once
the C compiler has written a build product to the specified place.
However, this would still break the existing code, and moreover
does not address the underlying problem: that the option parsing
of the OCaml compiler driver is confusing and prone to creating
latent bugs.
Instead, this commit (following 1d8e590c and 55d2d420) finishes
overhauling the way option parsing in ocamlc/ocamlopt works, so that
it behaves as follows:
* options are parsed left-to-right in the order they are specified;
* after all options are parsed, arguments are parsed left-to-right
  in the order they were specified;
* when "-o out" and "-c" are specified:
  * when more than one file is passed, an error message
    is displayed.
  * when one file is passed:
    * "foo.c" is compiled to "out";
    * "foo.ml" is compiled to "out.cmo"/"out.cmxo".
* when "-o out" is not specified or "-c" is not specified:
  * "foo.c" is compiled to "foo.o" in current directory;
  * "foo.ml" is compiled to "foo.cmo"/"foo.cmxo"
    in current directory;
* when "-c" is not specified:
  * after all files have been compiled, if any .ml files are passed,
    all provided files are linked as:
    * when "-o" is not specified: "a.out" in current directory;
    * when "-o out" is specified: "out".
In short, the combination of "-o", "-c" and a single source file
compiles that one file to the corresponding intermediate
build product. Otherwise, passing "-o" will either set the name of
the final build product only or error out.
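
The rules above can be modelled roughly as a pure function. This is only
a sketch of the decision table, not the driver code: the function plan and
its result shape are hypothetical, only the bytecode (.cmo) case is
modelled, and "in current directory" is approximated with
Filename.basename.

```ocaml
(* Given "-o", "-c" and the file arguments, return either an error or the
   per-file build products plus the name of the linked output, if any. *)
let plan ~output ~compile_only files =
  let is_c file = Filename.check_suffix file ".c" in
  let default file =
    let stem = Filename.remove_extension (Filename.basename file) in
    if is_c file then stem ^ ".o" else stem ^ ".cmo"
  in
  match output, compile_only, files with
  | Some _, true, _ :: _ :: _ ->
    Error "-o with -c accepts only one source file"
  | Some out, true, [file] ->
    (* "-o", "-c" and a single file: name the intermediate product. *)
    let target = if is_c file then out else out ^ ".cmo" in
    Ok ([file, target], None)
  | _ ->
    let objects = List.map (fun f -> (f, default f)) files in
    let has_ml = List.exists (fun f -> Filename.check_suffix f ".ml") files in
    let linked =
      if compile_only || not has_ml then None
      else Some (Option.value output ~default:"a.out")
    in
    Ok (objects, linked)
```

Under this model the ocamlnet command quoted earlier, with "-o t" and the
files "tend.c" and "t.ml", compiles to "tend.o" and "t.cmo" and links "t".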
This preserves compatibility with old code, makes the handling of
C and OCaml sources consistent, and overall makes the behavior
of the option parser more straightforward. However, this may break
code that relies on the fact that options are parsed in-order, e.g.
    ocamlc -o t a.ml -g b.ml
where debug info would be built only for "b.ml".
Some alternative implementation paths I have considered:
* Collect the C sources and process them after OCaml sources,
  while paying attention to any "-o" or "-c" that may have
  been set. This doesn't work because compilation of C sources
  is also affected by many flags, e.g. "-I", and so this would
  have the same drawbacks but none of the benefits;
* Compile C and OCaml sources in-order as usual, but error out
  when an improper combination of flags is encountered in
  the middle of a compilation. This is technically feasible,
  and is the option that maximally preserves compatibility, but
  it is very complex: it doubles the amount of implicitly mutated
  global state, and there's no guarantee I will get all edge
  cases right. Moreover, the option parsing remains confusing,
  and I strongly believe that the current behavior should not
  remain in place.
On top of that it is hard to imagine cases where setting new options
in the middle of compilation would actually be desirable, because
this mechanism is very inexpressive: it can only add new options and
option values, since there is no way to negate or clear most of
the driver's state. Most likely, any code that does so does it
in error and remains operational by pure chance.
The behavior of ocamlc and ocamlopt drivers before this commit is
that the command-line options and arguments are processed exactly
sequentially; encountered options (e.g. "-o") modify the state of
the driver, and encountered arguments (e.g. "t.ml") compile
the corresponding file with whatever state the driver had at the time.
This can be quite confusing, because other compiler drivers (e.g.
gcc/g++, clang/clang++, rustc, javac, go, ...) either parse the entire
command line before going on to compile files or reject options after
the first argument (only in the case of go). Thus the behavior
of ocamlc and ocamlopt is unexpected.
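
As a rough model of the difference, consider only the "-g" flag. The
functions sequential and two_pass are hypothetical and pair each file
argument with the debug setting it would be compiled under; they are not
the driver's actual code.

```ocaml
(* Sequential processing: each file sees only the options to its left. *)
let sequential args =
  let debug = ref false in
  List.filter_map
    (fun arg ->
       if arg = "-g" then (debug := true; None)
       else Some (arg, !debug))
    args

(* Two-pass processing: all options are read first, then every file is
   compiled with the final state. *)
let two_pass args =
  let debug = List.mem "-g" args in
  List.filter_map
    (fun arg -> if arg = "-g" then None else Some (arg, debug))
    args
```

For the arguments "a.ml", "-g", "b.ml", the sequential model enables debug
info only for "b.ml", while the two-pass model enables it for both files.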
The following commit provides another reason for this change.
This module checks all the AST invariants. This is to ensure that all
invariants are written down in one place and are consistently checked
between the various clients of the AST (typer, pprintast, ...).
The invariants are checked in Pparse, after applying the ppx
rewriters.