This commit is delicate and needs a careful review.
The `matcher_of_pattern` function is a temporary measure to reduce the
invasiveness of the patch, and make it easier to review.
(Note for reviewers: in the previous version the Record case had
a funny handling of Any, but it is in fact equivalent to just adding
omegas as we now do in all cases.)
There are two obvious directions for improvement:
- Get rid of matcher_of_pattern and pass a head directly to the
various make_matching_* functions.
- Try to factorize this code with ctx_matcher which, it is now
obvious, does essentially the same thing.
Another, less immediate area of attack would be to consider
a presentation of Pattern_head.t where the Any case can be statically
ruled out -- maybe the description could have two levels, one
isomorphic to option (Any or not?) and one for non-any heads.
(This commit is more tricky than the previous ones in the patchset
and requires a careful review.)
This refactoring clarifies and simplifies the specialize_matrix logic
by getting rid of the OrPat exception used in a higher-order
way (or sometimes not used in certain matchers, when it is possible to
"push" the or-pattern down in the pattern). Instead it uses an
arity-based criterion to implement the or-pattern-related logic once
in the specializer, instead of having to implement it in each
matcher. As a result, the compiler improves a bit as it will push
or-patterns down during specialization in valid situations that were
not implemented before -- probably because they are not terribly
important in practice: all constant and arity-1 constructs benefit
from optimized or-pattern handling, in particular the following are
new:
- lazy patterns
- non-constant polymorphic variants
- size-one records and arrays
Several functions in the pattern-matching compiler do recursive
"specialization" through a filter_rec helper written in tail-call
style with a 'rem' parameter containing the matrix rows yet to be
processed as input. Typical returns of the function are
foo :: filter_rec rem
to add a new output, and
filter_rec (foo :: rem)
to add a new input to be processed (usually by partial decomposition
of the current input row).
Some places would contain the declaration (let rem = filter_rec rem)
to factorize outputs of the first kind, but this gives a programming
style that is very confusing as `rem` may now represent either an
input or an output of the filter.
Using better types (as will be done farther away in the
pattern-matching refactoring) avoids this problem completely:
specialization then has different input and output types (typically,
from general to half-simple patterns), so incorrectly mixing inputs
and outputs is impossible. Yay typing.
review of can_group factorization by @gasche:
> I reviewed the change to `can_group` and believe it is correct.
> (I knew it would be nicer as a binary operation!)
>
> The different treatments of Lazy and Tuple/Record does look a bit odd,
> but I believe that it is actually the right thing to write.
>
> In the Tuple or Record case, the idea is that we can group Any heads
> with the Tuple/Record case and just explode all of them (including the
> Any cases) according to the tuple/record arity.
>
> In the Lazy case, the corresponding choice would be to add Any values
> to the Lazy group, and force all the elements of the group. This is
> not so nice, because intuitively we shouldn't force Lazy values unless
> we have to.
>
> One may expect that in general the argument of the pattern will be
> forced anyway, as long as there is at least one Lazy pattern above in
> the matrix, so it doesn't really matter whether we include the Any
> patterns in the forced group or not. I would argue that (1) even if
> that was true, it would be semantically dubious to handle Any that way
> (see a more precise criterion below), (2) I am not actually sure that
> this is true, what if the first group gets found unreachable by
> splits in following columns?
>
> # type void = | ;;
> # match (lazy (print_endline "foo"), (None : void option)) with
> | lazy (), Some _ -> .
> | _, None -> false;;
> - : bool = false
>
> This gives the following decision tree for whether Any is included in
> the group:
>
> - Can the absence of Any be used to generate nice/efficient tests for
> the heads of the group? In that case, don't include Any.
> (Not included: all the Constant cases, Array (discriminate on size),
> String, etc.)
>
> - Is Any always exactly equivalent to a more-defined head for values
> of this type? In that case, do include Any, otherwise do not.
> (Included: Variant, Record)
> (Not included: Lazy)
* Move driver code from Cmt2annot to Read_cmt
* Move cmt2annot.ml into typing/
* make depend
* Use standard error handling
* Move specific logic to read_cmt
* Do not pass full cmt record as argument
* Better locations
* Emit .annot files produced from cmt data
* Remove direct calls to Stypes
* Deprecate -annot
* Changes
* make depend
* Adapt doc
* make -C tools depend
There were some patches on the file since our last series of patch that
didn't respect the formatting.
So we're calling ocamlformat again (the same version as before,
something < 0.10) so everything rebases cleanly.
This PR was tested with also the -dsel, -dlinear output (also fixed to
not-print locations), but the output is architecture-dependent so this
part of the test was removed.
pat_attributes and pat_extra nodes should be on the inner value
pattern, rather than on the outer computation pattern, so that a user
looking for a specific value-pattern constructor with a specific
attribute does not need to consider the Tpat_value case specifically.
(Thanks to Alain Frisch for this suggestion.)
Value patterns match on a value (the result of computation), while
computation patterns handle the effects (hint hint) of
a computation. The only forms of computation patterns in OCaml today
are value patterns and exception patterns (exception p).
The sub-pattern `p` of the `lazy p` construction should be
a computation pattern, rather than a value pattern. This pull-request
does not make this change.
Most of the changes in this PR are boilerplate -- it really is a lot
of work now to add a new syntactic category to the typed-tree
syntax. This boilerplate is fairly automatic and should be easy to
review.
There is a subtle part to the patch, though: the implementation of the
pattern type-checking. It now has to reconstruct the value/computation
distinction (absent from the parse-tree), and return values from two
different types. Instead of splitting the type-checker in several
functions (which risked code duplications), I choose to use a GADT to
have the same [type_pat] function return two different types depending
on the caller. This is the least invasive way to adapt this part of
the codebase, whose inherent complexity is so large (unfortunately)
that adding a GADT to the mix barely makes a difference.