ocaml/experimental/frisch/extension_points.txt

741 lines
19 KiB
Plaintext

This file describes the changes on the extension_points branch.
=== Attributes
Attributes are "decorations" of the syntax tree which are ignored by
the type-checker. An attribute is made of an identifier (written id below)
and a payload (written s below).
* The identifier 'id' can be a lowercase or uppercase identifier
(including OCaml keywords) or a sequence of such atomic identifiers
separated with a dots (whitespaces are allowed around the dots).
In the Parsetree, the identifier is represented as a single string
(without spaces).
* The payload 's' can be one of three things:
- An OCaml structure (i.e. a list of structure items). Note that a
structure can be empty or reduced to a single expression.
[@id]
[@id x + 3]
[@id type t = int]
- A type expression, prefixed with the ":" character.
[@id : TYP]
- A pattern, prefixed with the "?" character, and optionally followed
by a "when" clause:
[@id ? PAT]
[@id ? PAT when EXPR]
Attributes on expressions, type expressions, module expressions, module type expressions,
patterns, class expressions, class type expressions:
... [@id s]
The same syntax [@id s] is also available to add attributes on
constructors and labels in type declarations:
type t =
| A [@id1]
| B [@id2] of int [@id3]
Here, id1 (resp. id2) is attached to the constructor A (resp. B)
and id3 is attached to the int type expression. Example on records:
type t =
{
x [@id1]: int;
mutable y [@id2] [@id3]: string [@id4];
}
Attributes on items:
... [@@id s]
Items designate:
- structure and signature items (for type declarations, recursive modules, class
declarations and class type declarations, each component has its own attributes)
- class fields and class type fields
- each binding in a let declaration (for let structure item, local let-bindings in
expression and class expressions)
For instance, consider:
type t1 = ... [@@id1] [@@id2] and t2 = ... [@@id3] [@@id4]
Here, the attributes on t1 are id1, id23; the attributes on
t2 are id3 and id4.
Similarly for:
let x1 = ... [@@id1] [@@id2] and x2 = ... [@@id3] [@@id4]
Floating attributes:
The [@@@id s] form defines an attribute which stands as a
stand-alone signature or structure item (not attached to another
item).
Example:
module type S = sig
[@@id1]
type t
[@@id2]
[@@@id3] [@@@id4]
[@@@id5]
type s
[@@id6]
end
Here, id1, id3, id4, id5 are floating attributes, while
id2 is attached to the type t and id6 is attached to the type s.
=== Extension nodes
Extension nodes replace valid components in the syntax tree. They are
normally interpreted and expanded by AST mapper. The type-checker
fails when it encounters such an extension node. An extension node is
made of an identifier (an "LIDENT", written id below) and an optional
expression (written expr below).
Two syntaxes exist for extension node:
As expressions, type expressions, module expressions, module type expressions,
patterns, class expressions, class type expressions:
[%id s]
As structure item, signature item, class field, class type field:
[%%id s]
As other structure item, signature item, class field or class type
field, attributes can be attached to a [%%id s] extension node.
=== Alternative syntax for attributes and extensions on specific kinds of nodes
All expression constructions starting with a keyword (EXPR = KW REST) support an
alternative syntax for attributes and/or extensions:
KW[@id s]...[@id s] REST
---->
EXPR[@id s]...[@id s]
KW%id REST
---->
[%id EXPR]
KW%id[@id s]...[@id s] REST
---->
[%id EXPR[@id s]...[@id s]]
where KW can stand for:
assert
begin
for
fun
function
if
lazy
let
let module
let open
match
new
object
try
while
For instance:
let[@foo] x = 2 in x + 1 ==== (let x = 2 in x + 1)[@foo]
begin[@foo] ... end ==== (begin ... end)[@foo]
match%foo e with ... ==== [%foo match e with ...]
The let-binding form of structure items also supports this form:
let%foo x = ... ==== [%%foo let x = ...]
=== Quoted strings
Quoted strings gives a different syntax to write string literals in
OCaml code. This will typically be used to support embedding pieces
of foreign syntax fragments (to be interpret by a -ppx filter or just
a library) in OCaml code.
The opening delimiter has the form {id| where id is a (possibly empty)
sequence of lowercase letters. The corresponding closing delimiter is
|id} (the same identifier). Contrary to regular OCaml string
literals, quoted strings don't interpret any character in a special
way.
Example:
String.length {|\"|} (* returns 2 *)
String.length {foo|\"|foo} (* returns 2 *)
The fact that a string literal comes from a quoted string is kept in
the Parsetree representation. The Astypes.Const_string constructor is
now defined as:
| Const_string of string * string option
where the "string option" represents the delimiter (None for a string
literal with the regular syntax).
=== Representation of attributes in the Parsetree
Attributes as standalone signature/structure items are represented
by a new constructor:
| Psig_attribute of attribute
| Pstr_attribute of attribute
Most other attributes are stored in an extra field in their record:
and expression = {
...
pexp_attributes: attribute list;
...
}
and type_declaration = {
...
ptype_attributes: attribute list;
...
}
In a previous version, attributes on expressions (and types, patterns,
etc) used to be stored as a new constructor. The current choice makes
it easier to pattern match on structured AST fragments while ignoring
attributes.
For open/include signature/structure items and exception rebind
structure item, the attributes are stored directly in the constructor
of the item:
| Pstr_open of Longident.t loc * attribute list
=== Attributes in the Typedtree
The Typedtree representation has been updated to follow closely the
Parsetree, and attributes are kept exactly as in the Parsetree. This
can allow external tools to process .cmt/.cmti files and process
attributes in them. An example of a mini-ocamldoc based on this
technique is in experimental/frisch/minidoc.ml.
=== Other changes to the parser and Parsetree
--- Introducing Ast_helper module
This module simplifies the creation of AST fragments, without having to
touch the concrete type definitions of Parsetree. Record and sum types
are encapsulated in builder functions, with some optional arguments, e.g.
to represent attributes.
--- Relaxing the syntax for signatures and structures
It is now possible to start a signature or a structure with a ";;" token and to have two successive ";;" tokens.
Rationale:
In an intermediate version of this branch, floating attributes shared
the same syntax as item attributes, with the constraints that they
had to appear either at the beginning of their structure or signature,
or after ";;". The relaxation above made is possible to always prefix
a floating attributes by ";;" independently of its context.
Floating attributes now have a custom syntax [@@@id], but this changes
is harmless, and the same argument holds for toplevel expressions:
it is always possile to write:
;; print_endline "bla";;
without having to care about whether the previous structure item
ends with ";;" or not.
-- Relaxing the syntax for exception declarations
The parser now accepts the same syntax for exceptioon declarations as for constructor declarations,
which permits the GADT syntax:
exception A : int -> foo
The type-checker rejects this form. Note that it is also possible to
define exception whose name is () or ::.
Attributes can be put on the constructor or on the whole declaration:
exception A[@foo] of int [@@bar]
Rationale:
One less notion in the Parsetree, more uniform parsing. Also
open the door to existentials in exception constructors.
--- Relaxing the syntax for recursive modules
Before:
module X1 : MT1 = M1 and ... and Xn : MTn = Mn
Now:
module X1 = M1 and ... and Xn = Mn
(with the usual sugar that Xi = (Mi : MTi) can be written as Xi : MTi = Mi
which gives the old syntax)
The type-checker fails when a module expression is not of
the form (M : MT)
Rationale:
1. More uniform representation in the Parsetree.
2. The type-checker can be made more clever in the future to support
other forms of module expressions (e.g. functions with an explicit
constraint on its result; or a structure with only type-level
components).
--- Turning some tuple or n-ary constructors into records
Before:
| Pstr_module of string loc * module_expr
After:
| Pstr_module of module_binding
...
and module_binding =
{
pmb_name: string loc;
pmb_expr: module_expr;
pmb_attributes: attribute list;
}
Rationale:
More self-documented, more robust to future additions (such as
attributes), simplifies some code.
--- Keeping names inside value_description and type_declaration
Before:
| Psig_type of (string loc * type_declaration) list
After:
| Psig_type of type_declaration list
....
and type_declaration =
{ ptype_name: string loc;
...
}
Rationale:
More self-documented, simplifies some code.
--- Better representation of variance information on type parameters
Introduced a new type Asttypes.variance to represent variance
(Covariant/Contravariant/Invariant) and use it instead of bool * bool
in Parsetree. Moreover, variance information is now attached
directly to the parameters fields:
and type_declaration =
{ ptype_name: string loc;
- ptype_params: string loc option list;
+ ptype_params: (string loc option * variance) list;
ptype_cstrs: (core_type * core_type * Location.t) list;
ptype_kind: type_kind;
ptype_private: private_flag;
ptype_manifest: core_type option;
- ptype_variance: (bool * bool) list;
ptype_attributes: attribute list;
ptype_loc: Location.t }
--- Getting rid of 'Default' case in Astypes.rec_flag
This constructor was used internally only during the compilation of
default expression for optional arguments, in order to trigger a
subsequent optimization (see PR#5975). This behavior is now
implemented by creating an attribute internally (whose name "#default"
cannot be used in real programs).
Rationale:
- Attributes give a way to encode information local to the
type-checker without polluting the definition of the Parsetree.
--- Simpler and more faithful representation of object types
- | Ptyp_object of core_field_type list
+ | Ptyp_object of (string * core_type) list * closed_flag
(and get rid of Parsetree.core_field_type)
And same in the Typedtree.
Rationale:
- More faithful representation of the syntax really supported
(i.e. the ".." can only be the last field).
- One less "concept" in the Parsetree.
--- Do not require empty Ptyp_poly nodes in the Parsetree
The type-checker automatically inserts Ptyp_poly node (with no
variable) where needed. It is still allowed to put empty
Ptyp_poly nodes in the Parsetree.
Rationale:
- Less chance that Ast-related code forget to insert those nodes.
To be discussed: should we segrate simple_poly_type from core_type in the
Parsetree to prevent Ptyp_poly nodes to be inserted in the wrong place?
--- Use constructor names closer to concrete syntax
E.g. Pcf_cstr -> Pcf_constraint.
Rationale:
- Make the Parsetree more self-documented.
--- Merge concrete/virtual val and method constructors
As in the Typedtree.
- | Pcf_valvirt of (string loc * mutable_flag * core_type)
- | Pcf_val of (string loc * mutable_flag * override_flag * expression)
- | Pcf_virt of (string loc * private_flag * core_type)
- | Pcf_meth of (string loc * private_flag * override_flag * expression)
+ | Pcf_val of (string loc * mutable_flag * class_field_kind)
+ | Pcf_method of (string loc * private_flag * class_field_kind
...
+and class_field_kind =
+ | Cfk_virtual of core_type
+ | Cfk_concrete of override_flag * expression
+
--- Explicit representation of "when" guards
Replaced the "(pattern * expression) list" argument of Pexp_function, Pexp_match, Pexp_try
with "case list", with case defined as:
{
pc_lhs: pattern;
pc_guard: expression option;
pc_rhs: expression;
}
and get rid of Pexp_when. Idem in the Typedtree.
Rationale:
- Make it explicit when the guard can appear.
--- Get rid of "fun p when guard -> e"
See #5939, #5936.
--- Get rid of the location argument on pci_params
It was only used for error messages, and we get better location using
the location of each parameter variable.
--- More faithful representation of "with constraint"
All kinds of "with constraints" used to be represented together with a
Longident.t denoting the constrained identifier. Now, each constraint
keeps its own constrainted identifier, which allows us to express more
invariants in the Parsetree (such as: := constraints cannot be on qualified
identifiers). Also, we avoid mixing in a single Longident.t identifier
which can be LIDENT or UIDENT.
--- Get rid of the "#c [> `A]" syntax
See #5936, #5983.
--- Keep interval patterns in the Parsetree
They used to be expanded into or-patterns by the parser. It is better to do
the expansion in the type-checker to allow -ppx rewriters to see the interval
patterns.
Note: Camlp4 parsers still expand interval patterns themselves (TODO?).
--- Get rid of Pexp_assertfalse
Do not treat specially "assert false" in the parser any more, but
instead in the type-checker. This simplifies the Parsetree and avoids
a potential source of confusion. Moreove, this ensures that
attributes can be put (and used by ppx rewriters) on the "false"
expressions. This is also more robust, since it checks that the
condition is the constructor "false" after type-checking the condition:
- if "false" is redefined (as a constructor of a different sum type),
an error will be reported;
- "extra" layers which are represented as exp_extra in the typedtree
won't break the detection of the "false", e.g. the following will
be recognized as "assert false":
assert(false : bool)
assert(let open X in false)
Note: Camlp4's AST still has a special representation for "assert false".
--- Get rid of the "explicit arity" flag on Pexp_construct/Ppat_construct
This Boolean was used (only by camlp5?) to indicate that the tuple
(expression/pattern) used as the argument was intended to correspond
to the arity of an n-ary constructor. In particular, this allowed
the revised syntax to distinguish "A x y" from "A (x, y)" (the second one
being wrapped in an extra fake tuple) and get a proper error message
if "A (x, y)" was used with a constructor expecting two arguments.
The feature has been preserved, but the information that a
Pexp_construct/Ppat_constructo node has an "exact arity" is now
propagated used as am attribute "ocaml.explicit_arity" on that node.
--- Split Pexp_function into Pexp_function/Pexp_fun
This reflects more closely the concrete syntax and removes cases of
Parsetree fragments which don't correspond to concrete syntax.
Typedtree has not been changed.
Note: Camlp4's AST has not been adapted.
--- Split Pexp_constraint into Pexp_constraint/Pexp_coerce
Idem in the Typedtree.
This reflects more closely the concrete syntax.
Note: Camlp4's AST has not been adapted.
--- Accept abstract module type declaration in structures
Previously, we could declare:
module type S
in signatures, but not implementations. To make the syntax, the Parsetree
and the type-checker more uniform, this is now also allowed in structures
(altough this is probably useless in practice).
=== More TODOs
- Adapt pprintast to print attributes and extension nodes.
- Adapt Camlp4 (both its parser(s) and its internal representation of OCaml ASTs).
- Consider adding hooks to the type-checker so that custom extension expanders can be registered (a la OCaml Templates).
- Make the Ast_helper module more user-friendly (e.g. with optional arguments and good default values) and/or
expose higher-level convenience functions.
- Document Ast_helper modules.
=== Use cases
From https://github.com/gasche/ocaml-syntax-extension-discussion/wiki/Use-Cases
-- Bisect
let f x =
match List.map foo [x; a x; b x] with
| [y1; y2; y3] -> tata
| _ -> assert false [@bisect VISIT]
;;[@@bisect IGNORE-BEGIN]
let unused = ()
;;[@@bisect IGNORE-END]
-- OCamldoc
val stats : ('a, 'b) t -> statistics
[@@doc
"[Hashtbl.stats tbl] returns statistics about the table [tbl]:
number of buckets, size of the biggest bucket, distribution of
buckets by size."
]
[@@since "4.00.0"]
;;[@@doc section 6 "Functorial interface"]
module type HashedType =
sig
type t
[@@doc "The type of the hashtable keys."]
val equal : t -> t -> bool
[@@doc "The equality predicate used to compare keys."]
end
-- type-conv, deriving
type t = {
x : int [@default 42];
y : int [@default 3] [@sexp_drop_default];
z : int [@default 3] [@sexp_drop_if z_test];
} [@@sexp]
type r1 = {
r1_l1 : int;
r1_l2 : int;
} [@@deriving (Dump, Eq, Show, Typeable, Pickle, Functor)]
-- camlp4 map/fold generators
type variable = string
and term =
| Var of variable
| Lam of variable * term
| App of term * term
class map = [%generate_map term]
or:
[%%generate_map map term]
-- ocaml-rpc
type t = { foo [@rpc "type"]: int; bar [@rpc "let"]: int }
[@@ rpc]
or:
type t = { foo: int; bar: int }
[@@ rpc ("foo" > "type"), ("bar" > "let")]
-- pa_monad
begin%monad
a <-- [1; 2; 3];
b <-- [3; 4; 5];
return (a + b)
end
-- pa_lwt
let%lwt x = start_thread foo
and y = start_other_thread foo in
try%lwt
let%for_lwt (x, y) = waiting_threads in
compute blah
with Killed -> bar
-- Bolt
let funct n =
[%log "funct(%d)" n LEVEL DEBUG];
for i = 1 to n do
print_endline "..."
done
-- pre-polyrecord
let r = [%polyrec x = 1; y = ref None]
let () = [%polyrec r.y <- Some 2]
-- orakuda
function%regexp
| "$/^[0-9]+$/" as v -> `Int (int_of_string v#_0)
| "$/^[a-z][A-Za-z0-9_]*$" as v -> `Variable v#_0
| _ -> failwith "parse error"
-- bitstring
let bits = Bitstring.bitstring_of_file "/bin/ls" in
match%bitstring bits with
| [ 0x7f, 8; "ELF", 24, string; (* ELF magic number *)
e_ident, Mul(12,8), bitstring; (* ELF identifier *)
e_type, 16, littleendian; (* object file type *)
e_machine, 16, littleendian (* architecture *)
] ->
printf "This is an ELF binary, type %d, arch %d\n"
e_type e_machine
-- sedlex
let rec token buf =
let%regexp ('a'..'z'|'A'..'Z') = letter in
match%sedlex buf with
| number -> Printf.printf "Number %s\n" (Sedlexing.Latin1.lexeme buf); token buf
| letter, Star ('A'..'Z' | 'a'..'z' | digit) -> Printf.printf "Ident %s\n" (Sedlexing.Latin1.lexeme buf); token buf
| Plus xml_blank -> token buf
| Plus (Chars "+*-/") -> Printf.printf "Op %s\n" (Sedlexing.Latin1.lexeme buf); token buf
| Range(128,255) -> print_endline "Non ASCII"
| eof -> print_endline "EOF"
| _ -> failwith "Unexpected character"
-- cppo
[%%ifdef DEBUG]
[%%define debug(s) = Printf.eprintf "[%S %i] %s\n%!" __FILE__ __LINE__ s]
[%%else]
[%%define debug(s) = ()]
[%%endif]
debug("test")
-- PG'OCaml
let fetch_users dbh =
[%pgsql dbh "select id, name from users"]
-- Macaque
let names view = [%view {name = t.name}, t <- !view]"
-- Cass
let color1 = [%css{| black |}]
let color2 = [%css{| gray |}]
let button = [%css{|
.button {
$Css.gradient ~low:color2 ~high:color1$;
color: white;
$Css.top_rounded$;
|}]