979 lines
38 KiB
Plaintext
979 lines
38 KiB
Plaintext
\chapter{The core language} \label{c:core-xamples}
|
|
%HEVEA\cutname{coreexamples.html}
|
|
|
|
This part of the manual is a tutorial introduction to the
|
|
OCaml language. A good familiarity with programming in a conventional
|
|
languages (say, C or Java) is assumed, but no prior exposure to
|
|
functional languages is required. The present chapter introduces the
|
|
core language. Chapter~\ref{c:moduleexamples} deals with the
|
|
module system, chapter~\ref{c:objectexamples} with the
|
|
object-oriented features, chapter~\ref{c:labl-examples} with
|
|
extensions to the core language (labeled arguments and polymorphic
|
|
variants), and chapter~\ref{c:advexamples} gives some advanced examples.
|
|
|
|
\section{s:basics}{Basics}
|
|
|
|
For this overview of OCaml, we use the interactive system, which
|
|
is started by running "ocaml" from the Unix shell, or by launching the
|
|
"OCamlwin.exe" application under Windows. This tutorial is presented
|
|
as the transcript of a session with the interactive system:
|
|
lines starting with "#" represent user input; the system responses are
|
|
printed below, without a leading "#".
|
|
|
|
Under the interactive system, the user types OCaml phrases terminated
|
|
by ";;" in response to the "#" prompt, and the system compiles them
|
|
on the fly, executes them, and prints the outcome of evaluation.
|
|
Phrases are either simple expressions, or "let" definitions of
|
|
identifiers (either values or functions).
|
|
\begin{caml_example}{toplevel}
|
|
1+2*3;;
|
|
let pi = 4.0 *. atan 1.0;;
|
|
let square x = x *. x;;
|
|
square (sin pi) +. square (cos pi);;
|
|
\end{caml_example}
|
|
The OCaml system computes both the value and the type for
|
|
each phrase. Even function parameters need no explicit type declaration:
|
|
the system infers their types from their usage in the
|
|
function. Notice also that integers and floating-point numbers are
|
|
distinct types, with distinct operators: "+" and "*" operate on
|
|
integers, but "+." and "*." operate on floats.
|
|
\begin{caml_example}{toplevel}[error]
|
|
1.0 * 2;;
|
|
\end{caml_example}
|
|
|
|
Recursive functions are defined with the "let rec" binding:
|
|
\begin{caml_example}{toplevel}
|
|
let rec fib n =
|
|
if n < 2 then n else fib (n-1) + fib (n-2);;
|
|
fib 10;;
|
|
\end{caml_example}
|
|
|
|
\section{s:datatypes}{Data types}
|
|
|
|
In addition to integers and floating-point numbers, OCaml offers the
|
|
usual basic data types:
|
|
\begin{itemize}%
|
|
\item booleans
|
|
\begin{caml_example}{toplevel}
|
|
(1 < 2) = false;;
|
|
let one = if true then 1 else 2;;
|
|
\end{caml_example}
|
|
\item characters
|
|
\begin{caml_example}{toplevel}
|
|
'a';;
|
|
int_of_char '\n';;
|
|
\end{caml_example}
|
|
\item immutable character strings
|
|
\begin{caml_example}{toplevel}
|
|
"Hello" ^ " " ^ "world";;
|
|
{|This is a quoted string, here, neither \ nor " are special characters|};;
|
|
{|"\\"|}="\"\\\\\"";;
|
|
{delimiter|the end of this|}quoted string is here|delimiter}
|
|
= "the end of this|}quoted string is here";;
|
|
\end{caml_example}
|
|
\end{itemize}
|
|
|
|
Predefined data structures include tuples, arrays, and lists. There are also
|
|
general mechanisms for defining your own data structures, such as records and
|
|
variants, which will be covered in more detail later; for now, we concentrate
|
|
on lists. Lists are either given in extension as a bracketed list of
|
|
semicolon-separated elements, or built from the empty list "[]"
|
|
(pronounce ``nil'') by adding elements in front using the "::"
|
|
(``cons'') operator.
|
|
\begin{caml_example}{toplevel}
|
|
let l = ["is"; "a"; "tale"; "told"; "etc."];;
|
|
"Life" :: l;;
|
|
\end{caml_example}
|
|
As with all other OCaml data structures, lists do not need to be
|
|
explicitly allocated and deallocated from memory: all memory
|
|
management is entirely automatic in OCaml. Similarly, there is no
|
|
explicit handling of pointers: the OCaml compiler silently introduces
|
|
pointers where necessary.
|
|
|
|
As with most OCaml data structures, inspecting and destructuring lists
|
|
is performed by pattern-matching. List patterns have exactly the same
|
|
form as list expressions, with identifiers representing unspecified
|
|
parts of the list. As an example, here is insertion sort on a list:
|
|
\begin{caml_example}{toplevel}
|
|
let rec sort lst =
|
|
match lst with
|
|
[] -> []
|
|
| head :: tail -> insert head (sort tail)
|
|
and insert elt lst =
|
|
match lst with
|
|
[] -> [elt]
|
|
| head :: tail -> if elt <= head then elt :: lst else head :: insert elt tail
|
|
;;
|
|
sort l;;
|
|
\end{caml_example}
|
|
|
|
The type inferred for "sort", "'a list -> 'a list", means that "sort"
|
|
can actually apply to lists of any type, and returns a list of the
|
|
same type. The type "'a" is a {\em type variable}, and stands for any
|
|
given type. The reason why "sort" can apply to lists of any type is
|
|
that the comparisons ("=", "<=", etc.) are {\em polymorphic} in OCaml:
|
|
they operate between any two values of the same type. This makes
|
|
"sort" itself polymorphic over all list types.
|
|
\begin{caml_example}{toplevel}
|
|
sort [6;2;5;3];;
|
|
sort [3.14; 2.718];;
|
|
\end{caml_example}
|
|
|
|
The "sort" function above does not modify its input list: it builds
|
|
and returns a new list containing the same elements as the input list,
|
|
in ascending order. There is actually no way in OCaml to modify
|
|
a list in-place once it is built: we say that lists are {\em immutable}
|
|
data structures. Most OCaml data structures are immutable, but a few
|
|
(most notably arrays) are {\em mutable}, meaning that they can be
|
|
modified in-place at any time.
|
|
|
|
The OCaml notation for the type of a function with multiple arguments is \\
|
|
"arg1_type -> arg2_type -> ... -> return_type". For example,
|
|
the type inferred for "insert", "'a -> 'a list -> 'a list", means that "insert"
|
|
takes two arguments, an element of any type "'a" and a list with elements of
|
|
the same type "'a" and returns a list of the same type.
|
|
\section{s:functions-as-values}{Functions as values}
|
|
|
|
OCaml is a functional language: functions in the full mathematical
|
|
sense are supported and can be passed around freely just as any other
|
|
piece of data. For instance, here is a "deriv" function that takes any
|
|
float function as argument and returns an approximation of its
|
|
derivative function:
|
|
\begin{caml_example}{toplevel}
|
|
let deriv f dx = function x -> (f (x +. dx) -. f x) /. dx;;
|
|
let sin' = deriv sin 1e-6;;
|
|
sin' pi;;
|
|
\end{caml_example}
|
|
Even function composition is definable:
|
|
\begin{caml_example}{toplevel}
|
|
let compose f g = function x -> f (g x);;
|
|
let cos2 = compose square cos;;
|
|
\end{caml_example}
|
|
|
|
Functions that take other functions as arguments are called
|
|
``functionals'', or ``higher-order functions''. Functionals are
|
|
especially useful to provide iterators or similar generic operations
|
|
over a data structure. For instance, the standard OCaml library
|
|
provides a "List.map" functional that applies a given function to each
|
|
element of a list, and returns the list of the results:
|
|
\begin{caml_example}{toplevel}
|
|
List.map (function n -> n * 2 + 1) [0;1;2;3;4];;
|
|
\end{caml_example}
|
|
This functional, along with a number of other list and array
|
|
functionals, is predefined because it is often useful, but there is
|
|
nothing magic with it: it can easily be defined as follows.
|
|
\begin{caml_example}{toplevel}
|
|
let rec map f l =
|
|
match l with
|
|
[] -> []
|
|
| hd :: tl -> f hd :: map f tl;;
|
|
\end{caml_example}
|
|
|
|
\section{s:tut-recvariants}{Records and variants}
|
|
|
|
User-defined data structures include records and variants. Both are
|
|
defined with the "type" declaration. Here, we declare a record type to
|
|
represent rational numbers.
|
|
\begin{caml_example}{toplevel}
|
|
type ratio = {num: int; denom: int};;
|
|
let add_ratio r1 r2 =
|
|
{num = r1.num * r2.denom + r2.num * r1.denom;
|
|
denom = r1.denom * r2.denom};;
|
|
add_ratio {num=1; denom=3} {num=2; denom=5};;
|
|
\end{caml_example}
|
|
Record fields can also be accessed through pattern-matching:
|
|
\begin{caml_example}{toplevel}
|
|
let integer_part r =
|
|
match r with
|
|
{num=num; denom=denom} -> num / denom;;
|
|
\end{caml_example}
|
|
Since there is only one case in this pattern matching, it
|
|
is safe to expand directly the argument "r" in a record pattern:
|
|
\begin{caml_example}{toplevel}
|
|
let integer_part {num=num; denom=denom} = num / denom;;
|
|
\end{caml_example}
|
|
Unneeded fields can be omitted:
|
|
\begin{caml_example}{toplevel}
|
|
let get_denom {denom=denom} = denom;;
|
|
\end{caml_example}
|
|
Optionally, missing fields can be made explicit by ending the list of
|
|
fields with a trailing wildcard "_"::
|
|
\begin{caml_example}{toplevel}
|
|
let get_num {num=num; _ } = num;;
|
|
\end{caml_example}
|
|
When both sides of the "=" sign are the same, it is possible to avoid
|
|
repeating the field name by eliding the "=field" part:
|
|
\begin{caml_example}{toplevel}
|
|
let integer_part {num; denom} = num / denom;;
|
|
\end{caml_example}
|
|
This short notation for fields also works when constructing records:
|
|
\begin{caml_example}{toplevel}
|
|
let ratio num denom = {num; denom};;
|
|
\end{caml_example}
|
|
At last, it is possible to update few fields of a record at once:
|
|
\begin{caml_example}{toplevel}
|
|
let integer_product integer ratio = { ratio with num = integer * ratio.num };;
|
|
\end{caml_example}
|
|
With this functional update notation, the record on the left-hand side
|
|
of "with" is copied except for the fields on the right-hand side which
|
|
are updated.
|
|
|
|
The declaration of a variant type lists all possible forms for values
|
|
of that type. Each case is identified by a name, called a constructor,
|
|
which serves both for constructing values of the variant type and
|
|
inspecting them by pattern-matching. Constructor names are capitalized
|
|
to distinguish them from variable names (which must start with a
|
|
lowercase letter). For instance, here is a variant
|
|
type for doing mixed arithmetic (integers and floats):
|
|
\begin{caml_example}{toplevel}
|
|
type number = Int of int | Float of float | Error;;
|
|
\end{caml_example}
|
|
This declaration expresses that a value of type "number" is either an
|
|
integer, a floating-point number, or the constant "Error" representing
|
|
the result of an invalid operation (e.g. a division by zero).
|
|
|
|
Enumerated types are a special case of variant types, where all
|
|
alternatives are constants:
|
|
\begin{caml_example}{toplevel}
|
|
type sign = Positive | Negative;;
|
|
let sign_int n = if n >= 0 then Positive else Negative;;
|
|
\end{caml_example}
|
|
|
|
To define arithmetic operations for the "number" type, we use
|
|
pattern-matching on the two numbers involved:
|
|
\begin{caml_example}{toplevel}
|
|
let add_num n1 n2 =
|
|
match (n1, n2) with
|
|
(Int i1, Int i2) ->
|
|
(* Check for overflow of integer addition *)
|
|
if sign_int i1 = sign_int i2 && sign_int (i1 + i2) <> sign_int i1
|
|
then Float(float i1 +. float i2)
|
|
else Int(i1 + i2)
|
|
| (Int i1, Float f2) -> Float(float i1 +. f2)
|
|
| (Float f1, Int i2) -> Float(f1 +. float i2)
|
|
| (Float f1, Float f2) -> Float(f1 +. f2)
|
|
| (Error, _) -> Error
|
|
| (_, Error) -> Error;;
|
|
add_num (Int 123) (Float 3.14159);;
|
|
\end{caml_example}
|
|
|
|
Another interesting example of variant type is the built-in
|
|
"'a option" type which represents either a value of type "'a" or an
|
|
absence of value:
|
|
\begin{caml_example}{toplevel}
|
|
type 'a option = Some of 'a | None;;
|
|
\end{caml_example}
|
|
This type is particularly useful when defining function that can
|
|
fail in common situations, for instance
|
|
\begin{caml_example}{toplevel}
|
|
let safe_square_root x = if x > 0. then Some(sqrt x) else None;;
|
|
\end{caml_example}
|
|
|
|
The most common usage of variant types is to describe recursive data
|
|
structures. Consider for example the type of binary trees:
|
|
\begin{caml_example}{toplevel}
|
|
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;;
|
|
\end{caml_example}
|
|
This definition reads as follows: a binary tree containing values of
|
|
type "'a" (an arbitrary type) is either empty, or is a node containing
|
|
one value of type "'a" and two subtrees also containing values of type
|
|
"'a", that is, two "'a btree".
|
|
|
|
Operations on binary trees are naturally expressed as recursive functions
|
|
following the same structure as the type definition itself. For
|
|
instance, here are functions performing lookup and insertion in
|
|
ordered binary trees (elements increase from left to right):
|
|
\begin{caml_example}{toplevel}
|
|
let rec member x btree =
|
|
match btree with
|
|
Empty -> false
|
|
| Node(y, left, right) ->
|
|
if x = y then true else
|
|
if x < y then member x left else member x right;;
|
|
let rec insert x btree =
|
|
match btree with
|
|
Empty -> Node(x, Empty, Empty)
|
|
| Node(y, left, right) ->
|
|
if x <= y then Node(y, insert x left, right)
|
|
else Node(y, left, insert x right);;
|
|
\end{caml_example}
|
|
|
|
|
|
\subsection{ss:record-and-variant-disambiguation}{Record and variant disambiguation}
|
|
( This subsection can be skipped on the first reading )
|
|
|
|
Astute readers may have wondered what happens when two or more record
|
|
fields or constructors share the same name
|
|
|
|
\begin{caml_example*}{toplevel}
|
|
type first_record = { x:int; y:int; z:int }
|
|
type middle_record = { x:int; z:int }
|
|
type last_record = { x:int };;
|
|
type first_variant = A | B | C
|
|
type last_variant = A;;
|
|
\end{caml_example*}
|
|
|
|
The answer is that when confronted with multiple options, OCaml tries to
|
|
use locally available information to disambiguate between the various fields
|
|
and constructors. First, if the type of the record or variant is known,
|
|
OCaml can pick unambiguously the corresponding field or constructor.
|
|
For instance:
|
|
|
|
\begin{caml_example}{toplevel}
|
|
let look_at_x_then_z (r:first_record) =
|
|
let x = r.x in
|
|
x + r.z;;
|
|
let permute (x:first_variant) = match x with
|
|
| A -> (B:first_variant)
|
|
| B -> A
|
|
| C -> C;;
|
|
type wrapped = First of first_record
|
|
let f (First r) = r, r.x;;
|
|
\end{caml_example}
|
|
|
|
In the first example, "(r:first_record)" is an explicit annotation
|
|
telling OCaml that the type of "r" is "first_record". With this
|
|
annotation, Ocaml knows that "r.x" refers to the "x" field of the first
|
|
record type. Similarly, the type annotation in the second example makes
|
|
it clear to OCaml that the constructors "A", "B" and "C" come from the
|
|
first variant type. Contrarily, in the last example, OCaml has inferred
|
|
by itself that the type of "r" can only be "first_record" and there are
|
|
no needs for explicit type annotations.
|
|
|
|
Those explicit type annotations can in fact be used anywhere.
|
|
Most of the time they are unnecessary, but they are useful to guide
|
|
disambiguation, to debug unexpected type errors, or combined with some
|
|
of the more advanced features of OCaml described in later chapters.
|
|
|
|
Secondly, for records, OCaml can also deduce the right record type by
|
|
looking at the whole set of fields used in a expression or pattern:
|
|
\begin{caml_example}{toplevel}
|
|
let project_and_rotate {x;y; _ } = { x= - y; y = x ; z = 0} ;;
|
|
\end{caml_example}
|
|
Since the fields "x" and "y" can only appear simultaneously in the first
|
|
record type, OCaml infers that the type of "project_and_rotate" is
|
|
"first_record -> first_record".
|
|
|
|
In last resort, if there is not enough information to disambiguate between
|
|
different fields or constructors, Ocaml picks the last defined type
|
|
amongst all locally valid choices:
|
|
|
|
\begin{caml_example}{toplevel}
|
|
let look_at_xz {x;z} = x;;
|
|
\end{caml_example}
|
|
|
|
Here, OCaml has inferred that the possible choices for the type of
|
|
"{x;z}" are "first_record" and "middle_record", since the type
|
|
"last_record" has no field "z". Ocaml then picks the type "middle_record"
|
|
as the last defined type between the two possibilities.
|
|
|
|
Beware that this last resort disambiguation is local: once Ocaml has
|
|
chosen a disambiguation, it sticks to this choice, even if it leads to
|
|
an ulterior type error:
|
|
|
|
\begin{caml_example}{toplevel}[error]
|
|
let look_at_x_then_y r =
|
|
let x = r.x in (* Ocaml deduces [r: last_record] *)
|
|
x + r.y;;
|
|
let is_a_or_b x = match x with
|
|
| A -> true (* OCaml infers [x: last_variant] *)
|
|
| B -> true;;
|
|
\end{caml_example}
|
|
|
|
Moreover, being the last defined type is a quite unstable position that
|
|
may change surreptitiously after adding or moving around a type
|
|
definition, or after opening a module (see chapter \ref{c:moduleexamples}).
|
|
Consequently, adding explicit type annotations to guide disambiguation is
|
|
more robust than relying on the last defined type disambiguation.
|
|
|
|
\section{s:imperative-features}{Imperative features}
|
|
|
|
Though all examples so far were written in purely applicative style,
|
|
OCaml is also equipped with full imperative features. This includes the
|
|
usual "while" and "for" loops, as well as mutable data structures such
|
|
as arrays. Arrays are either created by listing semicolon-separated element
|
|
values between "[|" and "|]" brackets, or allocated and initialized with the
|
|
"Array.make" function, then filled up later by assignments. For instance, the
|
|
function below sums two vectors (represented as float arrays) componentwise.
|
|
\begin{caml_example}{toplevel}
|
|
let add_vect v1 v2 =
|
|
let len = min (Array.length v1) (Array.length v2) in
|
|
let res = Array.make len 0.0 in
|
|
for i = 0 to len - 1 do
|
|
res.(i) <- v1.(i) +. v2.(i)
|
|
done;
|
|
res;;
|
|
add_vect [| 1.0; 2.0 |] [| 3.0; 4.0 |];;
|
|
\end{caml_example}
|
|
|
|
Record fields can also be modified by assignment, provided they are
|
|
declared "mutable" in the definition of the record type:
|
|
\begin{caml_example}{toplevel}
|
|
type mutable_point = { mutable x: float; mutable y: float };;
|
|
let translate p dx dy =
|
|
p.x <- p.x +. dx; p.y <- p.y +. dy;;
|
|
let mypoint = { x = 0.0; y = 0.0 };;
|
|
translate mypoint 1.0 2.0;;
|
|
mypoint;;
|
|
\end{caml_example}
|
|
|
|
OCaml has no built-in notion of variable -- identifiers whose current
|
|
value can be changed by assignment. (The "let" binding is not an
|
|
assignment, it introduces a new identifier with a new scope.)
|
|
However, the standard library provides references, which are mutable
|
|
indirection cells, with operators "!" to fetch
|
|
the current contents of the reference and ":=" to assign the contents.
|
|
Variables can then be emulated by "let"-binding a reference. For
|
|
instance, here is an in-place insertion sort over arrays:
|
|
\begin{caml_example}{toplevel}
|
|
let insertion_sort a =
|
|
for i = 1 to Array.length a - 1 do
|
|
let val_i = a.(i) in
|
|
let j = ref i in
|
|
while !j > 0 && val_i < a.(!j - 1) do
|
|
a.(!j) <- a.(!j - 1);
|
|
j := !j - 1
|
|
done;
|
|
a.(!j) <- val_i
|
|
done;;
|
|
\end{caml_example}
|
|
|
|
References are also useful to write functions that maintain a current
|
|
state between two calls to the function. For instance, the following
|
|
pseudo-random number generator keeps the last returned number in a
|
|
reference:
|
|
\begin{caml_example}{toplevel}
|
|
let current_rand = ref 0;;
|
|
let random () =
|
|
current_rand := !current_rand * 25713 + 1345;
|
|
!current_rand;;
|
|
\end{caml_example}
|
|
|
|
Again, there is nothing magical with references: they are implemented as
|
|
a single-field mutable record, as follows.
|
|
\begin{caml_example}{toplevel}
|
|
type 'a ref = { mutable contents: 'a };;
|
|
let ( ! ) r = r.contents;;
|
|
let ( := ) r newval = r.contents <- newval;;
|
|
\end{caml_example}
|
|
|
|
In some special cases, you may need to store a polymorphic function in
|
|
a data structure, keeping its polymorphism. Doing this requires
|
|
user-provided type annotations, since polymorphism is only introduced
|
|
automatically for global definitions. However, you can explicitly give
|
|
polymorphic types to record fields.
|
|
\begin{caml_example}{toplevel}
|
|
type idref = { mutable id: 'a. 'a -> 'a };;
|
|
let r = {id = fun x -> x};;
|
|
let g s = (s.id 1, s.id true);;
|
|
r.id <- (fun x -> print_string "called id\n"; x);;
|
|
g r;;
|
|
\end{caml_example}
|
|
|
|
\section{s:exceptions}{Exceptions}
|
|
|
|
OCaml provides exceptions for signalling and handling exceptional
|
|
conditions. Exceptions can also be used as a general-purpose non-local
|
|
control structure, although this should not be overused since it can
|
|
make the code harder to understand. Exceptions are declared with the
|
|
"exception" construct, and signalled with the "raise" operator. For instance,
|
|
the function below for taking the head of a list uses an exception to
|
|
signal the case where an empty list is given.
|
|
\begin{caml_example}{toplevel}
|
|
exception Empty_list;;
|
|
let head l =
|
|
match l with
|
|
[] -> raise Empty_list
|
|
| hd :: tl -> hd;;
|
|
head [1;2];;
|
|
head [];;
|
|
\end{caml_example}
|
|
|
|
Exceptions are used throughout the standard library to signal cases
|
|
where the library functions cannot complete normally. For instance,
|
|
the "List.assoc" function, which returns the data associated with a
|
|
given key in a list of (key, data) pairs, raises the predefined
|
|
exception "Not_found" when the key does not appear in the list:
|
|
\begin{caml_example}{toplevel}
|
|
List.assoc 1 [(0, "zero"); (1, "one")];;
|
|
List.assoc 2 [(0, "zero"); (1, "one")];;
|
|
\end{caml_example}
|
|
|
|
Exceptions can be trapped with the "try"\ldots"with" construct:
|
|
\begin{caml_example}{toplevel}
|
|
let name_of_binary_digit digit =
|
|
try
|
|
List.assoc digit [0, "zero"; 1, "one"]
|
|
with Not_found ->
|
|
"not a binary digit";;
|
|
name_of_binary_digit 0;;
|
|
name_of_binary_digit (-1);;
|
|
\end{caml_example}
|
|
|
|
The "with" part does pattern matching on the
|
|
exception value with the same syntax and behavior as "match". Thus,
|
|
several exceptions can be caught by one
|
|
"try"\ldots"with" construct:
|
|
\begin{caml_example}{toplevel}
|
|
let rec first_named_value values names =
|
|
try
|
|
List.assoc (head values) names
|
|
with
|
|
| Empty_list -> "no named value"
|
|
| Not_found -> first_named_value (List.tl values) names;;
|
|
first_named_value [ 0; 10 ] [ 1, "one"; 10, "ten"];;
|
|
\end{caml_example}
|
|
|
|
Also, finalization can be performed by
|
|
trapping all exceptions, performing the finalization, then re-raising
|
|
the exception:
|
|
\begin{caml_example}{toplevel}
|
|
let temporarily_set_reference ref newval funct =
|
|
let oldval = !ref in
|
|
try
|
|
ref := newval;
|
|
let res = funct () in
|
|
ref := oldval;
|
|
res
|
|
with x ->
|
|
ref := oldval;
|
|
raise x;;
|
|
\end{caml_example}
|
|
|
|
An alternative to "try"\ldots"with" is to catch the exception while
|
|
pattern matching:
|
|
\begin{caml_example}{toplevel}
|
|
let assoc_may_map f x l =
|
|
match List.assoc x l with
|
|
| exception Not_found -> None
|
|
| y -> f y;;
|
|
\end{caml_example}
|
|
Note that this construction is only useful if the exception is raised
|
|
between "match"\ldots"with". Exception patterns can be combined
|
|
with ordinary patterns at the toplevel,
|
|
\begin{caml_example}{toplevel}
|
|
let flat_assoc_opt x l =
|
|
match List.assoc x l with
|
|
| None | exception Not_found -> None
|
|
| Some _ as v -> v;;
|
|
\end{caml_example}
|
|
but they cannot be nested inside other patterns. For instance,
|
|
the pattern "Some (exception A)" is invalid.
|
|
|
|
When exceptions are used as a control structure, it can be useful to make
|
|
them as local as possible by using a locally defined exception.
|
|
For instance, with
|
|
\begin{caml_eval}
|
|
let ref x: _ ref = {contents=x};;
|
|
\end{caml_eval}
|
|
\begin{caml_example}{toplevel}
|
|
let fixpoint f x =
|
|
let exception Done in
|
|
let x = ref x in
|
|
try while true do
|
|
let y = f !x in
|
|
if !x = y then raise Done else x := y
|
|
done; assert false
|
|
with Done -> !x;;
|
|
\end{caml_example}
|
|
the function "f" cannot raise a "Done" exception, which removes an
|
|
entire class of misbehaving functions.
|
|
|
|
\section{s:lazy-expr}{Lazy expressions}
|
|
|
|
OCaml allows us to defer some computation until later when we need the result of
|
|
that computation.
|
|
|
|
We use "lazy (expr)" to delay the evaluation of some expression "expr". For
|
|
example, we can defer the computation of "1+1" until we need the result of that
|
|
expression, "2". Let us see how we initialize a lazy expression.
|
|
|
|
\begin{caml_example}{toplevel}
|
|
let lazy_two = lazy ( print_endline "lazy_two evaluation"; 1 + 1 );;
|
|
\end{caml_example}
|
|
|
|
We added "print_endline \"lazy_two evaluation\"" to see when the lazy
|
|
expression is being evaluated.
|
|
|
|
The value of "lazy_two" is displayed as "<lazy>", which means the expression
|
|
has not been evaluated yet, and its final value is unknown.
|
|
|
|
Note that "lazy_two" has type "int lazy_t". However, the type "'a lazy_t" is an
|
|
internal type name, so the type "'a Lazy.t" should be preferred when possible.
|
|
|
|
When we finally need the result of a lazy expression, we can call "Lazy.force"
|
|
on that expression to force its evaluation. The function "force" comes from
|
|
standard-library module \stdmoduleref{Lazy}.
|
|
|
|
\begin{caml_example}{toplevel}
|
|
Lazy.force lazy_two;;
|
|
\end{caml_example}
|
|
|
|
Notice that our function call above prints ``lazy_two evaluation'' and then
|
|
returns the plain value of the computation.
|
|
|
|
Now if we look at the value of "lazy_two", we see that it is not displayed as
|
|
"<lazy>" anymore but as "lazy 2".
|
|
|
|
\begin{caml_example}{toplevel}
|
|
lazy_two;;
|
|
\end{caml_example}
|
|
|
|
This is because "Lazy.force" memoizes the result of the forced expression. In other
|
|
words, every subsequent call of "Lazy.force" on that expression returns the
|
|
result of the first computation without recomputing the lazy expression. Let us
|
|
force "lazy_two" once again.
|
|
|
|
\begin{caml_example}{toplevel}
|
|
Lazy.force lazy_two;;
|
|
\end{caml_example}
|
|
|
|
The expression is not evaluated this time; notice that ``lazy_two evaluation'' is
|
|
not printed. The result of the initial computation is simply returned.
|
|
|
|
Lazy patterns provide another way to force a lazy expression.
|
|
|
|
\begin{caml_example}{toplevel}
|
|
let lazy_l = lazy ([1; 2] @ [3; 4]);;
|
|
let lazy l = lazy_l;;
|
|
\end{caml_example}
|
|
|
|
We can also use lazy patterns in pattern matching.
|
|
|
|
\begin{caml_example}{toplevel}
|
|
let maybe_eval lazy_guard lazy_expr =
|
|
match lazy_guard, lazy_expr with
|
|
| lazy false, _ -> "matches if (Lazy.force lazy_guard = false); lazy_expr not forced"
|
|
| lazy true, lazy _ -> "matches if (Lazy.force lazy_guard = true); lazy_expr forced";;
|
|
\end{caml_example}
|
|
|
|
The lazy expression "lazy_expr" is forced only if the "lazy_guard" value yields
|
|
"true" once computed. Indeed, a simple wildcard pattern (not lazy) never forces
|
|
the lazy expression's evaluation. However, a pattern with keyword "lazy", even
|
|
if it is wildcard, always forces the evaluation of the deferred computation.
|
|
|
|
\section{s:symb-expr}{Symbolic processing of expressions}
|
|
|
|
We finish this introduction with a more complete example
|
|
representative of the use of OCaml for symbolic processing: formal
|
|
manipulations of arithmetic expressions containing variables. The
|
|
following variant type describes the expressions we shall manipulate:
|
|
\begin{caml_example}{toplevel}
|
|
type expression =
|
|
Const of float
|
|
| Var of string
|
|
| Sum of expression * expression (* e1 + e2 *)
|
|
| Diff of expression * expression (* e1 - e2 *)
|
|
| Prod of expression * expression (* e1 * e2 *)
|
|
| Quot of expression * expression (* e1 / e2 *)
|
|
;;
|
|
\end{caml_example}
|
|
|
|
We first define a function to evaluate an expression given an
|
|
environment that maps variable names to their values. For simplicity,
|
|
the environment is represented as an association list.
|
|
\begin{caml_example}{toplevel}
|
|
exception Unbound_variable of string;;
|
|
let rec eval env exp =
|
|
match exp with
|
|
Const c -> c
|
|
| Var v ->
|
|
(try List.assoc v env with Not_found -> raise (Unbound_variable v))
|
|
| Sum(f, g) -> eval env f +. eval env g
|
|
| Diff(f, g) -> eval env f -. eval env g
|
|
| Prod(f, g) -> eval env f *. eval env g
|
|
| Quot(f, g) -> eval env f /. eval env g;;
|
|
eval [("x", 1.0); ("y", 3.14)] (Prod(Sum(Var "x", Const 2.0), Var "y"));;
|
|
\end{caml_example}
|
|
|
|
Now for a real symbolic processing, we define the derivative of an
|
|
expression with respect to a variable "dv":
|
|
\begin{caml_example}{toplevel}
|
|
let rec deriv exp dv =
|
|
match exp with
|
|
Const c -> Const 0.0
|
|
| Var v -> if v = dv then Const 1.0 else Const 0.0
|
|
| Sum(f, g) -> Sum(deriv f dv, deriv g dv)
|
|
| Diff(f, g) -> Diff(deriv f dv, deriv g dv)
|
|
| Prod(f, g) -> Sum(Prod(f, deriv g dv), Prod(deriv f dv, g))
|
|
| Quot(f, g) -> Quot(Diff(Prod(deriv f dv, g), Prod(f, deriv g dv)),
|
|
Prod(g, g))
|
|
;;
|
|
deriv (Quot(Const 1.0, Var "x")) "x";;
|
|
\end{caml_example}
|
|
|
|
\section{s:pretty-printing}{Pretty-printing}
|
|
|
|
As shown in the examples above, the internal representation (also
|
|
called {\em abstract syntax\/}) of expressions quickly becomes hard to
|
|
read and write as the expressions get larger. We need a printer and a
|
|
parser to go back and forth between the abstract syntax and the {\em
|
|
concrete syntax}, which in the case of expressions is the familiar
|
|
algebraic notation (e.g. "2*x+1").
|
|
|
|
For the printing function, we take into account the usual precedence
|
|
rules (i.e. "*" binds tighter than "+") to avoid printing unnecessary
|
|
parentheses. To this end, we maintain the current operator precedence
|
|
and print parentheses around an operator only if its precedence is
|
|
less than the current precedence.
|
|
\begin{caml_example}{toplevel}
|
|
let print_expr exp =
|
|
(* Local function definitions *)
|
|
let open_paren prec op_prec =
|
|
if prec > op_prec then print_string "(" in
|
|
let close_paren prec op_prec =
|
|
if prec > op_prec then print_string ")" in
|
|
let rec print prec exp = (* prec is the current precedence *)
|
|
match exp with
|
|
Const c -> print_float c
|
|
| Var v -> print_string v
|
|
| Sum(f, g) ->
|
|
open_paren prec 0;
|
|
print 0 f; print_string " + "; print 0 g;
|
|
close_paren prec 0
|
|
| Diff(f, g) ->
|
|
open_paren prec 0;
|
|
print 0 f; print_string " - "; print 1 g;
|
|
close_paren prec 0
|
|
| Prod(f, g) ->
|
|
open_paren prec 2;
|
|
print 2 f; print_string " * "; print 2 g;
|
|
close_paren prec 2
|
|
| Quot(f, g) ->
|
|
open_paren prec 2;
|
|
print 2 f; print_string " / "; print 3 g;
|
|
close_paren prec 2
|
|
in print 0 exp;;
|
|
let e = Sum(Prod(Const 2.0, Var "x"), Const 1.0);;
|
|
print_expr e; print_newline ();;
|
|
print_expr (deriv e "x"); print_newline ();;
|
|
\end{caml_example}
|
|
|
|
\section{s:printf}{Printf formats}
|
|
|
|
There is a "printf" function in the \stdmoduleref{Printf} module
|
|
(see chapter~\ref{c:moduleexamples}) that allows you to make formatted
|
|
output more concisely.
|
|
It follows the behavior of the "printf" function from the C standard library.
|
|
The "printf" function takes a format string that describes the desired output
|
|
as a text interspered with specifiers (for instance "%d", "%f").
|
|
Next, the specifiers are substituted by the following arguments in their order
|
|
of apparition in the format string:
|
|
\begin{caml_example}{toplevel}
|
|
Printf.printf "%i + %i is an integer value, %F * %F is a float, %S\n"
|
|
3 2 4.5 1. "this is a string";;
|
|
\end{caml_example}
|
|
The OCaml type system checks that the type of the arguments and the specifiers are
|
|
compatible. If you pass it an argument of a type that does not correspond to
|
|
the format specifier, the compiler will display an error message:
|
|
\begin{caml_example}{toplevel}[error]
|
|
Printf.printf "Float value: %F" 42;;
|
|
\end{caml_example}
|
|
The "fprintf" function is like "printf" except that it takes an output channel as
|
|
the first argument. The "%a" specifier can be useful to define custom printer
|
|
(for custom types). For instance, we can create a printing template that converts
|
|
an integer argument to signed decimal:
|
|
\begin{caml_example}{toplevel}
|
|
let pp_int ppf n = Printf.fprintf ppf "%d" n;;
|
|
Printf.printf "Outputting an integer using a custom printer: %a " pp_int 42;;
|
|
\end{caml_example}
|
|
The advantage of those printers based on the "%a" specifier is that they can be
|
|
composed together to create more complex printers step by step.
|
|
We can define a combinator that can turn a printer for "'a" type into a printer
|
|
for "'a optional":
|
|
\begin{caml_example}{toplevel}
|
|
let pp_option printer ppf = function
|
|
| None -> Printf.fprintf ppf "None"
|
|
| Some v -> Printf.fprintf ppf "Some(%a)" printer v;;
|
|
Printf.fprintf stdout
|
|
"The current setting is %a. \nThere is only %a\n"
|
|
(pp_option pp_int) (Some 3)
|
|
(pp_option pp_int) None
|
|
;;
|
|
\end{caml_example}
|
|
If the value of its argument its "None", the printer returned by pp_option
|
|
printer prints "None" otherwise it uses the provided printer to print "Some ".
|
|
|
|
Here is how to rewrite the pretty-printer using "fprintf":
|
|
\begin{caml_example}{toplevel}
|
|
let pp_expr ppf expr =
|
|
let open_paren prec op_prec output =
|
|
if prec > op_prec then Printf.fprintf output "%s" "(" in
|
|
let close_paren prec op_prec output =
|
|
if prec > op_prec then Printf.fprintf output "%s" ")" in
|
|
let rec print prec ppf expr =
|
|
match expr with
|
|
| Const c -> Printf.fprintf ppf "%F" c
|
|
| Var v -> Printf.fprintf ppf "%s" v
|
|
| Sum(f, g) ->
|
|
open_paren prec 0 ppf;
|
|
Printf.fprintf ppf "%a + %a" (print 0) f (print 0) g;
|
|
close_paren prec 0 ppf
|
|
| Diff(f, g) ->
|
|
open_paren prec 0 ppf;
|
|
Printf.fprintf ppf "%a - %a" (print 0) f (print 1) g;
|
|
close_paren prec 0 ppf
|
|
| Prod(f, g) ->
|
|
open_paren prec 2 ppf;
|
|
Printf.fprintf ppf "%a * %a" (print 2) f (print 2) g;
|
|
close_paren prec 2 ppf
|
|
| Quot(f, g) ->
|
|
open_paren prec 2 ppf;
|
|
Printf.fprintf ppf "%a / %a" (print 2) f (print 3) g;
|
|
close_paren prec 2 ppf
|
|
in print 0 ppf expr;;
|
|
pp_expr stdout e; print_newline ();;
|
|
pp_expr stdout (deriv e "x"); print_newline ();;
|
|
\end{caml_example}
|
|
|
|
Due to the way that format string are build, storing a format string requires
|
|
an explicit type annotation:
|
|
\begin{caml_example*}{toplevel}
|
|
let str : _ format =
|
|
"%i is an integer value, %F is a float, %S\n";;
|
|
\end{caml_example*}
|
|
\begin{caml_example}{toplevel}
|
|
Printf.printf str 3 4.5 "string value";;
|
|
\end{caml_example}
|
|
|
|
%%%%%%%%%%% Should be moved to the camlp4 documentation.
|
|
%% Parsing (transforming concrete syntax into abstract syntax) is usually
|
|
%% more delicate. OCaml offers several tools to help write parsers:
|
|
%% on the one hand, OCaml versions of the lexer generator Lex and the
|
|
%% parser generator Yacc (see chapter~\ref{c:ocamlyacc}), which handle
|
|
%% LALR(1) languages using push-down automata; on the other hand, a
|
|
%% predefined type of streams (of characters or tokens) and
|
|
%% pattern-matching over streams, which facilitate the writing of
|
|
%% recursive-descent parsers for LL(1) languages. An example using
|
|
%% "ocamllex" and "ocamlyacc" is given in
|
|
%% chapter~\ref{c:ocamlyacc}. Here, we will use stream parsers.
|
|
%% The syntactic support for stream parsers is provided by the Camlp4
|
|
%% preprocessor, which can be loaded into the interactive toplevel via
|
|
%% the "#load" directives below.
|
|
%%
|
|
%% \begin{caml_example}
|
|
%% #load "dynlink.cma";;
|
|
%% #load "camlp4o.cma";;
|
|
%% open Genlex;;
|
|
%% let lexer = make_lexer ["("; ")"; "+"; "-"; "*"; "/"];;
|
|
%% \end{caml_example}
|
|
%% For the lexical analysis phase (transformation of the input text into
|
|
%% a stream of tokens), we use a ``generic'' lexer provided in the
|
|
%% standard library module "Genlex". The "make_lexer" function takes a
|
|
%% list of keywords and returns a lexing function that ``tokenizes'' an
|
|
%% input stream of characters. Tokens are either identifiers, keywords,
|
|
%% or literals (integer, floats, characters, strings). Whitespace and
|
|
%% comments are skipped.
|
|
%% \begin{caml_example}
|
|
%% let token_stream = lexer (Stream.of_string "1.0 +x");;
|
|
%% Stream.next token_stream;;
|
|
%% Stream.next token_stream;;
|
|
%% Stream.next token_stream;;
|
|
%% \end{caml_example}
|
|
%%
|
|
%% The parser itself operates by pattern-matching on the stream of
|
|
%% tokens. As usual with recursive descent parsers, we use several
|
|
%% intermediate parsing functions to reflect the precedence and
|
|
%% associativity of operators. Pattern-matching over streams is more
|
|
%% powerful than on regular data structures, as it allows recursive calls
|
|
%% to parsing functions inside the patterns, for matching sub-components of
|
|
%% the input stream. See the Camlp4 documentation for more details.
|
|
%%
|
|
%% %Already said above
|
|
%% %In order to use stream parsers at toplevel, we must first load the
|
|
%% %"camlp4" preprocessor.
|
|
%% %\begin{caml_example}
|
|
%% %#load"camlp4o.cma";;
|
|
%% %\end{caml_example}
|
|
%% %Then we are ready to define our parser.
|
|
%% \begin{caml_example}{toplevel}
|
|
%% let rec parse_expr = parser
|
|
%% [< e1 = parse_mult; e = parse_more_adds e1 >] -> e
|
|
%% and parse_more_adds e1 = parser
|
|
%% [< 'Kwd "+"; e2 = parse_mult; e = parse_more_adds (Sum(e1, e2)) >] -> e
|
|
%% | [< 'Kwd "-"; e2 = parse_mult; e = parse_more_adds (Diff(e1, e2)) >] -> e
|
|
%% | [< >] -> e1
|
|
%% and parse_mult = parser
|
|
%% [< e1 = parse_simple; e = parse_more_mults e1 >] -> e
|
|
%% and parse_more_mults e1 = parser
|
|
%% [< 'Kwd "*"; e2 = parse_simple; e = parse_more_mults (Prod(e1, e2)) >] -> e
|
|
%% | [< 'Kwd "/"; e2 = parse_simple; e = parse_more_mults (Quot(e1, e2)) >] -> e
|
|
%% | [< >] -> e1
|
|
%% and parse_simple = parser
|
|
%% [< 'Ident s >] -> Var s
|
|
%% | [< 'Int i >] -> Const(float i)
|
|
%% | [< 'Float f >] -> Const f
|
|
%% | [< 'Kwd "("; e = parse_expr; 'Kwd ")" >] -> e;;
|
|
%% let parse_expression = parser [< e = parse_expr; _ = Stream.empty >] -> e;;
|
|
%% \end{caml_example}
|
|
%%
|
|
%% Composing the lexer and parser, we finally obtain a function to read
|
|
%% an expression from a character string:
|
|
%% \begin{caml_example}
|
|
%% let read_expression s = parse_expression (lexer (Stream.of_string s));;
|
|
%% read_expression "2*(x+y)";;
|
|
%% \end{caml_example}
|
|
%% A small puzzle: why do we get different results in the following two
|
|
%% examples?
|
|
%% \begin{caml_example}
|
|
%% read_expression "x - 1";;
|
|
%% read_expression "x-1";;
|
|
%% \end{caml_example}
|
|
%% Answer: the generic lexer provided by "Genlex" recognizes negative
|
|
%% integer literals as one integer token. Hence, "x-1" is read as
|
|
%% the token "Ident \"x\"" followed by the token "Int(-1)"; this sequence
|
|
%% does not match any of the parser rules. On the other hand,
|
|
%% the second space in "x - 1" causes the lexer to return the three
|
|
%% expected tokens: "Ident \"x\"", then "Kwd \"-\"", then "Int(1)".
|
|
|
|
\section{s:standalone-programs}{Standalone OCaml programs}
|
|
|
|
All examples given so far were executed under the interactive system.
|
|
OCaml code can also be compiled separately and executed
|
|
non-interactively using the batch compilers "ocamlc" and "ocamlopt".
|
|
The source code must be put in a file with extension ".ml". It
|
|
consists of a sequence of phrases, which will be evaluated at runtime
|
|
in their order of appearance in the source file. Unlike in interactive
|
|
mode, types and values are not printed automatically; the program must
|
|
call printing functions explicitly to produce some output. The ";;" used
|
|
in the interactive examples is not required in
|
|
source files created for use with OCaml compilers, but can be helpful
|
|
to mark the end of a top-level expression unambiguously even when
|
|
there are syntax errors.
|
|
Here is a
|
|
sample standalone program to print the greatest common divisor
|
|
(gcd) of two numbers:
|
|
\begin{verbatim}
|
|
(* File gcd.ml *)
|
|
let rec gcd a b =
|
|
if b = 0 then a
|
|
else gcd b (a mod b);;
|
|
|
|
let main () =
|
|
let a = int_of_string Sys.argv.(1) in
|
|
let b = int_of_string Sys.argv.(2) in
|
|
Printf.printf "%d\n" (gcd a b);
|
|
exit 0;;
|
|
main ();;
|
|
\end{verbatim}
|
|
"Sys.argv" is an array of strings containing the command-line
|
|
parameters. "Sys.argv.(1)" is thus the first command-line parameter.
|
|
The program above is compiled and executed with the following shell
|
|
commands:
|
|
\begin{verbatim}
|
|
$ ocamlc -o gcd gcd.ml
|
|
$ ./gcd 6 9
|
|
3
|
|
$ ./gcd 7 11
|
|
1
|
|
\end{verbatim}
|
|
|
|
More complex standalone OCaml programs are typically composed of
|
|
multiple source files, and can link with precompiled libraries.
|
|
Chapters~\ref{c:camlc} and~\ref{c:nativecomp} explain how to use the
|
|
batch compilers "ocamlc" and "ocamlopt". Recompilation of
|
|
multi-file OCaml projects can be automated using third-party
|
|
build systems, such as the
|
|
\href{https://github.com/ocaml/ocamlbuild/}{ocamlbuild}
|
|
compilation manager.
|