\chapter{The core language} \label{c:core-xamples}
%HEVEA\cutname{coreexamples.html}
This part of the manual is a tutorial introduction to the
OCaml language. A good familiarity with programming in a conventional
language (say, C or Java) is assumed, but no prior exposure to
functional languages is required. The present chapter introduces the
core language. Chapter~\ref{c:moduleexamples} deals with the
module system, chapter~\ref{c:objectexamples} with the
object-oriented features, chapter~\ref{c:labl-examples} with
extensions to the core language (labeled arguments and polymorphic
variants), and chapter~\ref{c:advexamples} gives some advanced examples.
\section{s:basics}{Basics}
For this overview of OCaml, we use the interactive system, which
is started by running "ocaml" from the Unix shell, or by launching the
"OCamlwin.exe" application under Windows. This tutorial is presented
as the transcript of a session with the interactive system:
lines starting with "#" represent user input; the system responses are
printed below, without a leading "#".
Under the interactive system, the user types OCaml phrases terminated
by ";;" in response to the "#" prompt, and the system compiles them
on the fly, executes them, and prints the outcome of evaluation.
Phrases are either simple expressions, or "let" definitions of
identifiers (either values or functions).
\begin{caml_example}{toplevel}
1+2*3;;
let pi = 4.0 *. atan 1.0;;
let square x = x *. x;;
square (sin pi) +. square (cos pi);;
\end{caml_example}
The OCaml system computes both the value and the type for
each phrase. Even function parameters need no explicit type declaration:
the system infers their types from their usage in the
function. Notice also that integers and floating-point numbers are
distinct types, with distinct operators: "+" and "*" operate on
integers, but "+." and "*." operate on floats.
\begin{caml_example}{toplevel}[error]
1.0 * 2;;
\end{caml_example}
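To mix the two kinds of numbers, one operand has to be converted
explicitly. A minimal sketch, using the standard conversion functions
"float_of_int" and "int_of_float" (the latter truncates toward zero):
\begin{caml_example}{toplevel}
(* convert the integer before using the float operator *)
1.0 *. float_of_int 2;;
(* convert the float before using the integer operator *)
int_of_float 3.7 + 1;;
\end{caml_example}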
Recursive functions are defined with the "let rec" binding:
\begin{caml_example}{toplevel}
let rec fib n =
if n < 2 then n else fib (n-1) + fib (n-2);;
fib 10;;
\end{caml_example}
\section{s:datatypes}{Data types}
In addition to integers and floating-point numbers, OCaml offers the
usual basic data types:
\begin{itemize}%
\item booleans
\begin{caml_example}{toplevel}
(1 < 2) = false;;
let one = if true then 1 else 2;;
\end{caml_example}
\item characters
\begin{caml_example}{toplevel}
'a';;
int_of_char '\n';;
\end{caml_example}
\item immutable character strings
\begin{caml_example}{toplevel}
"Hello" ^ " " ^ "world";;
{|This is a quoted string, here, neither \ nor " are special characters|};;
{|"\\"|}="\"\\\\\"";;
{delimiter|the end of this|}quoted string is here|delimiter}
= "the end of this|}quoted string is here";;
\end{caml_example}
\end{itemize}
Predefined data structures include tuples, arrays, and lists. There are also
general mechanisms for defining your own data structures, such as records and
variants, which will be covered in more detail later; for now, we concentrate
on lists. Lists are either given in extension as a bracketed list of
semicolon-separated elements, or built from the empty list "[]"
(pronounced ``nil'') by adding elements in front using the "::"
(``cons'') operator.
\begin{caml_example}{toplevel}
let l = ["is"; "a"; "tale"; "told"; "etc."];;
"Life" :: l;;
\end{caml_example}
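The two notations denote the same values: a bracketed list is just a
shorthand for repeated uses of "::" ending in "[]". A small check,
given here only as an illustration:
\begin{caml_example}{toplevel}
(* the bracketed form and the cons form are equal *)
[1; 2; 3] = (1 :: 2 :: 3 :: []);;
(* adding an element in front of an existing list *)
0 :: [1; 2; 3];;
\end{caml_example}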
As with all other OCaml data structures, lists do not need to be
explicitly allocated and deallocated from memory: all memory
management is entirely automatic in OCaml. Similarly, there is no
explicit handling of pointers: the OCaml compiler silently introduces
pointers where necessary.
As with most OCaml data structures, inspecting and destructuring lists
is performed by pattern-matching. List patterns have exactly the same
form as list expressions, with identifiers representing unspecified
parts of the list. As an example, here is insertion sort on a list:
\begin{caml_example}{toplevel}
let rec sort lst =
match lst with
[] -> []
| head :: tail -> insert head (sort tail)
and insert elt lst =
match lst with
[] -> [elt]
| head :: tail -> if elt <= head then elt :: lst else head :: insert elt tail
;;
sort l;;
\end{caml_example}
The type inferred for "sort", "'a list -> 'a list", means that "sort"
can actually apply to lists of any type, and returns a list of the
same type. The type "'a" is a {\em type variable}, and stands for any
given type. The reason why "sort" can apply to lists of any type is
that the comparisons ("=", "<=", etc.) are {\em polymorphic} in OCaml:
they operate between any two values of the same type. This makes
"sort" itself polymorphic over all list types.
\begin{caml_example}{toplevel}
sort [6;2;5;3];;
sort [3.14; 2.718];;
\end{caml_example}
The "sort" function above does not modify its input list: it builds
and returns a new list containing the same elements as the input list,
in ascending order. There is actually no way in OCaml to modify
a list in-place once it is built: we say that lists are {\em immutable}
data structures. Most OCaml data structures are immutable, but a few
(most notably arrays) are {\em mutable}, meaning that they can be
modified in-place at any time.
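The difference can be observed with a short sketch. It uses the
standard "List.sort" function rather than the "sort" function above so
that it stands alone, and the names "unsorted_sketch" and
"cells_sketch" are only illustrative. Array assignment, written
"a.(i) <- v", is presented in more detail later in this chapter.
\begin{caml_example}{toplevel}
let unsorted_sketch = [3; 1; 2];;
(* sorting returns a new list... *)
List.sort compare unsorted_sketch;;
(* ...and leaves the original list untouched *)
unsorted_sketch;;
let cells_sketch = [| 3; 1; 2 |];;
(* an array cell, in contrast, can be overwritten in place *)
cells_sketch.(0) <- 0;;
cells_sketch;;
\end{caml_example}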
The OCaml notation for the type of a function with multiple arguments is \\
"arg1_type -> arg2_type -> ... -> return_type". For example,
the type inferred for "insert", "'a -> 'a list -> 'a list", means that "insert"
takes two arguments, an element of any type "'a" and a list with elements of
the same type "'a" and returns a list of the same type.
\section{s:functions-as-values}{Functions as values}
OCaml is a functional language: functions in the full mathematical
sense are supported and can be passed around freely just as any other
piece of data. For instance, here is a "deriv" function that takes any
float function as argument and returns an approximation of its
derivative function:
\begin{caml_example}{toplevel}
let deriv f dx = function x -> (f (x +. dx) -. f x) /. dx;;
let sin' = deriv sin 1e-6;;
sin' pi;;
\end{caml_example}
Even function composition is definable:
\begin{caml_example}{toplevel}
let compose f g = function x -> f (g x);;
let cos2 = compose square cos;;
\end{caml_example}
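Note the order of application: the second function is applied first.
The following sketch repeats the definition under the illustrative name
"compose_sketch" and applies it to two integer functions:
\begin{caml_example}{toplevel}
let compose_sketch f g = function x -> f (g x);;
(* adds one first, then doubles the result *)
let add_one_then_double =
  compose_sketch (function n -> n * 2) (function n -> n + 1);;
add_one_then_double 3;;
\end{caml_example}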
Functions that take other functions as arguments are called
``functionals'', or ``higher-order functions''. Functionals are
especially useful to provide iterators or similar generic operations
over a data structure. For instance, the standard OCaml library
provides a "List.map" functional that applies a given function to each
element of a list, and returns the list of the results:
\begin{caml_example}{toplevel}
List.map (function n -> n * 2 + 1) [0;1;2;3;4];;
\end{caml_example}
This functional, along with a number of other list and array
functionals, is predefined because it is often useful, but there is
nothing magic about it: it can easily be defined as follows.
\begin{caml_example}{toplevel}
let rec map f l =
match l with
[] -> []
| hd :: tl -> f hd :: map f tl;;
\end{caml_example}
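The same holds for other list functionals. As a sketch, here is a
possible definition of a left fold, which combines the elements of a
list with an accumulator from left to right. The name
"fold_left_sketch" is illustrative; the standard library provides this
operation as "List.fold_left".
\begin{caml_example}{toplevel}
let rec fold_left_sketch f acc l =
  match l with
    [] -> acc
  | hd :: tl -> fold_left_sketch f (f acc hd) tl;;
(* summing a list with the hand-written fold and with the library one *)
fold_left_sketch ( + ) 0 [1; 2; 3; 4];;
List.fold_left ( + ) 0 [1; 2; 3; 4];;
\end{caml_example}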
\section{s:tut-recvariants}{Records and variants}
User-defined data structures include records and variants. Both are
defined with the "type" declaration. Here, we declare a record type to
represent rational numbers.
\begin{caml_example}{toplevel}
type ratio = {num: int; denom: int};;
let add_ratio r1 r2 =
{num = r1.num * r2.denom + r2.num * r1.denom;
denom = r1.denom * r2.denom};;
add_ratio {num=1; denom=3} {num=2; denom=5};;
\end{caml_example}
Record fields can also be accessed through pattern-matching:
\begin{caml_example}{toplevel}
let integer_part r =
match r with
{num=num; denom=denom} -> num / denom;;
\end{caml_example}
Since there is only one case in this pattern matching, it
is safe to expand the argument "r" directly into a record pattern:
\begin{caml_example}{toplevel}
let integer_part {num=num; denom=denom} = num / denom;;
\end{caml_example}
Unneeded fields can be omitted:
\begin{caml_example}{toplevel}
let get_denom {denom=denom} = denom;;
\end{caml_example}
Optionally, missing fields can be made explicit by ending the list of
fields with a trailing wildcard "_":
\begin{caml_example}{toplevel}
let get_num {num=num; _ } = num;;
\end{caml_example}
When both sides of the "=" sign are the same, it is possible to avoid
repeating the field name by eliding the "=field" part:
\begin{caml_example}{toplevel}
let integer_part {num; denom} = num / denom;;
\end{caml_example}
This short notation for fields also works when constructing records:
\begin{caml_example}{toplevel}
let ratio num denom = {num; denom};;
\end{caml_example}
Finally, it is possible to update a few fields of a record at once:
\begin{caml_example}{toplevel}
let integer_product integer ratio = { ratio with num = integer * ratio.num };;
\end{caml_example}
With this functional update notation, the record on the left-hand side
of "with" is copied except for the fields on the right-hand side which
are updated.
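Since the update builds a copy, the original record is left unchanged.
The sketch below illustrates this with a hypothetical record type
"sketch_size", whose field names were chosen so as not to interfere
with the other examples of this chapter:
\begin{caml_example}{toplevel}
(* illustrative type, not part of the standard library *)
type sketch_size = {width: int; height: int};;
let small = {width = 1; height = 2};;
(* copy of [small] with a different width *)
let wide = {small with width = 10};;
(* the original record still has its initial width *)
small;;
\end{caml_example}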
The declaration of a variant type lists all possible forms for values
of that type. Each case is identified by a name, called a constructor,
which serves both for constructing values of the variant type and
inspecting them by pattern-matching. Constructor names are capitalized
to distinguish them from variable names (which must start with a
lowercase letter). For instance, here is a variant
type for doing mixed arithmetic (integers and floats):
\begin{caml_example}{toplevel}
type number = Int of int | Float of float | Error;;
\end{caml_example}
This declaration expresses that a value of type "number" is either an
integer, a floating-point number, or the constant "Error" representing
the result of an invalid operation (e.g. a division by zero).
Enumerated types are a special case of variant types, where all
alternatives are constants:
\begin{caml_example}{toplevel}
type sign = Positive | Negative;;
let sign_int n = if n >= 0 then Positive else Negative;;
\end{caml_example}
To define arithmetic operations for the "number" type, we use
pattern-matching on the two numbers involved:
\begin{caml_example}{toplevel}
let add_num n1 n2 =
match (n1, n2) with
(Int i1, Int i2) ->
(* Check for overflow of integer addition *)
if sign_int i1 = sign_int i2 && sign_int (i1 + i2) <> sign_int i1
then Float(float i1 +. float i2)
else Int(i1 + i2)
| (Int i1, Float f2) -> Float(float i1 +. f2)
| (Float f1, Int i2) -> Float(f1 +. float i2)
| (Float f1, Float f2) -> Float(f1 +. f2)
| (Error, _) -> Error
| (_, Error) -> Error;;
add_num (Int 123) (Float 3.14159);;
\end{caml_example}
Another interesting example of variant type is the built-in
"'a option" type which represents either a value of type "'a" or an
absence of value:
\begin{caml_example}{toplevel}
type 'a option = Some of 'a | None;;
\end{caml_example}
This type is particularly useful when defining functions that can
fail in common situations, for instance:
\begin{caml_example}{toplevel}
let safe_square_root x = if x >= 0. then Some(sqrt x) else None;;
\end{caml_example}
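The caller of such a function then has to handle both cases, typically
by pattern-matching on the result. A minimal sketch, with the
illustrative names "safe_div_sketch" and "show_quotient_sketch":
\begin{caml_example}{toplevel}
(* integer division that returns None instead of failing on zero *)
let safe_div_sketch x y = if y = 0 then None else Some (x / y);;
let show_quotient_sketch x y =
  match safe_div_sketch x y with
    None -> "undefined"
  | Some q -> string_of_int q;;
show_quotient_sketch 7 2;;
show_quotient_sketch 7 0;;
\end{caml_example}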
The most common usage of variant types is to describe recursive data
structures. Consider for example the type of binary trees:
\begin{caml_example}{toplevel}
type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;;
\end{caml_example}
This definition reads as follows: a binary tree containing values of
type "'a" (an arbitrary type) is either empty, or is a node containing
one value of type "'a" and two subtrees also containing values of type
"'a", that is, two "'a btree".
Operations on binary trees are naturally expressed as recursive functions
following the same structure as the type definition itself. For
instance, here are functions performing lookup and insertion in
ordered binary trees (elements increase from left to right):
\begin{caml_example}{toplevel}
let rec member x btree =
match btree with
Empty -> false
| Node(y, left, right) ->
if x = y then true else
if x < y then member x left else member x right;;
let rec insert x btree =
match btree with
Empty -> Node(x, Empty, Empty)
| Node(y, left, right) ->
if x <= y then Node(y, insert x left, right)
else Node(y, left, insert x right);;
\end{caml_example}
\subsection{ss:record-and-variant-disambiguation}{Record and variant disambiguation}
(This subsection can be skipped on the first reading.)
Astute readers may have wondered what happens when two or more record
fields or constructors share the same name:
\begin{caml_example*}{toplevel}
type first_record = { x:int; y:int; z:int }
type middle_record = { x:int; z:int }
type last_record = { x:int };;
type first_variant = A | B | C
type last_variant = A;;
\end{caml_example*}
The answer is that when confronted with multiple options, OCaml tries to
use locally available information to disambiguate between the various fields
and constructors. First, if the type of the record or variant is known,
OCaml can pick unambiguously the corresponding field or constructor.
For instance:
\begin{caml_example}{toplevel}
let look_at_x_then_z (r:first_record) =
let x = r.x in
x + r.z;;
let permute (x:first_variant) = match x with
| A -> (B:first_variant)
| B -> A
| C -> C;;
type wrapped = First of first_record
let f (First r) = r, r.x;;
\end{caml_example}
In the first example, "(r:first_record)" is an explicit annotation
telling OCaml that the type of "r" is "first_record". With this
annotation, OCaml knows that "r.x" refers to the "x" field of the first
record type. Similarly, the type annotation in the second example makes
it clear to OCaml that the constructors "A", "B" and "C" come from the
first variant type. In contrast, in the last example, OCaml has inferred
by itself that the type of "r" can only be "first_record" and there is
no need for explicit type annotations.
Those explicit type annotations can in fact be used anywhere.
Most of the time they are unnecessary, but they are useful to guide
disambiguation, to debug unexpected type errors, or combined with some
of the more advanced features of OCaml described in later chapters.
Secondly, for records, OCaml can also deduce the right record type by
looking at the whole set of fields used in an expression or pattern:
\begin{caml_example}{toplevel}
let project_and_rotate {x;y; _ } = { x= - y; y = x ; z = 0} ;;
\end{caml_example}
Since the fields "x" and "y" can only appear simultaneously in the first
record type, OCaml infers that the type of "project_and_rotate" is
"first_record -> first_record".
As a last resort, if there is not enough information to disambiguate between
different fields or constructors, OCaml picks the last defined type
among all locally valid choices:
\begin{caml_example}{toplevel}
let look_at_xz {x;z} = x;;
\end{caml_example}
Here, OCaml has inferred that the possible choices for the type of
"{x;z}" are "first_record" and "middle_record", since the type
"last_record" has no field "z". OCaml then picks the type "middle_record"
as the last defined type between the two possibilities.
Beware that this last-resort disambiguation is local: once OCaml has
chosen a disambiguation, it sticks to this choice, even if it leads to
a later type error:
\begin{caml_example}{toplevel}[error]
let look_at_x_then_y r =
  let x = r.x in (* OCaml deduces [r: last_record] *)
x + r.y;;
let is_a_or_b x = match x with
| A -> true (* OCaml infers [x: last_variant] *)
| B -> true;;
\end{caml_example}
Moreover, being the last defined type is a quite unstable position that
may change surreptitiously after adding or moving around a type
definition, or after opening a module (see chapter \ref{c:moduleexamples}).
Consequently, adding explicit type annotations to guide disambiguation is
more robust than relying on the last defined type disambiguation.
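As a sketch of this advice, consider two hypothetical record types
sharing a field name (the names "sketch_a", "sketch_b", "tag" and
"extra" are only illustrative). Without the annotation, "r.tag" would
be resolved to the last defined type "sketch_b", and the access to
"r.extra" would then be rejected; with the annotation, both accesses
are resolved in "sketch_a".
\begin{caml_example}{toplevel}
(* two illustrative types sharing the field [tag] *)
type sketch_a = { tag: int; extra: int }
type sketch_b = { tag: int };;
(* the annotation selects [sketch_a] for both field accesses *)
let sum_fields_sketch (r : sketch_a) = r.tag + r.extra;;
\end{caml_example}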
\section{s:imperative-features}{Imperative features}
Though all examples so far were written in purely applicative style,
OCaml is also equipped with full imperative features. This includes the
usual "while" and "for" loops, as well as mutable data structures such
as arrays. Arrays are either created by listing semicolon-separated element
values between "[|" and "|]" brackets, or allocated and initialized with the
"Array.make" function, then filled up later by assignments. For instance, the
function below sums two vectors (represented as float arrays) componentwise.
\begin{caml_example}{toplevel}
let add_vect v1 v2 =
let len = min (Array.length v1) (Array.length v2) in
let res = Array.make len 0.0 in
for i = 0 to len - 1 do
res.(i) <- v1.(i) +. v2.(i)
done;
res;;
add_vect [| 1.0; 2.0 |] [| 3.0; 4.0 |];;
\end{caml_example}
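Whereas "add_vect" allocates a fresh array for its result, a function
can also overwrite the cells of its argument. A minimal sketch, with
the illustrative names "scale_in_place_sketch" and "v_sketch":
\begin{caml_example}{toplevel}
(* multiplies every cell of [v] by [k], modifying [v] itself *)
let scale_in_place_sketch k v =
  for i = 0 to Array.length v - 1 do
    v.(i) <- k *. v.(i)
  done;;
let v_sketch = [| 1.0; 2.0; 3.0 |];;
scale_in_place_sketch 2.0 v_sketch;;
v_sketch;;
\end{caml_example}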
Record fields can also be modified by assignment, provided they are
declared "mutable" in the definition of the record type:
\begin{caml_example}{toplevel}
type mutable_point = { mutable x: float; mutable y: float };;
let translate p dx dy =
p.x <- p.x +. dx; p.y <- p.y +. dy;;
let mypoint = { x = 0.0; y = 0.0 };;
translate mypoint 1.0 2.0;;
mypoint;;
\end{caml_example}
OCaml has no built-in notion of variable -- identifiers whose current
value can be changed by assignment. (The "let" binding is not an
assignment, it introduces a new identifier with a new scope.)
However, the standard library provides references, which are mutable
indirection cells, with operators "!" to fetch
the current contents of the reference and ":=" to assign the contents.
Variables can then be emulated by "let"-binding a reference. For
instance, here is an in-place insertion sort over arrays:
\begin{caml_example}{toplevel}
let insertion_sort a =
for i = 1 to Array.length a - 1 do
let val_i = a.(i) in
let j = ref i in
while !j > 0 && val_i < a.(!j - 1) do
a.(!j) <- a.(!j - 1);
j := !j - 1
done;
a.(!j) <- val_i
done;;
\end{caml_example}
References are also useful to write functions that maintain a current
state between two calls to the function. For instance, the following
pseudo-random number generator keeps the last returned number in a
reference:
\begin{caml_example}{toplevel}
let current_rand = ref 0;;
let random () =
current_rand := !current_rand * 25713 + 1345;
!current_rand;;
\end{caml_example}
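In the example above the state is a global definition, visible to all
the code that follows. One possible variation, sketched here under the
illustrative name "next_id_sketch", is to bind the reference locally so
that only the function itself can read or modify it:
\begin{caml_example}{toplevel}
let next_id_sketch =
  (* [counter] is created once and is only visible to the function below *)
  let counter = ref 0 in
  function () ->
    counter := !counter + 1;
    !counter;;
next_id_sketch ();;
next_id_sketch ();;
\end{caml_example}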
Again, there is nothing magical about references: they are implemented as
a single-field mutable record, as follows.
\begin{caml_example}{toplevel}
type 'a ref = { mutable contents: 'a };;
let ( ! ) r = r.contents;;
let ( := ) r newval = r.contents <- newval;;
\end{caml_example}
In some special cases, you may need to store a polymorphic function in
a data structure, keeping its polymorphism. Doing this requires
user-provided type annotations, since polymorphism is only introduced
automatically for global definitions. However, you can explicitly give
polymorphic types to record fields.
\begin{caml_example}{toplevel}
type idref = { mutable id: 'a. 'a -> 'a };;
let r = {id = fun x -> x};;
let g s = (s.id 1, s.id true);;
r.id <- (fun x -> print_string "called id\n"; x);;
g r;;
\end{caml_example}
\section{s:exceptions}{Exceptions}
OCaml provides exceptions for signalling and handling exceptional
conditions. Exceptions can also be used as a general-purpose non-local
control structure, although this should not be overused since it can
make the code harder to understand. Exceptions are declared with the
"exception" construct, and signalled with the "raise" operator. For instance,
the function below for taking the head of a list uses an exception to
signal the case where an empty list is given.
\begin{caml_example}{toplevel}
exception Empty_list;;
let head l =
match l with
[] -> raise Empty_list
| hd :: tl -> hd;;
head [1;2];;
head [];;
\end{caml_example}
Exceptions are used throughout the standard library to signal cases
where the library functions cannot complete normally. For instance,
the "List.assoc" function, which returns the data associated with a
given key in a list of (key, data) pairs, raises the predefined
exception "Not_found" when the key does not appear in the list:
\begin{caml_example}{toplevel}
List.assoc 1 [(0, "zero"); (1, "one")];;
List.assoc 2 [(0, "zero"); (1, "one")];;
\end{caml_example}
Exceptions can be trapped with the "try"\ldots"with" construct:
\begin{caml_example}{toplevel}
let name_of_binary_digit digit =
try
List.assoc digit [0, "zero"; 1, "one"]
with Not_found ->
"not a binary digit";;
name_of_binary_digit 0;;
name_of_binary_digit (-1);;
\end{caml_example}
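The same construct can turn an exception into an optional result. The
sketch below relies on the fact that the standard function
"int_of_string" raises the predefined exception "Failure", which
carries a string argument (ignored here with the wildcard "_"), when
its argument is not a valid integer literal. The name
"parse_int_sketch" is illustrative.
\begin{caml_example}{toplevel}
(* returns None instead of raising on malformed input *)
let parse_int_sketch s =
  try Some (int_of_string s) with Failure _ -> None;;
parse_int_sketch "42";;
parse_int_sketch "forty-two";;
\end{caml_example}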
The "with" part does pattern matching on the
exception value with the same syntax and behavior as "match". Thus,
several exceptions can be caught by one
"try"\ldots"with" construct:
\begin{caml_example}{toplevel}
let rec first_named_value values names =
try
List.assoc (head values) names
with
| Empty_list -> "no named value"
| Not_found -> first_named_value (List.tl values) names;;
first_named_value [ 0; 10 ] [ 1, "one"; 10, "ten"];;
\end{caml_example}
Also, finalization can be performed by
trapping all exceptions, performing the finalization, then re-raising
the exception:
\begin{caml_example}{toplevel}
let temporarily_set_reference ref newval funct =
let oldval = !ref in
try
ref := newval;
let res = funct () in
ref := oldval;
res
with x ->
ref := oldval;
raise x;;
\end{caml_example}
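Assuming OCaml 4.08 or later, the standard library also offers
"Fun.protect", which runs a finalization function whether the protected
function returns normally or raises an exception. The sketch below uses
it with a labeled argument (labels are covered in
chapter~\ref{c:labl-examples}); the names "trace_sketch" and
"traced_sketch" are illustrative, and a buffer is used only to record
that the finalization took place.
\begin{caml_example}{toplevel}
let trace_sketch = Buffer.create 16;;
(* runs [funct], then records "cleanup;" in all cases *)
let traced_sketch funct =
  Fun.protect
    ~finally:(function () -> Buffer.add_string trace_sketch "cleanup;")
    funct;;
traced_sketch (function () -> 1 + 1);;
(try traced_sketch (function () -> raise Not_found) with Not_found -> 0);;
Buffer.contents trace_sketch;;
\end{caml_example}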
An alternative to "try"\ldots"with" is to catch the exception while
pattern matching:
\begin{caml_example}{toplevel}
let assoc_may_map f x l =
match List.assoc x l with
| exception Not_found -> None
| y -> f y;;
\end{caml_example}
Note that this construction is only useful if the exception is raised
between "match"\ldots"with". Exception patterns can be combined
with ordinary patterns at the toplevel,
\begin{caml_example}{toplevel}
let flat_assoc_opt x l =
match List.assoc x l with
| None | exception Not_found -> None
| Some _ as v -> v;;
\end{caml_example}
but they cannot be nested inside other patterns. For instance,
the pattern "Some (exception A)" is invalid.
When exceptions are used as a control structure, it can be useful to make
them as local as possible by using a locally defined exception.
For instance, with
\begin{caml_eval}
let ref x: _ ref = {contents=x};;
\end{caml_eval}
\begin{caml_example}{toplevel}
let fixpoint f x =
let exception Done in
let x = ref x in
try while true do
let y = f !x in
if !x = y then raise Done else x := y
done; assert false
with Done -> !x;;
\end{caml_example}
the function "f" cannot raise a "Done" exception, which removes an
entire class of misbehaving functions.
\section{s:lazy-expr}{Lazy expressions}
OCaml allows us to defer some computation until later when we need the result of
that computation.
We use "lazy (expr)" to delay the evaluation of some expression "expr". For
example, we can defer the computation of "1+1" until we need the result of that
expression, "2". Let us see how we initialize a lazy expression.
\begin{caml_example}{toplevel}
let lazy_two = lazy ( print_endline "lazy_two evaluation"; 1 + 1 );;
\end{caml_example}
We added "print_endline \"lazy_two evaluation\"" to see when the lazy
expression is being evaluated.
The value of "lazy_two" is displayed as "<lazy>", which means the expression
has not been evaluated yet, and its final value is unknown.
Note that "lazy_two" has type "int lazy_t". However, the type "'a lazy_t" is an
internal type name, so the type "'a Lazy.t" should be preferred when possible.
When we finally need the result of a lazy expression, we can call "Lazy.force"
on that expression to force its evaluation. The function "force" comes from
the standard-library module \stdmoduleref{Lazy}.
\begin{caml_example}{toplevel}
Lazy.force lazy_two;;
\end{caml_example}
Notice that our function call above prints ``lazy_two evaluation'' and then
returns the plain value of the computation.
Now if we look at the value of "lazy_two", we see that it is not displayed as
"<lazy>" anymore but as "lazy 2".
\begin{caml_example}{toplevel}
lazy_two;;
\end{caml_example}
This is because "Lazy.force" memoizes the result of the forced expression. In other
words, every subsequent call of "Lazy.force" on that expression returns the
result of the first computation without recomputing the lazy expression. Let us
force "lazy_two" once again.
\begin{caml_example}{toplevel}
Lazy.force lazy_two;;
\end{caml_example}
The expression is not evaluated this time; notice that ``lazy_two evaluation'' is
not printed. The result of the initial computation is simply returned.
Lazy patterns provide another way to force a lazy expression.
\begin{caml_example}{toplevel}
let lazy_l = lazy ([1; 2] @ [3; 4]);;
let lazy l = lazy_l;;
\end{caml_example}
We can also use lazy patterns in pattern matching.
\begin{caml_example}{toplevel}
let maybe_eval lazy_guard lazy_expr =
match lazy_guard, lazy_expr with
| lazy false, _ -> "matches if (Lazy.force lazy_guard = false); lazy_expr not forced"
| lazy true, lazy _ -> "matches if (Lazy.force lazy_guard = true); lazy_expr forced";;
\end{caml_example}
The lazy expression "lazy_expr" is forced only if the "lazy_guard" value yields
"true" once computed. Indeed, a simple wildcard pattern (not lazy) never forces
the lazy expression's evaluation. However, a pattern with keyword "lazy", even
if it is wildcard, always forces the evaluation of the deferred computation.
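A typical use is a costly value that may or may not be needed. The
following sketch uses the standard function "Lazy.is_val", which tells
whether a lazy expression has already been forced; all other names
("costly_sketch", "use_if_sketch", and so on) are illustrative.
\begin{caml_example}{toplevel}
let costly_sketch = lazy (print_endline "computing"; 6 * 7);;
(* forces the computation only when [needed] is true *)
let use_if_sketch needed = if needed then Lazy.force costly_sketch else 0;;
let skipped_sketch = use_if_sketch false;;
let forced_before_sketch = Lazy.is_val costly_sketch;;
let used_sketch = use_if_sketch true;;
let forced_after_sketch = Lazy.is_val costly_sketch;;
\end{caml_example}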
\section{s:symb-expr}{Symbolic processing of expressions}
We finish this introduction with a more complete example
representative of the use of OCaml for symbolic processing: formal
manipulations of arithmetic expressions containing variables. The
following variant type describes the expressions we shall manipulate:
\begin{caml_example}{toplevel}
type expression =
Const of float
| Var of string
| Sum of expression * expression (* e1 + e2 *)
| Diff of expression * expression (* e1 - e2 *)
| Prod of expression * expression (* e1 * e2 *)
| Quot of expression * expression (* e1 / e2 *)
;;
\end{caml_example}
We first define a function to evaluate an expression given an
environment that maps variable names to their values. For simplicity,
the environment is represented as an association list.
\begin{caml_example}{toplevel}
exception Unbound_variable of string;;
let rec eval env exp =
match exp with
Const c -> c
| Var v ->
(try List.assoc v env with Not_found -> raise (Unbound_variable v))
| Sum(f, g) -> eval env f +. eval env g
| Diff(f, g) -> eval env f -. eval env g
| Prod(f, g) -> eval env f *. eval env g
| Quot(f, g) -> eval env f /. eval env g;;
eval [("x", 1.0); ("y", 3.14)] (Prod(Sum(Var "x", Const 2.0), Var "y"));;
\end{caml_example}
Now for some real symbolic processing: we define the derivative of an
expression with respect to a variable "dv":
\begin{caml_example}{toplevel}
let rec deriv exp dv =
match exp with
Const c -> Const 0.0
| Var v -> if v = dv then Const 1.0 else Const 0.0
| Sum(f, g) -> Sum(deriv f dv, deriv g dv)
| Diff(f, g) -> Diff(deriv f dv, deriv g dv)
| Prod(f, g) -> Sum(Prod(f, deriv g dv), Prod(deriv f dv, g))
| Quot(f, g) -> Quot(Diff(Prod(deriv f dv, g), Prod(f, deriv g dv)),
Prod(g, g))
;;
deriv (Quot(Const 1.0, Var "x")) "x";;
\end{caml_example}
\section{s:pretty-printing}{Pretty-printing}
As shown in the examples above, the internal representation (also
called {\em abstract syntax\/}) of expressions quickly becomes hard to
read and write as the expressions get larger. We need a printer and a
parser to go back and forth between the abstract syntax and the {\em
concrete syntax}, which in the case of expressions is the familiar
algebraic notation (e.g. "2*x+1").
For the printing function, we take into account the usual precedence
rules (i.e. "*" binds tighter than "+") to avoid printing unnecessary
parentheses. To this end, we maintain the current operator precedence
and print parentheses around an operator only if its precedence is
less than the current precedence.
\begin{caml_example}{toplevel}
let print_expr exp =
(* Local function definitions *)
let open_paren prec op_prec =
if prec > op_prec then print_string "(" in
let close_paren prec op_prec =
if prec > op_prec then print_string ")" in
let rec print prec exp = (* prec is the current precedence *)
match exp with
Const c -> print_float c
| Var v -> print_string v
| Sum(f, g) ->
open_paren prec 0;
print 0 f; print_string " + "; print 0 g;
close_paren prec 0
| Diff(f, g) ->
open_paren prec 0;
print 0 f; print_string " - "; print 1 g;
close_paren prec 0
| Prod(f, g) ->
open_paren prec 2;
print 2 f; print_string " * "; print 2 g;
close_paren prec 2
| Quot(f, g) ->
open_paren prec 2;
print 2 f; print_string " / "; print 3 g;
close_paren prec 2
in print 0 exp;;
let e = Sum(Prod(Const 2.0, Var "x"), Const 1.0);;
print_expr e; print_newline ();;
print_expr (deriv e "x"); print_newline ();;
\end{caml_example}
\section{s:printf}{Printf formats}
There is a "printf" function in the \stdmoduleref{Printf} module
(see chapter~\ref{c:moduleexamples}) that allows you to produce formatted
output more concisely.
It follows the behavior of the "printf" function from the C standard library.
The "printf" function takes a format string that describes the desired output
as text interspersed with specifiers (for instance "%d", "%f").
Next, the specifiers are substituted by the following arguments in their order
of appearance in the format string:
\begin{caml_example}{toplevel}
Printf.printf "%i + %i is an integer value, %F * %F is a float, %S\n"
3 2 4.5 1. "this is a string";;
\end{caml_example}
The OCaml type system checks that the type of the arguments and the specifiers are
compatible. If you pass it an argument of a type that does not correspond to
the format specifier, the compiler will display an error message:
\begin{caml_example}{toplevel}[error]
Printf.printf "Float value: %F" 42;;
\end{caml_example}
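The same format strings can be used to build a string instead of
printing it. As a small sketch, the standard function "Printf.sprintf"
returns the formatted text as its result (the name "label_sketch" is
illustrative):
\begin{caml_example}{toplevel}
(* %d formats an integer, %.2f a float with two decimals *)
let label_sketch = Printf.sprintf "%d apples, %.2f kg" 3 1.5;;
\end{caml_example}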
The "fprintf" function is like "printf" except that it takes an output channel as
the first argument. The "%a" specifier can be useful to define custom printers
(for custom types). For instance, we can create a printing template that converts
an integer argument to signed decimal:
\begin{caml_example}{toplevel}
let pp_int ppf n = Printf.fprintf ppf "%d" n;;
Printf.printf "Outputting an integer using a custom printer: %a " pp_int 42;;
\end{caml_example}
The advantage of those printers based on the "%a" specifier is that they can be
composed together to create more complex printers step by step.
We can define a combinator that can turn a printer for "'a" type into a printer
for "'a option":
\begin{caml_example}{toplevel}
let pp_option printer ppf = function
| None -> Printf.fprintf ppf "None"
| Some v -> Printf.fprintf ppf "Some(%a)" printer v;;
Printf.fprintf stdout
"The current setting is %a. \nThere is only %a\n"
(pp_option pp_int) (Some 3)
(pp_option pp_int) None
;;
\end{caml_example}
If its argument is "None", the printer returned by "pp_option printer"
prints "None"; otherwise it uses the provided printer to print the value
carried by "Some".
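Other combinators can be written in the same style. As a sketch, here
is a possible printer for lists that separates the elements with
semicolons. The names "pp_int_sketch" and "pp_list_sketch" are
illustrative, and the integer printer is repeated so that the example
stands alone.
\begin{caml_example}{toplevel}
let pp_int_sketch ppf n = Printf.fprintf ppf "%d" n;;
(* prints the elements with [printer], separated by "; " *)
let rec pp_list_sketch printer ppf = function
  | [] -> ()
  | [x] -> printer ppf x
  | x :: rest ->
      Printf.fprintf ppf "%a; %a" printer x (pp_list_sketch printer) rest;;
Printf.printf "[%a]\n" (pp_list_sketch pp_int_sketch) [1; 2; 3];;
\end{caml_example}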
Here is how to rewrite the pretty-printer using "fprintf":
\begin{caml_example}{toplevel}
let pp_expr ppf expr =
let open_paren prec op_prec output =
if prec > op_prec then Printf.fprintf output "%s" "(" in
let close_paren prec op_prec output =
if prec > op_prec then Printf.fprintf output "%s" ")" in
let rec print prec ppf expr =
match expr with
| Const c -> Printf.fprintf ppf "%F" c
| Var v -> Printf.fprintf ppf "%s" v
| Sum(f, g) ->
open_paren prec 0 ppf;
Printf.fprintf ppf "%a + %a" (print 0) f (print 0) g;
close_paren prec 0 ppf
| Diff(f, g) ->
open_paren prec 0 ppf;
Printf.fprintf ppf "%a - %a" (print 0) f (print 1) g;
close_paren prec 0 ppf
| Prod(f, g) ->
open_paren prec 2 ppf;
Printf.fprintf ppf "%a * %a" (print 2) f (print 2) g;
close_paren prec 2 ppf
| Quot(f, g) ->
open_paren prec 2 ppf;
Printf.fprintf ppf "%a / %a" (print 2) f (print 3) g;
close_paren prec 2 ppf
in print 0 ppf expr;;
pp_expr stdout e; print_newline ();;
pp_expr stdout (deriv e "x"); print_newline ();;
\end{caml_example}
Due to the way that format strings are built, storing a format string requires
an explicit type annotation:
\begin{caml_example*}{toplevel}
let str : _ format =
"%i is an integer value, %F is a float, %S\n";;
\end{caml_example*}
\begin{caml_example}{toplevel}
Printf.printf str 3 4.5 "string value";;
\end{caml_example}
%%%%%%%%%%% Should be moved to the camlp4 documentation.
%% Parsing (transforming concrete syntax into abstract syntax) is usually
%% more delicate. OCaml offers several tools to help write parsers:
%% on the one hand, OCaml versions of the lexer generator Lex and the
%% parser generator Yacc (see chapter~\ref{c:ocamlyacc}), which handle
%% LALR(1) languages using push-down automata; on the other hand, a
%% predefined type of streams (of characters or tokens) and
%% pattern-matching over streams, which facilitate the writing of
%% recursive-descent parsers for LL(1) languages. An example using
%% "ocamllex" and "ocamlyacc" is given in
%% chapter~\ref{c:ocamlyacc}. Here, we will use stream parsers.
%% The syntactic support for stream parsers is provided by the Camlp4
%% preprocessor, which can be loaded into the interactive toplevel via
%% the "#load" directives below.
%%
%% \begin{caml_example}
%% #load "dynlink.cma";;
%% #load "camlp4o.cma";;
%% open Genlex;;
%% let lexer = make_lexer ["("; ")"; "+"; "-"; "*"; "/"];;
%% \end{caml_example}
%% For the lexical analysis phase (transformation of the input text into
%% a stream of tokens), we use a ``generic'' lexer provided in the
%% standard library module "Genlex". The "make_lexer" function takes a
%% list of keywords and returns a lexing function that ``tokenizes'' an
%% input stream of characters. Tokens are either identifiers, keywords,
%% or literals (integer, floats, characters, strings). Whitespace and
%% comments are skipped.
%% \begin{caml_example}
%% let token_stream = lexer (Stream.of_string "1.0 +x");;
%% Stream.next token_stream;;
%% Stream.next token_stream;;
%% Stream.next token_stream;;
%% \end{caml_example}
%%
%% The parser itself operates by pattern-matching on the stream of
%% tokens. As usual with recursive descent parsers, we use several
%% intermediate parsing functions to reflect the precedence and
%% associativity of operators. Pattern-matching over streams is more
%% powerful than on regular data structures, as it allows recursive calls
%% to parsing functions inside the patterns, for matching sub-components of
%% the input stream. See the Camlp4 documentation for more details.
%%
%% %Already said above
%% %In order to use stream parsers at toplevel, we must first load the
%% %"camlp4" preprocessor.
%% %\begin{caml_example}
%% %#load"camlp4o.cma";;
%% %\end{caml_example}
%% %Then we are ready to define our parser.
%% \begin{caml_example}{toplevel}
%% let rec parse_expr = parser
%% [< e1 = parse_mult; e = parse_more_adds e1 >] -> e
%% and parse_more_adds e1 = parser
%% [< 'Kwd "+"; e2 = parse_mult; e = parse_more_adds (Sum(e1, e2)) >] -> e
%% | [< 'Kwd "-"; e2 = parse_mult; e = parse_more_adds (Diff(e1, e2)) >] -> e
%% | [< >] -> e1
%% and parse_mult = parser
%% [< e1 = parse_simple; e = parse_more_mults e1 >] -> e
%% and parse_more_mults e1 = parser
%% [< 'Kwd "*"; e2 = parse_simple; e = parse_more_mults (Prod(e1, e2)) >] -> e
%% | [< 'Kwd "/"; e2 = parse_simple; e = parse_more_mults (Quot(e1, e2)) >] -> e
%% | [< >] -> e1
%% and parse_simple = parser
%% [< 'Ident s >] -> Var s
%% | [< 'Int i >] -> Const(float i)
%% | [< 'Float f >] -> Const f
%% | [< 'Kwd "("; e = parse_expr; 'Kwd ")" >] -> e;;
%% let parse_expression = parser [< e = parse_expr; _ = Stream.empty >] -> e;;
%% \end{caml_example}
%%
%% Composing the lexer and parser, we finally obtain a function to read
%% an expression from a character string:
%% \begin{caml_example}
%% let read_expression s = parse_expression (lexer (Stream.of_string s));;
%% read_expression "2*(x+y)";;
%% \end{caml_example}
%% A small puzzle: why do we get different results in the following two
%% examples?
%% \begin{caml_example}
%% read_expression "x - 1";;
%% read_expression "x-1";;
%% \end{caml_example}
%% Answer: the generic lexer provided by "Genlex" recognizes negative
%% integer literals as one integer token. Hence, "x-1" is read as
%% the token "Ident \"x\"" followed by the token "Int(-1)"; this sequence
%% does not match any of the parser rules. On the other hand,
%% the second space in "x - 1" causes the lexer to return the three
%% expected tokens: "Ident \"x\"", then "Kwd \"-\"", then "Int(1)".
\section{s:standalone-programs}{Standalone OCaml programs}
All examples given so far were executed under the interactive system.
OCaml code can also be compiled separately and executed
non-interactively using the batch compilers "ocamlc" and "ocamlopt".
The source code must be put in a file with extension ".ml". It
consists of a sequence of phrases, which will be evaluated at runtime
in their order of appearance in the source file. Unlike in interactive
mode, types and values are not printed automatically; the program must
call printing functions explicitly to produce some output. The ";;" used
in the interactive examples is not required in
source files created for use with OCaml compilers, but can be helpful
to mark the end of a top-level expression unambiguously even when
there are syntax errors.
Here is a
sample standalone program to print the greatest common divisor
(gcd) of two numbers:
\begin{verbatim}
(* File gcd.ml *)
let rec gcd a b =
  if b = 0 then a
  else gcd b (a mod b);;
let main () =
  let a = int_of_string Sys.argv.(1) in
  let b = int_of_string Sys.argv.(2) in
  Printf.printf "%d\n" (gcd a b);
  exit 0;;
main ();;
\end{verbatim}
"Sys.argv" is an array of strings containing the command-line
parameters. Its first element, "Sys.argv.(0)", is the name used to invoke
the program; "Sys.argv.(1)" is thus the first command-line parameter.
The program above is compiled and executed with the following shell
commands:
\begin{verbatim}
$ ocamlc -o gcd gcd.ml
$ ./gcd 6 9
3
$ ./gcd 7 11
1
\end{verbatim}
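The "gcd.ml" program assumes that it receives two well-formed integer
arguments: a missing argument makes the array access raise
"Invalid_argument", and a malformed one makes "int_of_string" raise
"Failure". As a sketch of a more defensive variant, the following program
checks its arguments first. The file name "gcd_checked.ml", the helper
"parse_args" and the wording of the usage message are hypothetical and
introduced here for illustration only.
\begin{verbatim}
(* File gcd_checked.ml: a hypothetical variant of gcd.ml *)
let rec gcd a b =
  if b = 0 then a
  else gcd b (a mod b)

(* Return the two integer arguments, or None if the argument count is
   wrong or if either argument is not a valid integer. *)
let parse_args argv =
  if Array.length argv <> 3 then None
  else
    match int_of_string_opt argv.(1), int_of_string_opt argv.(2) with
    | Some a, Some b -> Some (a, b)
    | _ -> None

(* Report misuse on standard error instead of raising an exception. *)
let () =
  match parse_args Sys.argv with
  | Some (a, b) -> Printf.printf "%d\n" (gcd a b)
  | None -> prerr_endline "usage: gcd_checked <int> <int>"
\end{verbatim}
This variant is compiled in the same way as "gcd.ml". It contains no ";;"
because every phrase starts with "let". A production tool would probably
also call "exit" with a nonzero status after printing the usage message.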
More complex standalone OCaml programs are typically composed of
multiple source files, and can link with precompiled libraries.
Chapters~\ref{c:camlc} and~\ref{c:nativecomp} explain how to use the
batch compilers "ocamlc" and "ocamlopt". Recompilation of
multi-file OCaml projects can be automated using third-party
build systems, such as the
\href{https://github.com/ocaml/ocamlbuild/}{ocamlbuild}
compilation manager.