% $Id$
% $Log$
% environment for typing program texts:
\newcommand{\pr}[1]{\mbox{\tt #1}} % program text in normal text
\newcommand{\var}{{\cal V}ar}
\newcommand{\Dom}{{\cal D}om}
\newcommand{\VRan}{{\cal VR}an}
\newcommand{\todo}[1]{\fbox{\sc To do: #1}}
\newcommand{\dexp}{D} % a disjunctive expression
\newcommand{\cconj}{\ensuremath \mathop{\pr{\&}}} % concurrent conj. in math
\newcommand{\ttbs}{\mbox{\tt\char92}} % backslash in tt font
\newcommand{\ttus}{\mbox{\tt\char95}} % underscore in tt font
\newcommand{\sem}[1]{\ensuremath [\![#1]\!]} % double square brackets
\newcommand{\eval}[1]{{\cal E}\!val\sem{#1}}
\newcommand{\infrule}[2]{\begin{array}{@{}c@{}} #1 \\ \hline #2 \end{array}}
\newcommand{\pindex}[1]{\index{#1@{\tt #1}}} % program elements in index
\newcommand{\comment}[1]{} % ignore the argument
% commands for the syntax:
\newcommand{\term}[1]{{\tt #1}}
\newcommand{\opt}[1]{\rm [\it #1{\rm ]}}
\newcommand{\offside}[3]{\term{\{}\seq{#1}{#2}{#3} \term{\}}}
\newcommand{\seq}[3]{{\it #1$_1$ \term{#2} $\ldots$ \term{#2} #1$_#3$}}
\newcommand{\head}[1]{\makebox[1.3in][r]{\it #1}}
\newcommand{\lexicon}[1]{\production #1 {\rm see lexicon} end}
\def\production #1 #2 end {\head{#1}\makebox[3em]{\rm ::=}\it #2}
\begin{center}
{\Huge\bf Curry}\\[4ex]
{\LARGE\bf An Integrated Functional Logic Language}\\[5ex]
{\large\bf Version 0.5.1}\\[1ex]
{\large \today}\\[8ex]
Michael Hanus$^1$ [editor] \\[3ex]
{\large Additional Contributors:}\\[2ex]
Sergio Antoy$^2$ \\
Herbert Kuchen$^3$ \\
Francisco J. L\'opez-Fraguas$^4$ \\
Juan Jos\'e Moreno Navarro$^5$ \\
Frank Steiner$^6$ \\[20ex]
(1) RWTH Aachen, Germany, {\tt} \\
(2) Portland State University, USA, {\tt} \\
(3) University of M\"unster, Germany, {\tt} \\
(4) Universidad Complutense de Madrid, Spain, {\tt} \\
(5) Universidad Polit\'ecnica de Madrid, Spain, {\tt} \\
(6) RWTH Aachen, Germany, {\tt} \\[5ex]~
\end{center}
Curry is a universal programming language aiming at the amalgamation
of the most important declarative programming paradigms,
namely functional programming and logic programming.
Curry combines in a seamless way features from functional programming
(nested expressions, lazy evaluation, higher-order functions),
logic programming (logical variables, partial data structures,
built-in search), and concurrent programming (concurrent evaluation
of constraints with synchronization on logical variables).
Moreover, Curry provides additional features in
comparison to the pure languages (compared to functional programming:
search, computing with partial information; compared to logic
programming: more efficient evaluation due to the deterministic
evaluation of functions).
Curry also amalgamates the most
important operational principles developed in the area of integrated
functional logic languages: ``residuation''\index{residuation}
and ``narrowing''\index{narrowing} (see
\cite{Hanus94JLP} for a survey on functional logic programming).
The development of Curry is an international initiative intended to
provide a common platform for the research, teaching\footnote{%
Actually, Curry has been successfully applied to teach functional and
logic programming techniques in a single course without switching
between different programming languages. More details about
this aspect can be found in \cite{Hanus97DPLE}.}
and application
of integrated functional logic languages.
This document describes the features of Curry, its syntax and
operational semantics.
A Curry program specifies the semantics of expressions, where
goals, which occur in logic programming, are particular expressions
(of type \pr{Constraint}).
Executing a Curry program means simplifying an expression until
a value (or solution) is computed.
To distinguish between values and reducible expressions,
Curry has a strict distinction between (data)
\emph{constructors}\index{constructor} and
\emph{operations}\index{operation} or
\emph{defined functions}\index{defined function}\index{function!defined}
on these data.
Hence, a Curry program consists of a set of type and function declarations.
The type declarations define the computational domains (constructors)
and the function declarations the operations on these domains.
Predicates in the logic programming sense can be considered
as constraints, i.e., functions with result type \pr{Constraint}.
Modern functional languages (e.g., Haskell \cite{HudakPeytonJonesWadler92},
SML \cite{MilnerTofteHarper90}) allow the detection of many programming
errors at compile time by the use of polymorphic type systems.
Similar type systems are also used in modern logic languages
(e.g., G\"odel \cite{HillLloyd94}, $\lambda$Prolog \cite{NadathurMiller88}).
Therefore, Curry is a strongly typed language with a
Hindley/Milner-like polymorphic type system \cite{DamasMilner82}.\footnote{%
The extension of this type system to Haskell's type classes
is not included in the kernel language but will be considered
in a future version.}
Each object in a program has a unique type, where
the types of variables and operations can be omitted and are
reconstructed by a type inference mechanism.
\subsection{Datatype Declarations}
A datatype declaration\index{type declaration}\index{datatype!declaration}
\index{declaration!datatype} has the form
\begin{quote}
\pr{data} $T$ $\alpha_1$ \ldots $\alpha_n$ \pr{=} $C_1$ $\tau_{11}$ \ldots $\tau_{1n_1}$ \pr{|} $\cdots$ \pr{|} $C_k$ $\tau_{k1}$ \ldots $\tau_{kn_k}$
\end{quote}
and introduces a new $n$-ary \emph{type constructor}\index{type constructor}
$T$ and $k$
new (data) \emph{constructors} $C_1,\ldots,C_k$, where each $C_i$ has the type
\begin{quote}
$\tau_{i1}$ \pr{->} $\cdots$ \pr{->} $\tau_{in_i}$ \pr{->} $T~\alpha_1\ldots\alpha_n$
\end{quote}
($i=1,\ldots,k$). Each $\tau_{ij}$ is a
\emph{type expression}\index{type expression} built from the
\emph{type variables}\index{type variable} $\alpha_1,\ldots,\alpha_n$
and some type constructors. Curry has a number of built-in type constructors,
like \pr{Bool}, \pr{Int}, \pr{->} (function space), as well as lists and tuples,
which are described in Section~\ref{sec-builtin-types}.
Since Curry is a higher-order language,
the types of functions (i.e., constructors or operations)
are written in their curried form
\begin{quote}
\pr{$\tau_1$ -> $\tau_2$ -> $\cdots$ -> $\tau_n$ -> $\tau$}
\end{quote}
where $\tau$ is not a functional type.
In this case, $n$ is called the \emph{arity}\index{arity}\index{function!arity}
of the function.
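As an illustration (using a hypothetical operation \pr{add} which is not among the predefined operations of this document), an addition function on integers can be declared with the curried type
\begin{verbatim}
add :: Int -> Int -> Int
add x y = x + y
\end{verbatim}
Here $n=2$, i.e., \pr{add} has arity 2, and the partial application \pr{(add 1)} is itself a function of type \pr{Int -> Int}.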
For instance, the datatype declarations
\begin{verbatim}
data Bool = True | False
data List a = [] | a : List a
data Tree a = Leaf a | Node (Tree a) a (Tree a)
\end{verbatim}
introduce the datatype \pr{Bool} with the 0-ary constructors
(\emph{constants})\index{constant} \pr{True} and \pr{False},
and the polymorphic types \pr{List a} and \pr{Tree a} of lists
and binary trees. Here, ``\pr{:}'' is an infix operator, i.e.,
``\pr{a:List a}'' is another notation for ``\pr{(:) a (List a)}''.
Lists are predefined in Curry, where the notation ``\pr{[a]}''
is used to denote list types (instead of ``\pr{List a}'').
The usual convenient notations for lists are supported,
i.e., \pr{[0,1,2]} is an abbreviation for
\pr{0:(1:(2:[]))} (see also Section~\ref{sec-builtin-types}).
A \emph{data term}\index{data term} is a variable $x$ or a
constructor application
$c~t_1\ldots t_n$ where $c$ is an $n$-ary constructor and
$t_1,\ldots,t_n$ are data terms. An \emph{expression}\index{expression}
is a variable or a (partial) application $\varphi~e_1 \ldots e_m$
where $\varphi$ is an $n$-ary function or
constructor, $m\leq n$, and $e_1,\ldots,e_m$ are expressions.
A data term or expression is called \emph{ground}\index{ground expression}
\index{expression!ground}\index{ground term}\index{data term!ground}
if it does not contain any variable.
Ground data terms correspond to values in the intended domain,
and expressions containing operations
should be evaluated to data terms.
Note that traditional functional languages compute on ground expressions,
whereas logic languages also allow non-ground expressions.
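These notions can be illustrated with the \pr{Tree} type declared above (assuming the built-in integers and the operation \pr{+}):
\begin{verbatim}
Node (Leaf 1) 2 (Leaf 3)   -- a ground data term, i.e., a value
Node (Leaf x) 2 (Leaf 3)   -- a data term which is not ground
Leaf (1+2)                 -- an expression containing an operation
\end{verbatim}
The last expression should be evaluated to the data term \pr{Leaf 3}.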
\subsection{Function Declarations}
A function is defined by a type declaration (which can be omitted)
followed by a list of defining equations.
A \emph{type declaration}\index{type declaration}\index{function declaration}
\index{declaration!function} for an $n$-ary function has the form
\begin{quote}
$f$ \pr{::} $\tau_1$ \pr{->} $\tau_2$ \pr{->} $\cdots$ \pr{->} $\tau_n$ \pr{->} $\tau$
\end{quote}
where $\tau_1,\ldots,\tau_n,\tau$ are type expressions
and $\tau$ is not a functional type.
The simplest form of a
\emph{defining equation}\index{equation}\index{defining equation}
(or \emph{rule})\index{rule}\index{function rule}\pindex{=}
for an $n$-ary function $f$ ($n \geq 0$) is
\begin{quote}
$f~t_1\ldots t_n$ \pr{=} $e$
\end{quote}
where $t_1,\ldots,t_n$ are data terms and the
\emph{right-hand side}\index{right-hand side} $e$ is an expression.
The \emph{left-hand side}\index{left-hand side}
$f~t_1\ldots t_n$ must not contain multiple occurrences of a variable.
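For instance, the following (hypothetical) rule is not allowed since the variable \pr{x} occurs twice in its left-hand side; the intended equality test can instead be expressed by a condition (cf.\ Section~\ref{sec-equality}):
\begin{verbatim}
f x x = 0           -- not allowed: x occurs twice in the left-hand side
f x y | x=:=y = 0   -- allowed: the repeated occurrence becomes a condition
\end{verbatim}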
Functions can also be defined by
\emph{conditional equations}\index{conditional equation}%
\index{equation!conditional}\index{conditional rule}\index{rule!conditional}
which have the form
\begin{quote}
$f~t_1\ldots t_n$ \pr{|} $c$ \pr{=} $e$
\end{quote}
where the \emph{condition}\index{condition} $c$ is a constraint
(cf.\ Section~\ref{sec-equality}). In order to apply a conditional equation,
its condition must be solved. Conditional equations with
identical left-hand sides can also be written as
\begin{quote}
\begin{tabular}{@{}l@{~}l}
$f~t_1\ldots t_n$ & \pr{|} $c_1$ \pr{=} $e_1$ \\
 & $~~\vdots$ \\
 & \pr{|} $c_k$ \pr{=} $e_k$
\end{tabular}
\end{quote}
which is an abbreviation for the $k$ conditional equations
\begin{quote}
\begin{tabular}{@{}l}
$f~t_1\ldots t_n$ \pr{|} $c_1$ \pr{=} $e_1$ \\
$~~\vdots$ \\
$f~t_1\ldots t_n$ \pr{|} $c_k$ \pr{=} $e_k$
\end{tabular}
\end{quote}
Note that one can define functions with a non-determinate
behavior by providing several rules with overlapping left-hand sides
or \emph{free variables}\index{free variable}\index{variable!free}
(i.e., variables which do not occur in the left-hand
side) in the conditions or right-hand sides of rules.
For instance, the following non-deterministic function inserts an element
at an arbitrary position in a list:
\begin{verbatim}
insert x []     = [x]
insert x (y:ys) = x : y : ys
insert x (y:ys) = y : insert x ys
\end{verbatim}
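On top of \pr{insert}, one can define, for instance, a function yielding all permutations of a list (a standard example from the functional logic programming literature, not part of this document's predefined operations):
\begin{verbatim}
perm []     = []
perm (x:xs) = insert x (perm xs)
\end{verbatim}
Each non-deterministic choice of an insertion position yields another permutation of the input list.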
Such \emph{non-deterministic functions}\index{non-deterministic function}
can be given a perfect declarative semantics \cite{GonzalesEtAl96ESOP}
and their implementation causes no overhead in Curry
since techniques for handling non-determinism are already contained in
the logic programming part (see also \cite{Antoy97ALP}).
However, deterministic functions are advantageous since
they provide for more efficient implementations (like the
\emph{dynamic cut} \cite{LoogenWinkler95}).
If one is interested only in defining deterministic functions,
this can be ensured by the following restrictions:
\begin{enumerate}
\item Each variable occurring in the right-hand side of a rule
must also occur in the corresponding left-hand side.
\item If \pr{~$f~t_1\ldots t_n$ | $c$ = $e$~} and
\pr{~$f~t'_1\ldots t'_n$ | $c'$ = $e'$~} are rules defining $f$ and
$\sigma$ is a \emph{substitution}\index{substitution}\footnote{%
A \emph{substitution} $\sigma$ is a mapping from variables into expressions
which is extended to a homomorphism on expressions by defining
$\sigma(f~t_1\ldots t_n) = f~\sigma(t_1)\ldots \sigma(t_n)$.
$\{x_1 \mapsto e_1,\ldots,x_k \mapsto e_k\}$ denotes the substitution
$\sigma$ with $\sigma(x_i)=e_i$ ($i=1,\ldots,k$) and
$\sigma(x)=x$ for all variables $x\not\in\{x_1,\ldots,x_k\}$.}
with $\sigma(t_1\ldots t_n)=\sigma(t'_1\ldots t'_n)$, then at least
one of the following conditions holds:
\begin{itemize}
\item $\sigma(e)=\sigma(e')$ (\emph{compatibility of right-hand sides}).
\item $\sigma(c)$ and $\sigma(c')$ are not simultaneously satisfiable
(\emph{incompatibility of conditions}). A decidable approximation
of this condition can be found in \cite{KuchenEtAl96NGC}.
\end{itemize}
\end{enumerate}
These conditions ensure the confluence\index{confluence}
of the rules if they are considered as a conditional term rewriting
system \cite{SuzukiMiddeldorpIda95RTA}.
Implementations of Curry may check these conditions and warn the
user if they are not satisfied. There are also more general
conditions to ensure confluence \cite{SuzukiMiddeldorpIda95RTA}
which can be checked instead of the above conditions.
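For instance, the following (hypothetical) rules, whose guards are ordinary comparisons, have identical left-hand sides but satisfy the incompatibility of conditions, since \pr{x>0} and \pr{x<0} are never simultaneously satisfiable:
\begin{verbatim}
sign x | x>0 = 1
sign x | x<0 = -1
\end{verbatim}
Hence \pr{sign} is a deterministic function although its rules overlap.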
% Note that we do not require
% the termination of the rewrite system so that the computation with
% infinite data structures is supported as in lazy functional languages.
% In contrast to functional languages,
% \emph{extra variables}\index{extra variables} in the conditions
% (i.e., variables which do not occur in the left-hand
% side) are allowed. These extra variables provide the power of
% logic programming since a search for appropriate values is necessary
% in order to apply a conditional rule with extra variables.
Note that defining \emph{equations of higher type},
\index{equation!higher-order}\index{higher-order equation}
e.g., \pr{f = g} if \pr{f} and \pr{g} are of type \pr{Bool -> Bool},
are formally excluded in order to define a simple operational semantics based
on first-order rewriting.\footnote{Note that this is also necessary
in an extension of Curry which allows higher-order rewrite rules,
since rules with lambda-abstractions in left-hand sides which are
not of base type may cause a gap between
the standard notion of higher-order rewriting and the corresponding equational
theory \cite{Nipkow91LICS}.}
For convenience, a defining equation \pr{f = g}
between functions is allowed but will be interpreted in Curry
as syntactic sugar for the corresponding defining equation \pr{f x = g x}
on base types.
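For instance (a hypothetical example), with \pr{g} of type \pr{Bool -> Bool} defined below, the defining equation for \pr{f} is interpreted as \pr{f x = g x}:
\begin{verbatim}
g True  = False
g False = True

f = g     -- interpreted as:  f x = g x
\end{verbatim}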
\subsubsection{Functions vs.\ Variables}
In lazy functional languages, different occurrences of the same
variable are shared to avoid multiple evaluations of
identical expressions. For instance, if we apply the rule
\begin{verbatim}
double x = x+x
\end{verbatim}
to an expression \pr{double$\,t$}, we obtain the new expression
\pr{$t$+$t$} but both occurrences of $t$ denote the identical
expression, i.e., these subterms will be simultaneously evaluated.
Thus, \emph{several occurrences of the same variable are
always shared.} This is necessary not only for efficiency reasons
but it has also an influence on the soundness of the operational semantics
in the presence of non-deterministic functions
(see also \cite{GonzalesEtAl96ESOP}). For instance,
consider the non-deterministic function \pr{coin} defined by the rules
\begin{verbatim}
coin = 0
coin = 1
\end{verbatim}
Thus, the expression \pr{coin} evaluates to \pr{0} or \pr{1}.
However, the result values of the expression \pr{(double\,\,coin)}
depend on the sharing of the two occurrences of \pr{coin}
after applying the rule for \pr{double}: if both occurrences
are shared (as in Curry), the results are \pr{0} or \pr{2},
otherwise (without sharing) the results are \pr{0}, \pr{1}, or \pr{2}.
The sharing of argument variables corresponds to the so-called
``call time choice'' in the declarative semantics
\cite{GonzalesEtAl96ESOP}: if a rule is applied to a function call,
a unique value must be assigned to each argument
(in the example above: either \pr{0} or \pr{1} must be assigned
to the occurrence of \pr{coin} when the expression
\pr{(double\,\,coin)} is evaluated). This does not mean that
functions have to be evaluated in a strict manner but this
behavior can be easily obtained by sharing the different occurrences
of a variable.
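The effect of sharing can be sketched as follows (an informal reduction sketch, not a formal operational trace):
\begin{verbatim}
double coin  ~>  x + x    -- where both occurrences of x share one
                          -- instance of coin
             ~>  0 + 0    or    1 + 1
             ~>  0        or    2
\end{verbatim}
The result \pr{1}, which would require evaluating the two shared occurrences to different values, cannot occur.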
Since different occurrences of the same variable are always shared
but different occurrences of (textually identical) function
calls are not shared, it is important to distinguish
between variables and functions. Usually, all symbols
occurring at the top-level in the left-hand side of some rule
are considered as functions and the non-constructor symbols
in the arguments of the left-hand sides are considered
as variables. But note that there is a small exception
from this general rule in local declarations
(see Section~\ref{sec-localdecls}).
\subsubsection{Conditional Equations with Boolean Guards}
For convenience and compatibility with Haskell,
one can also write conditional equations with multiple guards
\begin{quote}
\begin{tabular}{@{}l@{~}l}
$f~t_1\ldots t_n$ & \pr{|} $b_1$ \pr{=} $e_1$ \\
 & $~~\vdots$ \\
 & \pr{|} $b_k$ \pr{=} $e_k$
\end{tabular}
\end{quote}
where $b_1,\ldots,b_k$ ($k>0$) are expressions of type \pr{Bool}
(instead of \pr{Constraint}).
Such a rule is interpreted as in Haskell: the guards are successively evaluated
and the right-hand side of the first guard which is evaluated to \pr{True}
is the result of applying this equation. Thus, this equation
can be considered as an abbreviation for the rule
\begin{quote}
\begin{tabular}{@{}l}
$f~t_1\ldots t_n$ \pr{=} \pr{if} $b_1$ \pr{then} $e_1$ \pr{else} \\
\quad$\cdots$ \pr{if} $b_k$ \pr{then} $e_k$ \pr{else} \mbox{\it{}undefined}
\end{tabular}
\end{quote}
(where {\it undefined} is some non-reducible function).
To write rules with several Boolean guards more nicely,
there is a Boolean function \pr{otherwise}\pindex{otherwise}
which is predefined as \pr{True}. For instance, the factorial
function can be declared as follows:
\begin{verbatim}
fac n | n==0      = 1
      | otherwise = fac(n-1)*n
\end{verbatim}
Since all guards have type \pr{Bool}, this definition is equivalent to
(after a simple optimization)
\begin{verbatim}
fac n = if n==0 then 1 else fac(n-1)*n
\end{verbatim}
To avoid confusion, it is not allowed to mix the Haskell-like
notation with Boolean guards and the standard notation
with constraints: in a rule with multiple guards,
all guards must be expressions of either type \pr{Constraint}
or type \pr{Bool} but not a simultaneous mixture of both.
The default type of a guard is \pr{Constraint}, i.e., in a rule like
\begin{verbatim}
f c x | c = x
\end{verbatim}
the type of the variable \pr{c} is \pr{Constraint}
(provided that it is not restricted to \pr{Bool} by the use of \pr{f}).
\subsection{Local Declarations}
\index{local declaration}\index{declaration!local}
Since not all auxiliary functions should be globally
visible, it is possible to restrict the scope of declared entities.
Note that the scope of parameters in function definitions
is already restricted since the variables occurring in parameters
of the left-hand side are only visible in the corresponding
conditions and right-hand sides. The visibility of other
entities can be restricted using \pr{let} in expressions
or \pr{where} in defining equations.
An expression of the form \pr{let {\it{}decls} in {\it{}exp}}
\index{local declaration}\index{declaration!local}\pindex{let}
introduces a set of local names. The list of local declarations
\emph{decls} can contain function definitions as well as
definitions of constants by pattern matching. The names
introduced in these declarations are visible in the expression
\emph{exp} and the right-hand sides of the declarations in \emph{decls},
i.e., the local declarations can be mutually recursive.
For instance, the expression
\begin{verbatim}
let a = 3*b
    b = 6
in 4*a
\end{verbatim}
reduces to the value \pr{72}.
Auxiliary functions which are only introduced to define another
function should often not be visible outside. Therefore,
such functions can be declared in a \pr{where}-clause
\index{local declaration}\index{declaration!local}\pindex{where}
added to the right-hand side of the corresponding function definition.
This is done in the following definition of a fast exponentiation
where the auxiliary functions \pr{even} and \pr{square}
are only visible in the right-hand side of the rule for \pr{exp}:
\begin{verbatim}
exp b n = if n==0
          then 1
          else if even n then square (exp b (n `div` 2))
                         else b * (exp b (n-1))
 where even n   = n `mod` 2 == 0
       square n = n*n
\end{verbatim}
Similarly to \pr{let}, \pr{where}-clauses can contain
mutually recursive function definitions as well as
definitions of constants by pattern matching.
The names declared in the \pr{where}-clauses are only visible
in the corresponding conditions and right-hand sides.
As a further example, the following Curry program implements
the quicksort algorithm with a function \pr{split} which
splits a list into two lists containing the smaller and larger elements:
\begin{verbatim}
split e []     = ([],[])
split e (x:xs) | e>=x = (x:l,r)
               | e<x  = (l,x:r)
               where (l,r) = split e xs

qsort []     = []
qsort (x:xs) = qsort l ++ (x:qsort r)
               where (l,r) = split x xs
\end{verbatim}
To distinguish between locally introduced functions and variables
(see also Section~\ref{sec-variable-sharing}),
we define a \emph{local pattern}\index{local pattern}\index{pattern!local}
as a (variable) identifier or an application where the top symbol
is a data constructor. If the left-hand side of a local declaration
in a \pr{let} or \pr{where} is a pattern, then all identifiers
in this pattern that are not data constructors are
considered as variables. For instance, the locally introduced
identifiers \pr{a}, \pr{b}, \pr{l}, and \pr{r} in the previous examples
are variables whereas the identifiers \pr{even} and \pr{square}
denote functions. Note that this rule excludes the
definition of 0-ary local functions, since a definition of the form
``\pr{where f = \ldots}'' is considered by this rule as the definition of
a local variable \pr{f}, which is usually the intended
interpretation (see the previous examples).
\subsection{Free Variables}
Since Curry is intended to cover functional as well as logic
programming paradigms, expressions (or constraints, see
Section~\ref{sec-equality}) might contain free
(unbound, uninstantiated) variables.
\index{variable!free}\index{free variable}\index{unbound variable}
The idea is to compute values
for these variables such that the expression is reducible
to a data term or that the constraint is solvable.
For instance, consider the definitions
\begin{verbatim}
mother John   = Christine
mother Alice  = Christine
mother Andrew = Alice
\end{verbatim}
Then we can compute a child of \pr{Alice} by solving the
equation (see Section~\ref{sec-equality})
\pr{mother x =:= Alice}. Here, \pr{x} is a free variable
which is instantiated to \pr{Andrew} in order to reduce
the equation's left-hand side to \pr{Alice}.
Similarly, we can compute a grandchild of \pr{Christine}
by solving the equation \pr{mother (mother x) =:= Christine}
which yields the value \pr{Andrew} for \pr{x}.
In logic programming languages like Prolog, all free variables
are considered as existentially quantified at the top-level.
Thus, they are always implicitly declared at the top-level.
In a language with different nested scopes like Curry,
it is not clear to which scope an undeclared variable belongs
(the exact scope of a variable becomes particularly important
in the presence of search operators, see Section~\ref{sec-encapsearch},
where existential quantifiers and lambda abstractions are often
mixed). Therefore, Curry requires that
\emph{each free variable $x$ must be explicitly declared}
using a declaration \index{declaration!free}\index{free declaration}
of the form \pr{$x$ free}.\pindex{free} These declarations
can occur in \pr{where}-clauses or in a \pr{let} enclosing
a constraint. The variable is then introduced as unbound
with the same scoping rules as for all other local
entities (see Section~\ref{sec-localdecls}).
For instance, we can define
\begin{verbatim}
isgrandmother g | let c free in mother (mother c) =:= g  = True
\end{verbatim}
As a further example, consider the definition of the
concatenation of two lists:
\begin{verbatim}
append []     ys = ys
append (x:xs) ys = x : append xs ys
\end{verbatim}
Then we can define the function \pr{last} which computes
the last element of a list by the rule
\begin{verbatim}
last l | append xs [x] =:= l  = x   where x,xs free
\end{verbatim}
Since the variable \pr{xs} occurs in the condition but not in the
right-hand side, the following definition is also possible:
\begin{verbatim}
last l | let xs free in append xs [x] =:= l  = x   where x free
\end{verbatim}
Note that the \pr{free} declarations can be freely mixed with
other local declarations after a \pr{let} or \pr{where}.
The only difference is that a declaration like ``\pr{let x free}''
introduces an existentially quantified variable (and, thus,
it can only occur in front of a constraint) whereas
other \pr{let} declarations introduce local functions or parameters.
Since all local declarations can be mutually recursive,
it is also possible to use local variables in the bodies
of the local functions in one \pr{let} declaration.
For instance, the following expression is valid (where the functions
\pr{h} and \pr{k} are defined elsewhere):
\begin{verbatim}
let f x = h x y
    y free
    g z = k y z
in c y (f (g 1))
\end{verbatim}
Similarly to the usual interpretation of local definitions
by lambda lifting \cite{Johnsson85}, this expression
can be interpreted by transforming the local definitions
for \pr{f} and \pr{g} into global ones by adding the non-local
variables of the bodies as parameters:
\begin{verbatim}
f y x = h x y
g y z = k y z
\end{verbatim}