Data

Basic Examples

We begin with simple datatypes, and then gradually introduce more complex cases: parameterized data types, indexed families of data types ("inductive families"), and inductive-recursive data types.

Simple Datatypes

Data types are introduced by data declarations. Perhaps the most basic example is the type of natural numbers. It can be defined as follows:

data Nat : Set where
zero : Nat
suc  : Nat -> Nat

Here Nat is the name of the type to be defined, and zero and suc are constructors. Such a type is called inductive, because intuitively, the type defines the "smallest" set containing an element zero and closed under a function suc, and one can prove properties of such sets by induction. Examples of elements of the type Nat are zero, (suc zero), (suc (suc zero)), and so on. Functions over an inductive type can be defined using pattern matching and structural recursion. For example, the addition function on the natural numbers can be defined by

_+_ : Nat -> Nat -> Nat
zero  + m = m
suc n + m = suc (n + m)

In passing, we note that the mixfix form of the _+_ operator can be used on both the left-hand and right-hand sides of such a definition.

To ensure normalisation, inductive occurrences of the type being defined in a data type declaration must appear in strictly positive positions. For instance, the following datatype is not allowed:

data Bad : Set where

because there is a negative occurrence of Bad in the argument to the constructor.

Parameterized Datatypes

Data types can depend on parameters. For example, consider the following declaration:

data List (A : Set) : Set where
[]   : List A
_::_ : A -> List A -> List A

This declares List A to be the set of lists of elements of type A. Since the definition is parameterized on a type A, we have effectively defined a family of types List A, one for each A.

As usual, the constructors are [] (for the empty list) and _::_ (for the "cons" operation). When these constructors are later used as functions, their types are implicitly quantified over A:

[]   : {A : Set} -> List A
_::_ : {A : Set} -> A -> List A -> List A

In the data type declaration, parameters are written after the name of the datatype and before the :. The scope of the parameters extends over the entire data type declaration, so their names (such as A in the above example) can be used in the constructor declarations.

Indexed Datatypes

Data types can also be indexed; in this case they are also called inductive families. The difference between an index and a parameter is that the index need not be constant throughout the definition of the type. For example, for a natural number n, consider the type IsEven n of proofs that n is even. This can be inductively defined as an indexed datatype as follows:

data IsEven : Nat -> Set where
evenZ  : IsEven zero
evenSS : (n : Nat) -> IsEven n -> IsEven (suc (suc n))

Note that, unlike the parameter A in the definition of lists above, the index n is not constant throughout the definition. Logicians and theoretical computer scientists may find it helpful to think of an indexed datatype as a set of judgements given by inference rules. Indeed, the above declaration of the IsEven predicate is exactly equivalent to the following inference rules in more traditional notation:

n : Nat       IsEven n
------------         --------------------------
IsEven 0                 IsEven (n+2)

In the data type declaration, indices appear to the right of the :. The scope of indices does not extend to the constructor declarations, and the indices can take on arbitrary values in the constructor return types.

When pattern matching on an element of an inductive family we get information about the index. To distinguish parts of a pattern that are determined by pattern matching (the inaccessible patterns) and the parts which constitutes the actual pattern matching, the inaccessible patterns are prefixed with a dot. For instance, we can prove that the sum of two even numbers is also even.

even+ : (n m : Nat) -> IsEven n -> IsEven m -> IsEven (n + m)
even+ .zero          m  evenZ        em = em
even+ .(suc (suc n)) m (evenSS n en) em =
evenSS (n + m) (even+ n m en em)

The proof is by recursion on the proof that n is even. Pattern matching on this proof will force the value of n and hence the patterns for n are prefixed with a dot to indicate that they are not part of the pattern matching. In this case we can make n and m implicit and write the proof as

even+ : {n m : Nat} -> IsEven n -> IsEven m -> IsEven (n + m)
even+  evenZ        em = em
even+ (evenSS n en) em = evenSS _ (even+ en em)
Combining parameters and indices

A data type declaration can contain both parameters and indices. For example, consider the following definition, common in Agda, of vectors of length n with elements from A:

data Vec (A : Set) : ℕ -> Set where
[]  : Vec A zero
_∷_ : A -> Vec A n -> Vec A (suc n)

Here, A is a parameter, appearing to the left of the :, whereas n : ℕ is an index, appearing to the right of the :.

Inductive-recursive types

It is possible, and sometimes necessary, to define a data type A and a function f on that data type by simultaneous mutual induction/recursion: the definition of f(x) requires x : A to be already defined, and the definition of x : A may require f(y) to be already defined for some structurally smaller element y : A. Such a construction is easily possible in Agda, and is called an inductive-recursive type.

For a (contrived) example, consider a type MyList of lists of natural numbers, with the property that each list element is greater than the sum of all list elements to its right. The constructor cons takes three arguments: the head of the list n, the tail of the list ns, and a proof that n > sum ns. We can define the type MyList and the function sum simultaneously by mutual induction/recursion:

data MyList : Set
sum : MyList -> Nat

data MyList where
nil : MyList
cons : (n : Nat) -> (ns : MyList) -> (n > sum ns) -> MyList

sum nil = zero
sum (cons n ns _) = n + sum ns

Syntax

Data declarations include:

• inductive families, and,
• coinductive families.

The syntax of a datatype declaration is:

data <name> <parameters> : <indices> where
<constructor> ... <constructor> : <term>
...

For more flexibility (for example, to allow mutually recursive definitions), the declaration part can also be separated from the definition part. In this case, the syntax for the declaration part is:

data <name> <parameters> : <indices>

And the syntax for the definition part is:

data <name> <parameters> where
<constructor> ... <constructor> : <term>
...

The parts of the syntax are as follows:

<name> is a new unqualified name, the name of the datatype being declared.

<parameters> is an optional telescope. This specifies arguments to <name> that do not vary in the types of the constructors. Variables introduced in <parameters> are in scope for the indices and all the constructor lines.

<indices> is a term of the form <index1> -> <index2> -> ... -> <indexn> -> <universe>, denoting a function space whose innermost range is a universe. This specifies the remaining arguments to <name>, as well as the result type (the universe). The arguments <index1>, ..., <indexn> are called indices and do not need to be constant in the types of the constructors. The scope of any variables introduced in <indices> extends over the indices only.

There may be zero or more <constructor> lines. Each <constructor> is an unqualified name. It need not be new; constructors can be overloaded. The <term> following a constructor is a term denoting a function space whose innermost range is a full application of <name>.