# Building (E)DSLs Like Lego

Chalmers RAW FP Workshop, May 2012

# Background

Different DSLs often have a lot in common

• Similar constructs (e.g. conditionals, tuples, etc.)
• Similar interpretations/transformations (evaluation, constant folding, etc.)

Even within the same DSL there are opportunities for reuse

• E.g. many constructs introduce new variables

# Background

Using Haskell as host for embedded DSLs is supposed to make implementation much easier, but…

• How to deal with variable binding?
• Sharing?
• Unpacking/packing of structured values
• Etc.

These issues are

• nontrivial
• reimplemented over and over again

# Example DSLs

Arithmetic expressions with equality and conditions:

data Arith a where
Lit  :: Show a => a -> Arith a
Add  :: Num a  => Arith a -> Arith a -> Arith a
Mul  :: Num a  => Arith a -> Arith a -> Arith a
Equ  :: Eq a   => Arith a -> Arith a -> Arith Bool
Cond :: Arith Bool -> Arith a -> Arith a -> Arith a

Hardware description:

data Signal a where
Const :: Show a => a -> Signal a
Inv   :: Signal Bool -> Signal Bool
And   :: Signal Bool -> Signal Bool -> Signal Bool
Or    :: Signal Bool -> Signal Bool -> Signal Bool
Mux   :: Signal Bool -> Signal a -> Signal a -> Signal a

Similar?

# Example DSLs: Interpretation

evalArith :: Arith a -> a
evalArith (Lit a)      = a
evalArith (Add a b)    = evalArith a + evalArith b
evalArith (Mul a b)    = evalArith a * evalArith b
evalArith (Equ a b)    = evalArith a == evalArith b
evalArith (Cond c t f) = if evalArith c
then evalArith t
else evalArith f
evalSignal :: Signal a -> a
evalSignal (Const a)   = a
evalSignal (Inv a)     = not (evalSignal a)
evalSignal (And a b)   = evalSignal a && evalSignal b
evalSignal (Or a b)    = evalSignal a || evalSignal b
evalSignal (Mux c t f) = if evalSignal c
then evalSignal t
else evalSignal f

Similar?

# Example DSLs: Optimization

optArith :: Arith a -> Arith a
optArith (Mul a b) = Mul (optArith a) (optArith b)
optArith (Equ a b) = Equ (optArith a) (optArith b)
optArith (Cond c t f)
| eqArith t f = optArith t
| otherwise   = Cond (optArith c) (optArith t) (optArith f)
optArith a = a
optSignal :: Signal a -> Signal a
optSignal (Inv a)   = Inv (optSignal a)
optSignal (And a b) = And (optSignal a) (optSignal b)
optSignal (Or a b)  = Or (optSignal a) (optSignal b)
optSignal (Mux c t f)
| eqSignal t f = optSignal t
| otherwise    = Mux (optSignal c) (optSignal t) (optSignal f)
optSignal a = a

Similar?

Note the repetitive boilerplate code.

# Can we have some reuse, please?

The Expression Problem [Wadler 1998] (rephrased):

How to extend a data type with new cases and new interpretation functions without recompiling existing code.

• Haskell data types are closed!
• Open (or composable) data types would allow reuse of language constructs.
• Side-effect: Implement language features independently

But we want more!

• Reduce boilerplate code in e.g. optimizations (algorithm reuse)
• Generic AST traversals

Existing generic programming frameworks (such as SYB) enable generic traversals, but not composable data types.

# Syntactic library

http://hackage.haskell.org/package/syntactic

Core library:

• The AST type
• An encoding of composable data types in Haskell
• Direct support for generic traversals

• Generic interpretations (evaluation, rendering, etc.)
• Generic syntactic constructs (conditions, variable binding, etc.)
• Generic transformations (constant folding, common sub-expression elimination, etc.)
• “Syntactic sugar” (host-language interaction)

# Syntactic library

Idea:

• Assemble a DSL implementation from reusable components
• Only truly domain-specific constructs need to be added for each new DSL
• The generic interpretations are easily extended for the new constructs

# Rest of the talk

• Syntactic core library
• Examples redone
• Host-language interaction
• Variable binding
• Real use-case: The Feldspar DSL

# Syntactic core

Model data types as application tree:

data AST dom a where
Sym  :: Signature a => dom a -> AST dom a
(:$) :: Typeable a => AST dom (a :-> b) -> AST dom (Full a) -> AST dom b infixl 1 :$

type ASTF dom a = AST dom (Full a)
• Typed abstract syntax
• Parameterized on symbol domain dom

# AST example

Reference DSL:

data DocRef a where
StrRef    :: String -> DocRef String
AppendRef :: DocRef String -> DocRef String -> DocRef String

Representation using AST:

data DocSym a where
Str    :: String -> DocSym (Full String)
Append :: DocSym (String :-> String :-> Full String)

type Doc a = ASTF DocSym a

str :: String -> Doc String
str s = Sym (Str s)

append :: Doc String -> Doc String -> Doc String
append s1 s2 = Sym Append :$s1 :$ s2
• Note: Doc is non-recursive!

# AST example

Isomorphisms:

• Doc $\simeq$ DocRef
• str $\simeq$ StrRef
• append $\simeq$ AppendRef

# Examples redone

Define symbol domains for all language constructs:

data Lit2 a where    -- Literals
Lit2 :: Show a => a -> Lit2 (Full a)

data Num2 a where    -- Numeric operations
Add2 :: Num a => Num2 (a :-> a :-> Full a)
Mul2 :: Num a => Num2 (a :-> a :-> Full a)

data Equ2 a where    -- Equality
Equ2 :: Eq a => Equ2 (a :-> a :-> Full Bool)

data Cond2 a where   -- Condition (and mux)
Cond2 :: Cond2 (Bool :-> a :-> a :-> Full a)

data Logic2 a where  -- Logic operations
Inv2 :: Logic2 (Bool :-> Full Bool)
And2 :: Logic2 (Bool :-> Bool :-> Full Bool)
Or2  :: Logic2 (Bool :-> Bool :-> Full Bool)

# How to combine symbol domains?

Use higher-kinded version of the Either type:

data (dom1 :+: dom2) :: * -> * where
InjL :: dom1 a -> (dom1 :+: dom2) a
InjR :: dom2 a -> (dom1 :+: dom2) a

infixr :+:

Now we get:

type Arith2 a = ASTF (Lit2 :+: Num2 :+: Equ2 :+: Cond2) a

type Signal2 a = ASTF (Lit2 :+: Logic2 :+: Cond2) a

Isormorphisms:

• Arith2 $\simeq$ Arith
• Signal2 $\simeq$ Signal

# Construction

Constructing the expression $1+2$:

ex1 :: Arith2 Int
ex1 = Sym (InjR (InjL Add2))
:$Sym (InjL (Lit2 1)) :$ Sym (InjL (Lit2 2))
• Injections grow linearly with the number of composed domains

Type class magic to the resque:

lit :: (Show a, Lit2 :<: dom) => a -> ASTF dom a
lit a = appSym (Lit2 a)

(<+>) :: (Num a, Typeable a, Num2 :<: dom) =>
ASTF dom a -> ASTF dom a -> ASTF dom a
(<+>) = appSym Add2

Isomorphisms:   lit $\simeq$ Lit,      <+> $\simeq$ Add

# Domain subsumption

Informally, a constraint

Sym :<: dom

is valid iff.

dom = ... :+: Sym :+: ...

Possible types of lit:

lit :: (Show a, Lit2 :<: dom) => a -> ASTF dom a

lit :: Int -> Arith2 Int

lit :: Bool -> Signal2 Bool

See Wouter Swierstra, Data types à la carte, JFP 2008.

# More constructs

(<*>) :: (Num a, Typeable a, Num2 :<: dom) =>
ASTF dom a -> ASTF dom a -> ASTF dom a

(<=>) :: (Eq a, Typeable a, Equ2 :<: dom) =>
ASTF dom a -> ASTF dom a -> ASTF dom Bool

cond  :: (Typeable a, Cond2 :<: dom) =>
ASTF dom Bool -> ASTF dom a -> ASTF dom a -> ASTF dom a

(<*>) = appSym Mul2
(<=>) = appSym Equ2
cond  = appSym Cond2

# Evaluation

The denotation of a symbol:

type family Denotation a

type instance Denotation (Full a)  = a
type instance Denotation (a :-> b) = a -> Denotation b
*Main> :kind! Denotation (Int :-> Bool :-> Full Char)
Denotation (Int :-> Bool :-> Full Char) :: *
= Int -> Bool -> Char

# Evaluation

Generic code:

class Eval expr where
eval :: expr a -> Denotation a

instance Eval dom => Eval (AST dom) where
eval (Sym s)  = eval s
eval (s :$a) = eval s$ eval a

instance (Eval sub1, Eval sub2) => Eval (sub1 :+: sub2) where
eval (InjL s) = eval s
eval (InjR s) = eval s

# Evaluation

Domain-specific code:

instance Eval Lit2 where
eval (Lit2 a) = a

instance Eval Num2 where
eval Mul2 = (*)

instance Eval Equ2 where
eval Equ2 = (==)

instance Eval Cond2 where
eval Cond2 = \c t f -> if c then t else f

instance Eval Logic2 where
eval Inv2 = not
eval And2 = (&&)
eval Or2  = (||)
• Independent instances
• Extensible

# Evaluation

ex2 = cond (lit 10 <=> a) a a
where
a = lit 5 <+> lit (6 :: Int)
*Main> :t ex2
ex2
:: (Cond2 :<: dom, Equ2 :<: dom, Lit2 :<: dom, Num2 :<: dom) =>
ASTF dom Int
*Main> eval (ex2 :: Arith2 Int)
11

# Equality

Generic code:

class Equality expr where
equal :: expr a -> expr b -> Bool

instance Equality dom => Equality (AST dom) where
equal (Sym s1)   (Sym s2)   = equal s1 s2
equal (s1 :$a1) (s2 :$ a2) = equal s1 s2 && equal a1 a2
equal _ _                   = False

instance (Equality sub1, Equality sub2) =>
Equality (sub1 :+: sub2) where
equal (InjL s1) (InjL s2) = equal s1 s2
equal (InjR s1) (InjR s2) = equal s1 s2
equal _ _                 = False

# Equality

Domain-specific code:

instance Equality Lit2 where
equal (Lit2 a) (Lit2 b) = show a == show b

instance Equality Num2 where
equal Mul2 Mul2 = True
equal _ _       = False

instance Equality Equ2 where
equal Equ2 Equ2 = True

instance Equality Cond2 where
equal Cond2 Cond2 = True

instance Equality Logic2 where
equal Inv2 Inv2 = True
equal And2 And2 = True
equal Or2  Or2  = True
equal _ _       = False

# Transformation

The whole generic condition optimizer:

optimize :: (Cond2 :<: dom, Equality dom) => AST dom a -> AST dom a
optimize (cond :$c :$ t :$f) | Just Cond2 <- prj cond , equal t f = optimize t optimize (s :$ a)          = optimize s :$optimize a optimize sym = sym (prj from Data type à la carte) Possible types: optimize :: Arith2 a -> Arith2 a optimize :: Signal2 a -> Signal2 a # Summary What have we achieved so far? • Removed all repetition in the original DSLs • Reusing syntactic constructs (Lit2/lit and Cond2/cond) • Generic interpretation (eval) • Generic optimization (optimize) • No boilerplate code (more or less) Some syntactic noise has been introduced (e.g. :$ and prj), but the overhead is quite small.

The syntactic core library described in my ICFP submission:

A Generic Abstract Syntax Tree Model for Embedded Languages

# Syntactic sugar

Application of ASTs not enough to make a useful EDSL.

Need to make use of existing Haskell features.

Fibonacci in Feldspar:

fib :: Data Index -> Data Index
fib n = fst $forLoop n (1,1)$ \i (a,b) -> (b,a+b)
• Uses tuples and variable binding

# Syntactic sugar

Conversion between “sugar” types and ASTs:

class Typeable (Internal a) => Syntactic a dom | a -> dom where
type Internal a
desugar :: a -> ASTF dom (Internal a)
sugar   :: ASTF dom (Internal a) -> a
• (desugar . sugar) must preserve semantics (eval), but not necessarily syntax.

# Syntactic sugar: pairs

data Pair a where
Pair :: Pair (a :-> b :-> Full (a,b))
Fst  :: Pair ((a,b) :-> Full a)
Snd  :: Pair ((a,b) :-> Full b)

instance (Syntactic a dom, Syntactic b dom, Pair :<: dom) =>
Syntactic (a,b) dom where

type Internal (a,b) = (Internal a, Internal b)

desugar (a,b) = appSym Pair (desugar a) (desugar b)

sugar ab = (sugar (appSym Fst ab), sugar (appSym Snd ab))

# Syntactic sugar: variable binding

All variable binding can be handled by a single construct:

lambda :: ... =>
(ASTF dom a -> ASTF dom b) -> ASTF dom (a -> b)

How to implement?

Two options:

1. Supply a fresh variable to the embedded function. Can be done, but not very efficiently.
2. Higher-Order Abstract Syntax (HOAS)

# Higher-order abstract syntax

Unfortunately, can’t be implemented as a non-recursive AST symbol.

Sufficiently good hack:

type VarId = Integer

data Variable a where
Variable :: Typeable a => VarId -> Variable (Full a)

data HOLambda dom a where
HOLambda :: (Typeable a, Typeable b)
=> (ASTF (HODomain dom) a -> ASTF (HODomain dom) b)
-> HOLambda dom (Full (a -> b))

type HODomain dom = HOLambda dom :+: Variable :+: dom

lambda :: (Typeable a, Typeable b)
=> (ASTF (HODomain dom) a -> ASTF (HODomain dom) b)
-> ASTF (HODomain dom) (a -> b)
lambda = appSym . HOLambda

# HOAST reification

First-order lambda:

data Lambda a where
Lambda :: Typeable a => VarId -> Lambda (b :-> Full (a -> b))

HOAST to “FOAST”:

reify :: Typeable a
=> AST (HODomain dom) a
-> AST (Lambda :+: Variable :+: dom) a
• The ugly recursive HODomain is only used when constructing terms
• All subsequent processing is done after reify
• Type of reify guarantees no HOLambdas in result

The Syntactic library provides interpretations and transformations that correctly deal with variables, e.g:

alphaEq :: AlphaEq dom dom dom [(VarId,VarId)] =>
ASTF dom a -> ASTF dom b -> Bool

instance (AlphaEq dom dom dom env, VarEqEnv env) =>
AlphaEq Lambda Lambda dom env

instance (AlphaEq dom dom dom env, VarEqEnv env) =>
AlphaEq Variable Variable dom env

...
• Also evaluation, code-motion, etc.

By using lambda for all binding constructs the DSL implementor gets important parts of the implementation for free!

• See e.g. Feldspar’s forLoop (soon)

# HOAST as syntactic sugar

instance (Syntactic a (HODomain dom), Syntactic b (HODomain dom)) =>
Syntactic (a -> b) (HODomain dom) where

type Internal (a -> b) = Internal a -> Internal b

desugar f = lambda (desugar . f . sugar)

sugar = error "I don't make sense :("

# Feldspar example

Discrete cosine transform, naive $O\left({n}^{2}\right)$ implementation:

dct2 :: Vector (Data Float) -> Vector (Data Float)
dct2 vec = mat *** vec
where
n   = length vec
mat = indexedMat n n $\k l -> let n' = i2f n k' = i2f k l' = i2f l in cos ((k'*(2*l'+1)*pi)/(2*n')) • Vector is another example of syntactic sugar • Vector enables powerful optimizations on-the-fly (before generating an AST) # DCT, generated C code void dct2(struct array * v0, struct array * out) { uint32_t v4; float v5; v4 = getLength(v0); v5 = ((float)(v4) * 2.0f); initArray(out, sizeof(float), v4); for(uint32_t v1 = 0; v1 < v4; v1 += 1) { float v6; float v3; v6 = (float)(v1); at(float,out,v1) = 0.0f; for(uint32_t v2 = 0; v2 < v4; v2 += 1) { v3 = (at(float,out,v1) + (cosf((((v6 * (((float)(v2) * 2.0f) + 1.0f)) * 3.1415927410125732f) / v5)) * at(float,v0,v2))); at(float,out,v1) = v3; } } } • Note: The matrix has been fused away! # Feldspar implementation Want to hide AST and Syntactic from the user: newtype Data a = Data { unData :: ASTF FeldDomainAll a } instance Type a => Syntactic (Data a) FeldDomainAll where type Internal (Data a) = a desugar = unData sugar = Data class (Syntactic a FeldDomainAll, Type (Internal a)) => Syntax a instance Type a => Syntax (Data a) For loop: data Loop a where ForLoop :: Type a => Loop (Length :-> a :-> (Index -> a -> a) :-> Full a) type FeldDomainAll = HODomain (... :+: Loop :+: ...) # Feldspar implementation Semantics of ForLoop: instance Semantic Loop where semantics ForLoop = Sem "forLoop" forLoop where forLoop 0 init _ = init forLoop l init body = foldl (flip body) init [0..l-1] instance ExprEq Loop where exprEq = exprEqSem instance Render Loop where renderPart = renderPartSem instance ToTree Loop instance Eval Loop where evaluate = evaluateSem instance EvalBind Loop where evalBindSym = evalBindSymDefault instance AlphaEq dom dom dom env => AlphaEq Loop Loop dom env where alphaEqSym = alphaEqSymDefault • Methods inherited from the Semantic instance • No need to worry about variable binding • Constant folding and code motion also obtained for free # User interface based on sugar forLoop :: Syntax st => Data Length -> st -> (Data Index -> st -> st) -> st forLoop len init step = sugar$ appSym ForLoop
(desugar len)
(desugar init)
(desugar step)  -- Introduces two nested lambdas

# User interface based on sugar

The function sugarSym automates the process:

forLoop2 :: Syntax st =>
Data Length -> st -> (Data Index -> st -> st) -> st
forLoop2 = sugarSym ForLoop
• Type directed implementation
• Didn’t even need to think about HOAST

Another possible type:

genericForLoop ::
( Syntax len, Syntax st, Syntax ix
, Internal len ~ Index, Internal ix ~ Index) =>
len -> st -> (ix -> st -> st) -> st
genericForLoop = sugarSym ForLoop
• Probably too general…

# Feldspar implementation, summary

Feldspar contains lots of constructs, all implemented more or less in the same way as ForLoop.

Most of the implementation is straight-forward (ignoring the C back-end)!

Syntactic has made it very easy to experiment with new language constructs.

Downsides:

• Optimization of the Feldspar AST quite complicated
• Largely because of full extensibility

# Syntactic problems and questions

Mostly suited for expression languages

Mutually recursive data types somewhat unnatural to represent

Needs support for declarative rewrite rules

Complicated type hackery sometimes needed

Language support for composable data types?

Relation to Scala, Modelyze?

# Food for thought

Embedding makes life easier for the implementor

…at the cost of the end-user

• Irrelevant type errors instead of informative syntax errors
• Syntactic sometimes makes type error worse

How to improve errors?

How to improve syntax?

fib :: Data Index -> Data Index
fib n = fst for  _ < n
init (a,b) = (0,1)
step (b,a+b)

fib :: Data Index -> Data Index
fib n = fst $forLoop n (1,1)$ \i (a,b) -> (b,a+b)

# Food for thought: improved surface syntax

Syntax must be easily extensible (even by users)

Should be able to access derived combinators like

unroll :: Index -> st -> (Data Index -> st -> st) -> st
unroll n init step =
Prelude.foldl (\st i -> step (value i) st) init [0 .. n-1]

(Feldspar uses derived combinators extensively.)

Custom syntax (for example):

ex = unroll i < n
init   s = 0
step   s+i

ex = unroll n 0 (+)