Building (E)DSLs Like Lego

Emil Axelsson

Chalmers RAW FP Workshop, May 2012

Background

Different DSLs often have a lot in common

Even within the same DSL there are opportunities for reuse

Background

Using Haskell as host for embedded DSLs is supposed to make implementation much easier, but…

These issues are

  • nontrivial
  • reimplemented over and over again

Example DSLs

Arithmetic expressions with equality and conditions:

data Arith a where
  Lit  :: Show a => a -> Arith a
  Add  :: Num a  => Arith a -> Arith a -> Arith a
  Mul  :: Num a  => Arith a -> Arith a -> Arith a
  Equ  :: Eq a   => Arith a -> Arith a -> Arith Bool
  Cond :: Arith Bool -> Arith a -> Arith a -> Arith a

Hardware description:

data Signal a where
  Const :: Show a => a -> Signal a
  Inv   :: Signal Bool -> Signal Bool
  And   :: Signal Bool -> Signal Bool -> Signal Bool
  Or    :: Signal Bool -> Signal Bool -> Signal Bool
  Mux   :: Signal Bool -> Signal a -> Signal a -> Signal a

Similar?

Example DSLs: Interpretation

evalArith :: Arith a -> a
evalArith (Lit a)      = a
evalArith (Add a b)    = evalArith a + evalArith b
evalArith (Mul a b)    = evalArith a * evalArith b
evalArith (Equ a b)    = evalArith a == evalArith b
evalArith (Cond c t f) = if evalArith c
                           then evalArith t
                           else evalArith f
evalSignal :: Signal a -> a
evalSignal (Const a)   = a
evalSignal (Inv a)     = not (evalSignal a)
evalSignal (And a b)   = evalSignal a && evalSignal b
evalSignal (Or a b)    = evalSignal a || evalSignal b
evalSignal (Mux c t f) = if evalSignal c
                           then evalSignal t
                           else evalSignal f

Similar?

Example DSLs: Optimization

optArith :: Arith a -> Arith a
optArith (Add a b) = Add (optArith a) (optArith b)
optArith (Mul a b) = Mul (optArith a) (optArith b)
optArith (Equ a b) = Equ (optArith a) (optArith b)
optArith (Cond c t f)
    | eqArith t f = optArith t
    | otherwise   = Cond (optArith c) (optArith t) (optArith f)
optArith a = a
optSignal :: Signal a -> Signal a
optSignal (Inv a)   = Inv (optSignal a)
optSignal (And a b) = And (optSignal a) (optSignal b)
optSignal (Or a b)  = Or (optSignal a) (optSignal b)
optSignal (Mux c t f)
    | eqSignal t f = optSignal t
    | otherwise    = Mux (optSignal c) (optSignal t) (optSignal f)
optSignal a = a

Similar?

Note the repetitive boilerplate code.

Can we have some reuse, please?

The Expression Problem [Wadler 1998] (rephrased):

How to extend a data type with new cases and new interpretation functions without recompiling existing code.

But we want more!

  • Reduce boilerplate code in e.g. optimizations (algorithm reuse)
  • Generic AST traversals

Existing generic programming frameworks (such as SYB) enable generic traversals, but not composable data types.

Syntactic library

http://hackage.haskell.org/package/syntactic

Core library:

Additional “Lego” modules:

  • Generic interpretations (evaluation, rendering, etc.)
  • Generic syntactic constructs (conditions, variable binding, etc.)
  • Generic transformations (constant folding, common sub-expression elimination, etc.)
  • “Syntactic sugar” (host-language interaction)

Syntactic library

Idea:

Rest of the talk

Syntactic core

Model data types as application tree:

data AST dom a where
  Sym  :: Signature a => dom a -> AST dom a
  (:$) :: Typeable a  => AST dom (a :-> b)
                      -> AST dom (Full a)
                      -> AST dom b

infixl 1 :$

type ASTF dom a = AST dom (Full a)

AST example

Reference DSL:

data DocRef a where
  StrRef    :: String -> DocRef String
  AppendRef :: DocRef String -> DocRef String -> DocRef String

Representation using AST:

data DocSym a where
  Str    :: String -> DocSym (Full String)
  Append :: DocSym (String :-> String :-> Full String)

type Doc a = ASTF DocSym a

str :: String -> Doc String
str s = Sym (Str s)

append :: Doc String -> Doc String -> Doc String
append s1 s2 = Sym Append :$ s1 :$ s2
  • Note: Doc is non-recursive!

AST example

Isomorphisms:

Examples redone

Define symbol domains for all language constructs:

data Lit2 a where    -- Literals
  Lit2 :: Show a => a -> Lit2 (Full a)

data Num2 a where    -- Numeric operations
  Add2 :: Num a => Num2 (a :-> a :-> Full a)
  Mul2 :: Num a => Num2 (a :-> a :-> Full a)

data Equ2 a where    -- Equality
  Equ2 :: Eq a => Equ2 (a :-> a :-> Full Bool)

data Cond2 a where   -- Condition (and mux)
  Cond2 :: Cond2 (Bool :-> a :-> a :-> Full a)

data Logic2 a where  -- Logic operations
  Inv2 :: Logic2 (Bool :-> Full Bool)
  And2 :: Logic2 (Bool :-> Bool :-> Full Bool)
  Or2  :: Logic2 (Bool :-> Bool :-> Full Bool)

How to combine symbol domains?

Use higher-kinded version of the Either type:

data (dom1 :+: dom2) :: * -> * where
  InjL :: dom1 a -> (dom1 :+: dom2) a
  InjR :: dom2 a -> (dom1 :+: dom2) a

infixr :+:

Now we get:

type Arith2 a = ASTF (Lit2 :+: Num2 :+: Equ2 :+: Cond2) a

type Signal2 a = ASTF (Lit2 :+: Logic2 :+: Cond2) a

Isormorphisms:

  • Arith2 Arith
  • Signal2 Signal

Construction

Constructing the expression 1+2:

ex1 :: Arith2 Int
ex1 = Sym (InjR (InjL Add2))
   :$ Sym (InjL (Lit2 1))
   :$ Sym (InjL (Lit2 2))

Type class magic to the resque:

lit :: (Show a, Lit2 :<: dom) => a -> ASTF dom a
lit a = appSym (Lit2 a)

(<+>) :: (Num a, Typeable a, Num2 :<: dom) =>
         ASTF dom a -> ASTF dom a -> ASTF dom a
(<+>) = appSym Add2

Isomorphisms:   lit Lit,      <+> Add

Domain subsumption

Informally, a constraint

Sym :<: dom

is valid iff.

dom = ... :+: Sym :+: ...

Possible types of lit:

lit :: (Show a, Lit2 :<: dom) => a -> ASTF dom a

lit :: Int -> Arith2 Int

lit :: Bool -> Signal2 Bool

See Wouter Swierstra, Data types à la carte, JFP 2008.

More constructs

(<*>) :: (Num a, Typeable a, Num2 :<: dom) =>
         ASTF dom a -> ASTF dom a -> ASTF dom a

(<=>) :: (Eq a, Typeable a, Equ2 :<: dom) =>
         ASTF dom a -> ASTF dom a -> ASTF dom Bool

cond  :: (Typeable a, Cond2 :<: dom) =>
         ASTF dom Bool -> ASTF dom a -> ASTF dom a -> ASTF dom a

(<*>) = appSym Mul2
(<=>) = appSym Equ2
cond  = appSym Cond2

Evaluation

The denotation of a symbol:

type family Denotation a

type instance Denotation (Full a)  = a
type instance Denotation (a :-> b) = a -> Denotation b
*Main> :kind! Denotation (Int :-> Bool :-> Full Char)
Denotation (Int :-> Bool :-> Full Char) :: *
= Int -> Bool -> Char

Evaluation

Generic code:

class Eval expr where
  eval :: expr a -> Denotation a

instance Eval dom => Eval (AST dom) where
  eval (Sym s)  = eval s
  eval (s :$ a) = eval s $ eval a

instance (Eval sub1, Eval sub2) => Eval (sub1 :+: sub2) where
  eval (InjL s) = eval s
  eval (InjR s) = eval s

Evaluation

Domain-specific code:

instance Eval Lit2 where
  eval (Lit2 a) = a

instance Eval Num2 where
  eval Add2 = (+)
  eval Mul2 = (*)

instance Eval Equ2 where
  eval Equ2 = (==)

instance Eval Cond2 where
  eval Cond2 = \c t f -> if c then t else f

instance Eval Logic2 where
  eval Inv2 = not
  eval And2 = (&&)
  eval Or2  = (||)

Evaluation

ex2 = cond (lit 10 <=> a) a a
  where
    a = lit 5 <+> lit (6 :: Int)
*Main> :t ex2
ex2
  :: (Cond2 :<: dom, Equ2 :<: dom, Lit2 :<: dom, Num2 :<: dom) =>
     ASTF dom Int
*Main> eval (ex2 :: Arith2 Int)
11

Equality

Generic code:

class Equality expr where
  equal :: expr a -> expr b -> Bool

instance Equality dom => Equality (AST dom) where
  equal (Sym s1)   (Sym s2)   = equal s1 s2
  equal (s1 :$ a1) (s2 :$ a2) = equal s1 s2 && equal a1 a2
  equal _ _                   = False

instance (Equality sub1, Equality sub2) =>
         Equality (sub1 :+: sub2) where
  equal (InjL s1) (InjL s2) = equal s1 s2
  equal (InjR s1) (InjR s2) = equal s1 s2
  equal _ _                 = False

Equality

Domain-specific code:

instance Equality Lit2 where
  equal (Lit2 a) (Lit2 b) = show a == show b

instance Equality Num2 where
  equal Add2 Add2 = True
  equal Mul2 Mul2 = True
  equal _ _       = False

instance Equality Equ2 where
  equal Equ2 Equ2 = True

instance Equality Cond2 where
  equal Cond2 Cond2 = True

instance Equality Logic2 where
  equal Inv2 Inv2 = True
  equal And2 And2 = True
  equal Or2  Or2  = True
  equal _ _       = False

Transformation

The whole generic condition optimizer:

optimize :: (Cond2 :<: dom, Equality dom) => AST dom a -> AST dom a
optimize (cond :$ c :$ t :$ f)
  | Just Cond2 <- prj cond
  , equal t f              = optimize t
optimize (s :$ a)          = optimize s :$ optimize a
optimize sym               = sym

(prj from Data type à la carte)

Possible types:

optimize :: Arith2 a -> Arith2 a

optimize :: Signal2 a -> Signal2 a

Summary

What have we achieved so far?

Some syntactic noise has been introduced (e.g. :$ and prj), but the overhead is quite small.

The syntactic core library described in my ICFP submission:

A Generic Abstract Syntax Tree Model for Embedded Languages

Syntactic sugar

Application of ASTs not enough to make a useful EDSL.

Need to make use of existing Haskell features.

Fibonacci in Feldspar:

fib :: Data Index -> Data Index
fib n = fst $ forLoop n (1,1) $ \i (a,b) -> (b,a+b)

Syntactic sugar

Conversion between “sugar” types and ASTs:

class Typeable (Internal a) => Syntactic a dom | a -> dom where
  type Internal a
  desugar :: a -> ASTF dom (Internal a)
  sugar   :: ASTF dom (Internal a) -> a

Syntactic sugar: pairs

data Pair a where
  Pair :: Pair (a :-> b :-> Full (a,b))
  Fst  :: Pair ((a,b) :-> Full a)
  Snd  :: Pair ((a,b) :-> Full b)

instance (Syntactic a dom, Syntactic b dom, Pair :<: dom) =>
         Syntactic (a,b) dom where

  type Internal (a,b) = (Internal a, Internal b)

  desugar (a,b) = appSym Pair (desugar a) (desugar b)

  sugar ab = (sugar (appSym Fst ab), sugar (appSym Snd ab))

Syntactic sugar: variable binding

All variable binding can be handled by a single construct:

lambda :: ... =>
    (ASTF dom a -> ASTF dom b) -> ASTF dom (a -> b)

How to implement?

Two options:

  1. Supply a fresh variable to the embedded function. Can be done, but not very efficiently.
  2. Higher-Order Abstract Syntax (HOAS)

Higher-order abstract syntax

Unfortunately, can’t be implemented as a non-recursive AST symbol.

Sufficiently good hack:

type VarId = Integer

data Variable a where
  Variable :: Typeable a => VarId -> Variable (Full a)

data HOLambda dom a where
  HOLambda :: (Typeable a, Typeable b)
    => (ASTF (HODomain dom) a -> ASTF (HODomain dom) b)
    -> HOLambda dom (Full (a -> b))

type HODomain dom = HOLambda dom :+: Variable :+: dom

lambda :: (Typeable a, Typeable b)
    => (ASTF (HODomain dom) a -> ASTF (HODomain dom) b)
    -> ASTF (HODomain dom) (a -> b)
lambda = appSym . HOLambda

HOAST reification

First-order lambda:

data Lambda a where
  Lambda :: Typeable a => VarId -> Lambda (b :-> Full (a -> b))

HOAST to “FOAST”:

reify :: Typeable a
    => AST (HODomain dom) a
    -> AST (Lambda :+: Variable :+: dom) a

Binding semantics for free

The Syntactic library provides interpretations and transformations that correctly deal with variables, e.g:

alphaEq :: AlphaEq dom dom dom [(VarId,VarId)] =>
    ASTF dom a -> ASTF dom b -> Bool

instance (AlphaEq dom dom dom env, VarEqEnv env) =>
    AlphaEq Lambda Lambda dom env

instance (AlphaEq dom dom dom env, VarEqEnv env) =>
    AlphaEq Variable Variable dom env

...

By using lambda for all binding constructs the DSL implementor gets important parts of the implementation for free!

HOAST as syntactic sugar

instance (Syntactic a (HODomain dom), Syntactic b (HODomain dom)) =>
    Syntactic (a -> b) (HODomain dom) where

    type Internal (a -> b) = Internal a -> Internal b

    desugar f = lambda (desugar . f . sugar)

    sugar = error "I don't make sense :("

Feldspar

Feldspar example

Discrete cosine transform, naive O(n2) implementation:

dct2 :: Vector (Data Float) -> Vector (Data Float)
dct2 vec = mat *** vec
  where
    n   = length vec
    mat = indexedMat n n $ \k l ->
        let n' = i2f n
            k' = i2f k
            l' = i2f l
        in  cos ((k'*(2*l'+1)*pi)/(2*n'))

DCT, generated C code

void dct2(struct array * v0, struct array * out)
{
    uint32_t v4;
    float v5;

    v4 = getLength(v0);
    v5 = ((float)(v4) * 2.0f);
    initArray(out, sizeof(float), v4);
    for(uint32_t v1 = 0; v1 < v4; v1 += 1)
    {
        float v6;
        float v3;

        v6 = (float)(v1);
        at(float,out,v1) = 0.0f;
        for(uint32_t v2 = 0; v2 < v4; v2 += 1)
        {
            v3 = (at(float,out,v1) + (cosf((((v6 * (((float)(v2)
                 * 2.0f) + 1.0f)) * 3.1415927410125732f) / v5))
                 * at(float,v0,v2)));
            at(float,out,v1) = v3;
        }
    }
}

Feldspar implementation

Want to hide AST and Syntactic from the user:

newtype Data a = Data { unData :: ASTF FeldDomainAll a }

instance Type a => Syntactic (Data a) FeldDomainAll where
    type Internal (Data a) = a
    desugar = unData
    sugar   = Data

class (Syntactic a FeldDomainAll, Type (Internal a)) => Syntax a

instance Type a => Syntax (Data a)

For loop:

data Loop a where
    ForLoop :: Type a =>
      Loop (Length :-> a :-> (Index -> a -> a) :-> Full a)
type FeldDomainAll = HODomain (... :+: Loop :+: ...)

Feldspar implementation

Semantics of ForLoop:

instance Semantic Loop where
  semantics ForLoop = Sem "forLoop" forLoop
    where
      forLoop 0 init _    = init
      forLoop l init body = foldl (flip body) init [0..l-1]
instance ExprEq   Loop where exprEq = exprEqSem
instance Render   Loop where renderPart = renderPartSem
instance ToTree   Loop
instance Eval     Loop where evaluate = evaluateSem
instance EvalBind Loop where evalBindSym = evalBindSymDefault

instance AlphaEq dom dom dom env => AlphaEq Loop Loop dom env where
    alphaEqSym = alphaEqSymDefault

User interface based on sugar

forLoop :: Syntax st =>
    Data Length -> st -> (Data Index -> st -> st) -> st

forLoop len init step =
    sugar $ appSym ForLoop
        (desugar len)
        (desugar init)
        (desugar step)  -- Introduces two nested lambdas

User interface based on sugar

The function sugarSym automates the process:

forLoop2 :: Syntax st =>
    Data Length -> st -> (Data Index -> st -> st) -> st
forLoop2 = sugarSym ForLoop

Another possible type:

genericForLoop ::
    ( Syntax len, Syntax st, Syntax ix
    , Internal len ~ Index, Internal ix ~ Index) =>
      len -> st -> (ix -> st -> st) -> st
genericForLoop = sugarSym ForLoop

Feldspar implementation, summary

Feldspar contains lots of constructs, all implemented more or less in the same way as ForLoop.

Most of the implementation is straight-forward (ignoring the C back-end)!

Syntactic has made it very easy to experiment with new language constructs.

Downsides:

  • Optimization of the Feldspar AST quite complicated
    • Largely because of full extensibility

Syntactic problems and questions

Mostly suited for expression languages

Mutually recursive data types somewhat unnatural to represent

Needs support for declarative rewrite rules

Complicated type hackery sometimes needed

Language support for composable data types?

Relation to Scala, Modelyze?

Food for thought

Embedding makes life easier for the implementor

…at the cost of the end-user

How to improve errors?

How to improve syntax?

fib :: Data Index -> Data Index
fib n = fst for  _ < n
            init (a,b) = (0,1)
            step (b,a+b)

instead of

fib :: Data Index -> Data Index
fib n = fst $ forLoop n (1,1) $ \i (a,b) -> (b,a+b)

Food for thought: improved surface syntax

Syntax must be easily extensible (even by users)

Should be able to access derived combinators like

unroll :: Index -> st -> (Data Index -> st -> st) -> st
unroll n init step =
    Prelude.foldl (\st i -> step (value i) st) init [0 .. n-1]

(Feldspar uses derived combinators extensively.)

Custom syntax (for example):

ex = unroll i < n
     init   s = 0
     step   s+i

instead of

ex = unroll n 0 (+)