Developing a Simple (E)DSL with Free Monads (Part 1)

This is the first part of a series of posts where I’ll be telling you how to develop an Embedded Domain Specific Language (EDSL) whose primitives are certainly similar to the ones you could find in a Map API (put, get, update…). We’ll be using Free Monads, which are really well explained in this article by Gabriel Gonzalez. We start today showing Haskell code (where I’m a complete novice), but perhaps we’ll show the Scala approach in the next posts, since we’re currently comparing both in order to determine which one is better to fulfill certain academic purposes. The language is part of overreact, a project which is hosted at github.

Introduction

We aim to develop a simple language which is able to store, update, consult and delete values. And we want to deploy external pointers to the stored values, which we’ve called entities. On the other hand, let’s suppose that we don’t even know the final context where the program will be run, I mean, we want the language to be as general as possible. Thereby, we don’t know if the values will be stored in a map, a database or even the web. To do that possible, we’ll work with Free Monads, that let us separate structure from interpretation. Indeed, as you’ll see by the end of this post, we’re going to be able to develop an application with our language, specifically a ZipWithIndex, even without an available interpreter!

Firstly, I’ll tell you the very basics of the language we aim to make. Then, I’ll show the application that we want to implement with this new language. Finally, we’ll glimpse behind the scenes to find out the inner details.

Language Basics

As we mentioned before, the language we want to deploy will empower us to create, update and query any value. All the values, which doesn’t necessarily belong to the same type, conform the state of the application. This state is supposed to be evolving until the objective of the application is eventually satisfied. We can see that evolution as a sequence of transformations on the values that the state keeps internally. Since we’re dealing with immutability we need something to keep track of the evolving values. Thereby, we’ll add entities, which represent the evolution of a value through the time. We can see them as external references to the stored values. So, there’s something that we’re lacking: how we transform the state. We’ve added some primitive actions for that purpose:

    Put stores a new value in the state. It creates and returns the corresponding entity.
    Update uses an entity and a new value to replace the current one.
    Get uses an entity to extract its value and returns it. Notice that there’s no transformation going on, but we can see this as a transformation to the very same state.
    Delete takes the entity to remove the stored value from the state.

This actions can be combined to create new ones. We use the term derived action to refer to them. For instance, we could create modify, which is an action that firstly gets the current value and secondly applies a transformation over it whose result is used to update the old one.

So far so good, but what kind of applications can we implement with all this concepts? Today, we’ll start with ZipWithIndex, well known stuff in functional programming.

ZipWithIndex (ZWI)

This is a classical problem: given a list of things, a list of tuples index-thing should be generated. For instance, if the input for ZWI was [“Italy”, “Portugal”, “Spain”, “Greece”] the result would be [(1, “Italy”), (2, “Portugal”), (3, “Spain”), (4, “Greece”)]. The algorithm we’ll employ to implement that logic is as follows:

If an empty list is received, the empty list is returned. Otherwise, a singleton list is created, hosting a tuple with the index 1 and the first element. This list is used as the accumulator where the final result is kept. Once we have the first element as a reference to know what the next index should be, we have to iterate over the rest of the input list. For each element, we prepend a new tuple with the corresponding index and the element itself. Finally, we reverse the accumulator to get the final result.

The previous lines lead us to the following implementation in our brand new language:

zipWithIndex [] = return []
zipWithIndex (x:xs) = do
  acc <- put [(1, x)]
  forM_ xs (\e -> do
    modify (\l -> (1 + fst (head l), e) : l) acc)
  retrieve reverse acc

Line 1 shows the empty list case. On the other hand, Line 2 shows the beginning of the non-empty list case. Firstly, in Line 3, we register the new value containing the tuple with the first index and the first element. Obviously, we use put for that purpose. Then, we need to iterate over the rest of the elements. We do so by using the forM_ loop utility from Control.Monad in Line 4, that let us iterate over a list as long as we return a suitable monad for each element. Line 5 is where the meat and the potatoes are. modify takes the prepending function and the entity to modify, in this case the accumulator, so we’ll end up with the final result. Since appending is not optimal, we had to use prepending instead, so the last action to achieve consist of getting a slightly modified version of the value: the reversed list. And this is where I introduce you a new derived action, retrieve, that gets the current value and applies a transformation to it, while keeping the stored value untouched.

Now, we understand the language basics and what kind of applications can be implemented with it. It seems like a good moment to show the language inner details.

Behind the Scenes

The core is the module where the language primitives and combinators are defined. We can find there our four actions:

data Action e next = forall v. (Typeable v) => Put v ((e v) -> next)
  | forall v. (Typeable v) => Update (e v) v next
  | forall v. (Typeable v) => Get (e v) (v -> next)
  | forall v. (Typeable v) => Delete (e v) next

In the above lines we see that a new type Action is created, which is parameterized with the entity type constructor -the type of entities we’ll be using- and next -which is the magic field that Free Monads use to chain the primitives. Then, we can see the primitives we mentioned earlier. They all use an existential type for the entities, since we want to store any kind of entity -forget about Typeable, it just provides information to work with heterogeneous collections, we’ll come back to it in the next post. So far so good, but what are the primitives telling us? Firstly Put receives the value and returns the corresponding entity. Slow down! What is returning what…? This is another trick that Free Monads deploy. If we want to “return” something, we have to use a function that receives the type we want to retrieve and returns the next primitive. In this context, “return” means that the value will be attached to the lhs names in the do notation. Secondly, Update receives the entity and the new value to set. Because Update returns void, we can use the raw next as the final parameter, instead of a function, as Put did. Thirdly, Get takes a reference and returns the stored entity. Finally, Delete takes the reference to the entity to be removed.

Free Monads require the language, that’s to say Action, to be a valid functor. We can do that easily with the next lines:

instance (Functor (Action e)) where
  fmap f (Put v g)          = Put v (f . g)
  fmap f (Update ev v next) = Update ev v (f next)
  fmap f (Get ev g)         = Get ev (f . g)
  fmap f (Delete ev next)   = Delete ev (f next)

The pattern is very clear: if our primitive returns something, we have to combine the function to map with the function that returns the something. Otherwise, we just invoke the function over the “next” field. This could result unclear if you’re not familiar with Free Monads, so I recommend you to take a look to the article that was mentioned in the first paragraph if you feel curious about the inner details.

Next step consists of adding an alias for the Free type -which is parameterized with the functor and the result type. By adding it, we aim to reduce the Free Monad noise that otherwise will be spread along the source files. Since we are going to develop programs, we can use exactly that name.

type Program e = Free (Action e)

Remember, ‘Free’ takes two type parameters. Thereby, our alias is a type constructor that needs the program result type to be concreted. Now we need a way to lift our primitives to a monadic version, by means of liftF (from Control.Monad.Free), whose signature is as follows:

liftF :: (Functor f) => f r -> Free f r

It knows how to lift functors into free monads. For our context, that’s the same than turning actions into programs:

put :: (Typeable v) => v -> Program e (e v)
put v = liftF (Put v id)
update :: (Typeable v) => (e v) -> v -> Program e ()
update ev v = liftF (Update ev v ())
get :: (Typeable v) => (e v) -> Program e v
get ev = liftF (Get ev id)
delete :: (Typeable v) => (e v) -> Program e ()
delete ev = liftF (Delete ev ())

As you probably guess, this functions are the ones that we employ inside the do notation of our ZWI.

Finally, once we’ve deployed all our primitives, we can start to declare higher order abstractions such as modify, which is just a combination of get and update, and retrieve, which achieves a get and applies a transformation to the resulting value before returning it:

modify :: (Typeable v) => (v -> v) -> (e v) -> Program e v
modify f ev = do
  v <- get ev
  let newv = f v
  update ev newv
  return newv

retrieve :: (Typeable v) => (v -> a) -> (e v) -> Program e a
retrieve f ev = do
  v <- get ev
  return (f v)

We don’t need to know many things but the existence of primitives to be able to implement a derived action, because we can use the do notation (while being unaware of the concealed Free structures) to create this kind of subprograms, that can be embedded in another program. Hey, this is just one of the monadic laws!

Conclussions

I’m going to split my conclussions in two parts: Haskell and Free Monads, mainly because I am a newbie in both fields.

Firstly, I start talking about my thoughts on Haskell -it may worth to take into account that my nearest background is Scala. This language is purely functional and you rapidly notice that is ready to deal with Functors, Monads, Typeclasses… without extra effort, and that implies clean code. The most difficult thing for me was to forget the OO class idea and start to think in type and data constructors. Indeed, I was quite blocked until I found the Action implementation which uses existential types. By the way, any suggestion to avoid this would be really appreciated. Eventually, I think that the “curryfication” of types is incredibly nice, if we compare it to its Scala counterpart, for example.

Finally, Free Monads are a very powerful tool. From my point of view, the major part of that power is reflected in the separation of concerns. We have implemented the structure of an algorithm which doesn’t have an interpretation associated! We’ll face the interpretation of the programs in the next posts. If you feel curious right now, you can find all the source files in my github account where there is an interpreter that employs the State Monad to keep the underlying state. On the other hand, derived actions turn to be great to shorten the algorithm number of lines.

Macro Annotated Tuple

Scala is a very expressive programming language. Most of time few lines are good enough to suit the programmer needs. However, it is not always possible to escape situations where boilerplate is unavoidable. For instance, I firmly believe that there must be a prettier implementation for Scala tuples (Tuple2, Tuple3, …, TupleN). After playing with the new metaprogramming features for a while, especially Macro Annotations from Macro Paradise, I found that these tools could be useful to reduce this kind of boilerplate. From now on, I will be telling you my modest attempt to deploy the minimal implementation for Scala tuples.

What’s the Problem?

Well, the problem is quite straightforward. This is annoying boilerplate:

class MyTuple1[A](val _1: A)

class MyTuple2[A, B](val _1: A, val _2: B)

...

class MyTupleN[A, B, ..., N](
  val _1: A,
  val _2: B,
  ...,
  val _n: N)

The involved pattern is very clear (MyTupleX requires X type parameters and X accessors) and that leads to a sequence of repetitive stuff. There’s no place for such an ugly code in a language which aims expressiveness.

Macro Annotations to the Rescue

Macro annotations from macro paradise are empowered to generate new types, in contrast with the classical def macros. This opens a huge range of new possibilities such as transforming the annotated type itself or even providing a brand new companion object for it, whose contents could depend on the original type. How can we use them in the tuple scenario? My approach consists of using a class as a kind of type pattern, which will conform the macro input. Then the macro will generate all the required Tuple types. This is easier to understand by showing the pattern:

@expand(30) class MyTupleα[Tα](val _α: Tα)

As you could imagine, expand is the name for the macro annotation. It takes a number as input, which corresponds with the number of types that we want to generate for the pattern (in this case MyTuple1, MyTuple2, …, MyTuple30). Then we can see the pattern, which uses an alpha to represent the variable positions. This character will be replaced in the sequence of types that will be expanded at compilation time.

I’ve just uploaded the sources to github. If you are used to def macros you will notice that the way to implement macro annotations is not very different, once you’ve declared the header:

class expand(val bound: Int) extends StaticAnnotation {
  def macroTransform(annottees: Any*) = macro Macros.expandImpl
}

In fact, this is just an annotation definition which contains a def macro inside. That definition needs to take a sequence of annottees. For our case, you can think of it as a sequence containing a single element: the type pattern.

The macro implementation is partially shown in the following snippet:

def expandImpl(c: Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {
  import c.mirror._
  import c.universe._

  val q"new expand($b)" = c.prefix.tree
  val Literal(Constant(bound: Int)) = b

  // ...

  val classes = for {
    i   } yield q"""
    class ${newTypeName(name.decoded.replaceAll("α", i.toString))}
      [..${newtparams(i)}](..${newparams(i)}
    )"""

  c.Expr[Any](Block(classes, Literal(Constant(()))))
}

The most interesting thing to remark is undoubtly the use of quasiquotes (q”…”), both to parse the upper bound and generate the output classes. You may not be able to appreciate how amazing is this new feature unless you had to deal with the old manually generation of Scala ASTs. Really. Once we’ve made the new types, we just return them inside a block.

Conclusions

Macros are icebergs that hide the expansion encodings real size under the surface of the sea. In this case, we used them to generate a sequence of types whose size can be chosen by the macro user. By adopting this approach, the maintainability is clearly better.

Honestly, the current implementation to generate the tuples is very ad-hoc, but I think that the pattern idea could be generalised to deal with more complex structures than the one shown in this post. By doing so, the number of lines required to implement this kind of boilerplate, that we can find in Functions as well, would be reduced to a minimum.

Speech… What?

There’s no other way to start this blog than talking about Speech. Well, you may be wondering what it is. Indeed it’s a difficult question to be answered in a few lines, but it’s the project where I have been working on the last three years. As I have said, providing the best description for its understanding is not always straightforward, but here’s my modest attempt: “Speech is a programming language that empowers programmers to develop social web applications in a fast and easy manner.” But hey, hold on! I know what you’re thinking. This isn’t about social networks. It’s about developing applications for the society. Why another language? This question is not as difficult as the first one. We are confident that new abstractions are required to increase the productivity remarkably. In fact, you can find some applications on github to see it (I recommend you the Twitter one).  That’s it for today! If you want to read more about the topic, please refer to the Speech official blog. From now on, I’ll be telling what does this project mean to me. And that lead us to Habla Computing.

Habla Computing is the startup where I work. That’s the place where I learnt that Java was not modern at all (what a surprise!), that you can program with immutable variables, that macros are not always as dangerous as you saw in the university and that there’s a community with amazing hackers ready to answer your weirdest questions. As you could guess, I’m talking about Scala, but it’s not only about this great programming language. This is just an entry point for one to find out that there’re other choices. Better choices. At every level. However, don’t misunderstand me. It wasn’t a smooth process. Getting Speech embedded in Scala was really hard for every member of the team. Anyway, we made it and it has been a worthwhile.

Perhaps you didn’t notice it, but this was just an introduction post. An unusual one, I guess. But it tells my nearest roadmap, the one that led me to restart my PhD studies. Didn’t you know that this is about a doctoral thesis? Then, read the about notes. See you soon.