Right Brain Meme Pie

Eric B Merritt - CTO of Afiniate, Inc - is a veteran entrepreneur, author and public speaker. Eric is expert in the architecture, development and deployment of large-scale distributed systems on heterogeneous hardware, and the languages and platforms required to support them. His experience spans from IBM mainframe and mid-range systems, to distributed build systems and massive fleet deployment tools for Amazon.com, and to high-frequency trading systems and financial exchange systems for leading private brokerages. Eric is a co-author of the popular book “Erlang and OTP in Action”.

Differences between Joxa and LFE

In the last few days I have gotten the question ‘What are the differences between LFE and Joxa?’ quite a few times. So instead of answering each one individually, I thought I would write up the differences here.

Goals

The primary and most important difference is in the goals of the two languages. I believe that the primary goal Robert had in mind when implementing LFE was to provide a mutable and syntax-extensible version of Erlang. This would allow people to change the language where they needed to. Also I suspect, very strongly, that Robert likes implementing languages and that he had a lot of fun implementing LFE. I certainly did with Joxa. However, I had some other very specific goals when I sat down to create Joxa.

  1. I needed a platform for the development of Domain Specific Languages.
  2. I wanted a more iterative and dynamic development environment. Something on the order of Slime and Swank.
  3. I wanted to leverage all of the rather awesome Lisp tools that are out there.

Each of these things could have been solved in Erlang. For example, I could have implemented each language using leex and yecc. However, my best experience with DSLs has always come from Lisp and building those DSLs via functions and macros in Lisp itself. At the same time, I have been using Erlang for a long time and I was very unwilling to give up the features of the Erlang VM to get those advantages from Lisp. The only solution seemed to be using a Lisp on the Erlang VM.

The obvious first choice was LFE. So I spent several weeks digging into the language and its internals. At the end I decided it did not suit my purposes and the only fallback was to create a language of my own (there was a bit of sanity questioning involved as well).

With that in mind, let’s enumerate some of the major differences.

Lisp 1 vs Lisp 2

Simply put, LFE is a Lisp 2 while Joxa is a Lisp 1. According to Richard P. Gabriel, the Lisp 1 vs Lisp 2 distinction is defined as follows:

Lisp-1 has a single namespace that serves a dual role as the function namespace and value namespace; that is, its function namespace and value namespace are not distinct. In Lisp-1, the functional position of a form and the argument positions of forms are evaluated according to the same rules.

Lisp-2 has distinct function and value namespaces. In Lisp-2, the rules for evaluation in the functional position of a form are distinct from those for evaluation in the argument positions of the form. Common Lisp is a Lisp-2 dialect.

To give a practical example of the above description, let’s say that you have a function called hello-world that returns the atom hello-world. To define and call that function in Joxa you would do:

(defn hello-world ()
    :hello-world)

(hello-world)

To do the same thing in LFE you would do the following:

(defun hello-world ()
   'hello-world)

(: hello-world)

note: In LFE the : serves the same purpose as funcall in Common Lisp.
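Gabriel's definitions can also be modelled outside of Lisp. What follows is a minimal Python sketch of the two lookup rules, not a description of how Joxa or LFE are implemented. The class names `Lisp1Env` and `Lisp2Env` and their methods are made up for illustration.

```python
# Toy model of the Lisp-1 / Lisp-2 definitions quoted above.
# All names here are hypothetical; this is not Joxa or LFE code.

class Lisp1Env:
    """One namespace: functions and values share a single table."""

    def __init__(self):
        self.names = {}

    def define(self, name, value):
        self.names[name] = value

    def call(self, name, *args):
        # The head of a form is looked up like any other name.
        return self.names[name](*args)


class Lisp2Env:
    """Two namespaces: a function table and a separate value table."""

    def __init__(self):
        self.functions = {}
        self.values = {}

    def define_function(self, name, fn):
        self.functions[name] = fn

    def define_value(self, name, value):
        self.values[name] = value

    def call(self, name, *args):
        # The head of a form is looked up in the function table only.
        return self.functions[name](*args)

    def funcall(self, name, *args):
        # Calling a function stored as a value needs an explicit operator.
        return self.values[name](*args)


lisp1 = Lisp1Env()
lisp1.define("hello-world", lambda: "hello-world")
lisp1_result = lisp1.call("hello-world")

lisp2 = Lisp2Env()
lisp2.define_function("hello-world", lambda: "hello-world")
lisp2.define_value("hello-world", 42)  # same name, different namespace
lisp2_result = lisp2.call("hello-world")
lisp2_value = lisp2.values["hello-world"]

# In the single namespace, rebinding the name replaces the function.
lisp1.define("hello-world", 42)
```

In the two-namespace model the function and the value named hello-world coexist. In the single-namespace model the last definition wins, which is the trade-off the two camps argue about.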

The Lisp world has been arguing about which is better since the dawn of the universe. It very much depends on personal preference. For me, I find Lisp 2 to be very unnatural and counterintuitive, nearly to the point where I won’t code in it.

Erlang Syntax vs Lisp Syntax

LFE, as its name implies, is a Lisp version of the Erlang language whose intention is to provide a Lisp very close to Erlang and its semantics.

Joxa has no such intention. It is a unique language that happens to be targeted at the Erlang VM. It makes no effort to provide Erlang or Common Lisp semantics. The entire goal is to provide a tight, small, well understood functional language with a clean approachable syntax that allows for the use of Lisp style macros.

As you will notice in using Joxa, there is very little in the way of Erlang syntax and even less of Common Lisp. Some syntax for declarative data structures has been pulled over but very little more. Most of the syntax comes from Clojure and Scheme.

Evaluated Macros vs In System Macros

In LFE macros are evaluated in LFE itself and not with the Erlang VM. This means that macro evaluation has different semantics than function evaluation. In LFE macros are not simply functions that run at compile time like they are in Lisp. They are special things that have their own evaluation semantics different from those of normal functions. So not only must you keep in mind the normal compile time vs runtime semantics, you must also keep in mind the function vs macro semantics. I believe this is a hindrance to the easy use of macros.

For that reason Joxa takes an incremental approach to module compilation that allows macros to be evaluated on the VM in the exact same way as functions, so there is no need at all to worry about differences in the evaluation environment between normal functions and macros. Since one of the important goals of Joxa is explicitly supporting DSLs, this unified evaluation environment for functions and macros is quite important.
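To make the idea of macros as ordinary functions that run at compile time concrete, here is a toy Python sketch. It is not Joxa's compiler and makes no claim about LFE's expander. Forms are modelled as nested lists, and `MACROS`, `FUNCTIONS`, `expand`, `evaluate` and `square_macro` are all hypothetical names.

```python
import operator

# Ordinary functions that are available at run time.
FUNCTIONS = {"+": operator.add, "*": operator.mul}


def square_macro(x):
    """A macro is just a function from forms to a form: (square x) -> (* x x)."""
    return ["*", x, x]


# Macros are plain functions too; they are simply called earlier.
MACROS = {"square": square_macro}


def expand(form):
    """Compile-time step: rewrite macro calls until none remain."""
    if not isinstance(form, list) or not form:
        return form
    head, *args = form
    if isinstance(head, str) and head in MACROS:
        # The macro runs in the same language and environment as any function.
        return expand(MACROS[head](*args))
    return [head] + [expand(arg) for arg in args]


def evaluate(form):
    """Run-time step: evaluate a fully expanded form."""
    if not isinstance(form, list):
        return form
    head, *args = form
    return FUNCTIONS[head](*[evaluate(arg) for arg in args])


expanded = expand(["+", ["square", 3], 1])  # ["+", ["*", 3, 3], 1]
result = evaluate(expanded)                 # 10
```

Because `square_macro` is an ordinary function, it can be called, tested and debugged like any other function. That is the property a unified evaluation environment is meant to provide.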

Language Bootstrapping

I am the co-founder of a startup called Afiniate and we are basing important aspects of Afiniate’s business on Joxa. It is very important to us that Joxa be both well tested and stable. To that end the language has been bootstrapped on itself. That is, Joxa is actually implemented in Joxa. This allowed us to test the system and work out many problems very early on. This also serves as an important litmus test for new features and changes. This litmus test catches many types of problems before they ever leave the developer’s desktop. I believe that this bootstrapping is a fundamental requirement for any language that will be used in production systems.

Joxa is also extremely well tested. As I said we are basing an important part of our business on this platform and we must ensure, as much as possible, that we can iterate on the platform quickly while retaining its stability.

How to Manage Erlang/OTP Releases

It took me quite a few years to arrive at an optimal mental model for thinking about Erlang, ERTS and releases in the context of an operating system. It is a different enough method of thought that it’s worth talking about.

I think most folks have not yet come to a fundamental realization when it comes to Erlang, or more specifically, ERTS. That realization is that ERTS is a Virtual Machine in the classic sense. That is, it views each release as a complete self contained ‘machine’. That concept is central to the way it expects releases to be organized and managed.

Once you get your mind around this idea that each release is a self contained machine, things begin to make more sense. For example, why are there only version numbers and not names in release directories and tarballs created by systools? This is because an Erlang release is a self contained universe, a complete Virtual Machine, sort of like an install of Linux. So asking why you can’t have multiple releases in the same node is very similar to asking why Linux only has one rc.d directory. It only has one because in the context of a machine only one ‘operating system/bootstrapping system’ makes sense.

So how do you integrate this very different model into the Unix-y way of doing things? The simple answer is that you don’t.

Approaches to Managing a Release Install

You can take two approaches when it comes to installing a release on a Unix box. You can go through the effort of splitting out the release into its constituent applications, installing those applications separately so that nothing is shared. Then you can come up with a scheme for your release metadata so that they do not collide. After that you can shoehorn all this into whatever package management system exists on your preferred platform. The question I have with this approach is simply why? To retain some idea of the purity of the Unix model? So you can save a few tens or hundreds of megabytes of disk space? With this approach you will be constantly fighting ERTS, coming up with ways to get around the Virtual Machine and what it is natively expecting. In the end, you will be coming up with all sorts of ways to create unanticipated pain for yourself with no real gain.

The alternative is that you treat an Erlang/OTP release as the VM expects. You could use the tools distributed with Erlang to create and handle releases. You could treat a release tarball as a single distributable thing. You could even take this to the logical extreme and, if you are targeting homogeneous hardware, include the ERTS binary in that tarball so that everything is completely self contained. The whole reason ERTS is structured in this manner is due to the problem it was intended to solve: providing a way to make a self contained system that could be installed trivially on target machines.

So taking advantage of this model, you end up with a system (in either /opt or /var) that looks something like this.

/opt/erts//-*

That is, you have a separate directory structure to handle Erlang releases somewhere on your system, and each release is in its own specific location.

Let’s look at an example. Let’s say that you have a release foo. On your target systems you expect to have versions 0.1.0, 0.2.0 and 0.2.1. You also have the release bar with versions 0.1.1 and 0.1.2. Then your tree might look like:

/opt/erts
    |-- bar
    |   |-- 0.1.1
    |   |   |-- ....
    |   `-- 0.1.2
    |       |-- ....
    `-- foo
        |-- 0.1.0
        |   |-- ....
        |-- 0.2.0
        |   |-- ....
        `-- 0.2.1
            |-- ....

Where the ellipses identify all the files and directories relating to the release.

Your chef/puppet/management scripts end up being trivial: they simply untar releases into that versioned layout and then run the relevant startup commands for ERTS to start the self contained releases in those directories.
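As a rough illustration of how little such a script has to do, here is a minimal Python sketch of unpacking a release tarball into a per-name, per-version directory like the tree above. The function names, the refusal to overwrite an existing version and the numeric version sort are all assumptions of this sketch. Starting the release is left out because the startup command depends on how the release was built.

```python
import tarfile
from pathlib import Path


def release_dir(root, name, version):
    """Return the directory for one release, e.g. <root>/foo/0.2.1."""
    return Path(root) / name / version


def install_release(tarball, root, name, version):
    """Unpack a release tarball into its own versioned directory."""
    target = release_dir(root, name, version)
    if target.exists():
        # Assumption of this sketch: never unpack over an installed version.
        raise FileExistsError(f"{target} is already installed")
    target.mkdir(parents=True)
    with tarfile.open(tarball) as archive:
        archive.extractall(target)
    return target


def installed_versions(root, name):
    """List the installed versions of a release, oldest first."""
    base = Path(root) / name
    if not base.is_dir():
        return []
    # Assumes purely numeric dotted versions such as 0.2.1.
    return sorted(
        (entry.name for entry in base.iterdir() if entry.is_dir()),
        key=lambda v: tuple(int(part) for part in v.split(".")),
    )
```

With this layout, rolling back amounts to picking an earlier entry from `installed_versions` and starting the release from that directory, since each version directory is complete on its own.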

This becomes an even more important factor when you start looking at hot code loading and live upgrades with Relups. While I don’t recommend this live upgrading facility for anything but the most trivial projects or those projects where extremely high up-time is worth the monumental costs of getting it right, it still is there and relies on this well understood layout to function.

What You Gain

  1. Each release is self contained and has no dependencies aside from OS dependencies
  2. You can trivially roll forward and backward with no worries about version conflicts or mismatches.
  3. It’s what Erlang/OTP expects and it’s what the build systems based on Erlang/OTP expect. You are saving yourself trouble by following the garden path.

What You Lose

  1. Disk space. You have possibly redundant information in each release dir
  2. You can’t use the native package management system
  3. Peace of mind. It’s not the Unix-y approach

Conclusion

Unless you have a very good reason not to, I suggest you embrace the Erlang approach. It’s simple, straightforward, easy to understand and even easier to manage. It has few downsides and major wins for your infrastructure management and deployment.

Team Development with Git

The process of collaboration and development on a project is important. The means by which you get that code into a deliverable state matters and matters a lot. Now the method of delivery and what constitutes a deliverable state changes from company to company, team to team and even project to project, but there is always a point at which you want to ‘make something available’.

The process you use to get to that point should involve a good balance of time to market and quality. While you don’t want to make the code so perfect that you never deliver it or you deliver it so late that it ceases to matter, you also do not want to deliver code that is so poor and has such high maintenance costs that it doesn’t actually solve the problem it’s designed to solve. You must strike a balance between the two, and by using good Git principles along with a little sound basic engineering practice you can get there. With that in mind, I am going to discuss my preferred team development model with Git, along with a few additional changes if you are using GitHub.

The Model

  1. There is a single repository that serves as the Canonical Repository (the main store of code that will be delivered).
  2. Each developer has a clone of this repository that they use for development on their local workstations. If they are not on the same network then they may have a remote clone that serves as a means of collaboration with their peers (e.g. on GitHub).
  3. When code is complete in the Developer’s workstation the Developer announces the fact.
  4. Another team member pulls the code, reviews, compiles and tests it.
  5. If any stage of the review fails, then the reviewer pushes back to the originating developer.
  6. If it passes it is signed off and pushed to the target branch of the Canonical Repository.

The Canonical Repository

The Canonical Repository is the repository we deliver code from. It serves as the central reference point, the tip of development for the project. No one pushes Work In Progress (WIP) code to this repository and no single person owns it. It exists to hold the history of the ‘thing’ that is currently in production or that will be in production, and it is the point that Developers rebase their work-in-progress code on.

There is a bit of a mind hack going on here that I encourage you to keep intact. That is the ‘sacredness’ of canonical. It should never become something that a developer pushes to as part of his daily workflow. It should always carry the sense that things that go into it are important. In my experience this vastly reduces the screw-ups, bugs, bad pushes etc. that can be so painful to a team. It also helps encourage both the developer and the code reviewer to take their job seriously without forcing a lot of process onto them. That lack of strict painful process is what makes this approach so powerful.

The Developers’ Repositories

Developers work in clones of the Canonical Repository. They create branches, work on experimental changes, code to meet the requirements of the system and collaborate with each other through these clones. If they are all on the same network then they set up the Git server on their development boxes and push and pull directly to and from their team mates’ clones. If they are in different parts of the world or simply on different networks they may have to use some intermediary like GitHub to share their code. In that case, they replicate to their GitHub clone and collaboration goes on through GitHub. Of the two approaches I prefer addressing my peers’ repositories directly. It removes an unnecessary step from the process and makes it that much easier and less error prone.

There is a second big mind hack going on here. That is the fact that nothing in the developer’s repo matters until the developer says it matters. We want the developer to be productive, we want him to use the tools that work for him and the process that is most comfortable for him to use to produce code. It doesn’t matter if the rest of the team uses emacs and he uses vim, it doesn’t matter if he wants to use OS X and the rest of the team uses CentOS. What matters is that the code he produces meets the standards the team has set, through previous reviews, automated test suites and the like.

We want him to explore freely. We want him to be the most productive that he can be using the tools that he is comfortable with. We don’t want him to worry about what impact that exploration will have or if someone is going to be looking over his shoulder trying to validate the quality of code that literally won’t matter unless and until it makes it into canonical.

How Code Gets Into Canonical

We don’t impose tools or process on developers in their own repo. They can code using any process they would like, using any editor, compiler, platform etc. It doesn’t matter in the least as long as the output meets the team’s standards. Code that exists in the developer’s repo is kind of like Potential Energy. Potential Energy is impotent and has no interaction with the world around it until it becomes Kinetic Energy. Once that Potential Energy becomes kinetic it has the potential to change the world. Code in a developer’s repository is very similar. It has no impact on the world (i.e. the project/team/organization) until the Developer(s) feel that it is ready, until the developer explicitly converts it to Kinetic Energy. Let’s talk a bit about the process used to convert that Potential Energy to Kinetic Energy.

There are always two parties to the process: the Developer or Developers producing the code and the engineer that’s going to review/validate that code. Let’s get started with the producer side.

The Developer’s Responsibility

The Developer puts the completed code onto a dedicated branch in his repository and refactors it to meet the standards of the project. He should ensure that the following invariants hold.

  • The commits are small, self contained and well named. If you have not already, take a quick look at my previous posting on Git Commit Hygiene.
  • The code follows the coding conventions of the project (these should be lightweight and non-intrusive).
  • The code is good in the eyes of the code reviewer. Functions are not too long, modules are focused, etc. Basically, the code follows normal practice for producing good code.
  • The code compiles on the target platform.
  • All the tests in the system pass. (This should be a project invariant).

It could be that you have a convention that code ready for review goes onto a specifically named branch, perhaps something like ‘reviewable’ or ‘rv’. It could also be that the Developer lets his team mates know what branch the code is on when he makes the announcement. In either case, once code is ready he makes an announcement letting the team know that fact. Usually this is in the form of patches sent to a mailing list, or a GitHub pull request. The mechanism doesn’t actually matter so much as the fact that an intentional announcement with a short description is made to the team working on the project.

Once the Developer announces it, a team mate should be selected to do the review. There are a ton of ways to select a reviewer to handle shepherding the code into canonical. In my experience the best way to do it is to just let the code reviewer self select. As long as it’s not always the same person stepping up, things should be fine. If you have put together a good mature team this approach is, by far, the best. Other ways to do it are to randomly assign a team mate, or to have the Developer producing the code pick the team mate that is going to review. In the end, how the Reviewer is selected matters a lot less than the fact that one is selected.

The final responsibility of the Developer is to make sure the code gets reviewed. The longer code sits out without review the more likely it is that bit rot will occur, that merging into the Canonical branch will become painful etc. So the Developer needs to do what is necessary to make the review happen in a reasonable time frame.

The Reviewer’s Responsibility

The Reviewer’s job is to review the change to validate that it meets the project standard. This comprises a few things.

  • Look at the change itself to make sure it does what it’s purported to do and meets the standard of the organization.
  • Make sure that the Developer’s changes rebase or merge cleanly with the Canonical branch that is being targeted.
  • Make sure that the code compiles cleanly and all related tests pass.

If any of these steps fail, the Reviewer pushes back on the original Developer to make changes and finish the work. This may take several iterations depending on the quality of the Developer writing the code. That is completely fine. This is normal development, we are engineers not code monkeys and we want to get the code as right as we reasonably can. The above steps are the minimal steps that need to be done to ensure this standard. Sub-par code should never make it into Canonical simply because the developer is annoyed with the process or the timelines are tight. You invariably pay more in the long run by giving in to these pressures than you gain in the short run.

Once the change is reviewed and accepted the reviewer signs off on the code. Yes, we expect the Reviewer to put his name on it and take responsibility for the fact that that code is in Canonical. If the code breaks the build it should be embarrassing to the Reviewer.
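The reviewer's side of the process can be sketched as a small gate: every check has to pass before a sign-off is recorded, and any failure goes back to the Developer. This is a hypothetical Python sketch. The `Change` record, its field names and the `review` function are illustrative and not part of Git or of any particular review tool.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A hypothetical record of a change announced for review."""
    branch: str
    meets_standard: bool
    merges_cleanly: bool
    builds_and_tests_pass: bool
    signed_off_by: list = field(default_factory=list)


def review(change, reviewer):
    """Return the failed checks; record a sign-off only when there are none."""
    checks = {
        "meets the project standard": change.meets_standard,
        "rebases or merges cleanly onto the target branch": change.merges_cleanly,
        "compiles cleanly and all related tests pass": change.builds_and_tests_pass,
    }
    failures = [name for name, passed in checks.items() if not passed]
    if not failures:
        # The reviewer puts a name on the change before it goes to Canonical.
        change.signed_off_by.append(reviewer)
    return failures
```

An empty result means the change can be pushed to the target branch of Canonical. A non-empty result is the list of things to push back on, and the same change can be reviewed again after the Developer has reworked it.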

Once The Change is in Canonical

Once the code is in Canonical and out of the Developer’s repository, history revision, commit amends and the like should all stop. At that point your team has people depending on that code and changes to history become painful. If there is a bug or a fix that needs to be made it should go into a new commit and go through the process previously outlined.

Wrapping Up

This process is actually very smooth, but it does assume a few things:

  • You have decided to adopt the process as a team.
  • You have reasonably competent engineers.
  • You have at least some tests for your system (if you do not, run, don’t walk, away from that system).
  • You have some standards for the codebase that should be applied.

If these things are true then you are golden. It may take a little bit of time to get a feel for the process and work around the little hiccups that will inevitably occur. However, with competent engineers the process should be flowing smoothly after a few weeks.

Things to Watch Out For

This is not intended to be a rigid process. The only place that real process actually comes into it is in the transition from the Developer’s repository to the Canonical repository. This is true by design. Its whole purpose is to encourage the wild and wooly exchange and growth of ideas, creativity and productivity in the project wherever that is possible, while providing a bit of rigidity and discipline only when it is required. The goal is to get that balance right in such a way that you get both the most creativity and productivity possible and the most maintainable, well founded code, while at the same time keeping the team happy. This is what this process seeks to encourage.

One of the ways I have seen the process founder and become rigid is with a team that does not know Git well. This process takes quite a bit of comfort with Git to work. Your developers need to be comfortable with local and remote repositories, rebasing, merging, pushing and pulling from peers, signing off on commits etc. If you have new or incompetent developers you may be tempted to wrap all of these steps in scripts that, apparently, take the need for knowledge away from your developers. However, these scripts don’t actually remove the need for knowledge of Git, all they really do is delay that need slightly. What they actually do is encourage the team to avoid learning Git, while at the same time spending an inordinate amount of time fixing problems when the scripts inevitably break. At the same time they lock down and rigidify a process that should, by its nature, not be rigid.

Go ahead, bite the bullet and give the team the time and resources they need to learn Git well. It will take a few weeks and you will take a productivity hit. However, in the long run you will make that back many times over. If you have team members that either do not, can not, or will not pick up Git for whatever reason, well, maybe it’s time to consider those folks as bad hires and let them go. The team will be better off for it in the long run.

The other big thing to watch out for is the urge to bypass the process. Sometimes when you are in a tight situation it can be tempting to want to push code directly to Canonical, bypassing the review. For example, you might have a bug in production. Let’s say your system is down and it’s costing the company a million dollars a minute. Pressure is high and you may have a huge urge to tell the devs (or yourself if you are the dev) that the process would only slow you down and the code needs to get out right now. This is almost invariably a bad decision. This process should only take a few minutes assuming the code is good. The likelihood that the code is bad in that situation is high and it’s much cheaper timewise to catch those problems in the review cycle than to deploy the code and realize in production that you fixed one bug but introduced another.

Commit Hygiene and Git

I have a very strong standard for commits when it comes to Git. In general, commits should contain one unit of change and one unit of change only. When looking at a Git log you should see a very clear, linear history of change: a history where each commit contains a single change, has a good short commit line explaining what the commit contains, and has a complete commit detail describing the whys of the commit.

I get a fair amount of flak on this from many of my peers. They tend to see Git as simply a place to store code or as an audit trail, like most non-distributed version control systems. This opinion just does not serve in most cases. Commit hygiene is an important part of development using Git, in the same way that readability and factoring are important to code. It takes work to do it right and the benefits may be intangible, but in the long run it’s well worth it.

Why is it Important?

It takes time and effort to clean up your patches and get them into a publishable, well factored state. That is effort many people don’t want to spend. If that’s the case, why do it?

Do it so git revert and git bisect will work correctly

Let’s say you roll out a new set of patches and one of those patches contains an error. In the case of well defined, well factored patches where related change is part of the same patch, it is fairly trivial to run git bisect against your repo, find the offending patch, then do git revert on said patch and redeploy your system.

In the alternate case, where change is spread willy-nilly across multiple commits, git bisect will not work correctly. That is, you will not be able to test each patch on its own, nor will you be able to easily identify all the patches related to the problem. You will end up digging through the commits in this specific deployment, looking for each thing that might have caused the problem and reverting that. Of course, because change is spread willy-nilly around, you will end up reverting things you do not wish to revert and that will cause other problems. In reality what you will probably do is spend some number of hours trying to get a fix in place that will allow things to run, push those fixes and hopefully come back at some point in the future to revisit your fixes.
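The reason a clean history matters to git bisect is that bisection is a binary search, and a binary search only works when every commit can be tested on its own. Here is a minimal Python sketch of the search itself, not of Git's implementation. It assumes a linear history ordered oldest to newest, a newest commit that is known to be bad, and a single switch from good to bad. The commit names and the `is_bad` test are made up.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits are ordered oldest to newest, the newest commit is bad,
    and every commit after the first bad one is also bad.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid       # the culprit is this commit or an earlier one
        else:
            low = mid + 1    # the culprit is a later commit
    return commits[low]


# Hypothetical history in which the bug was introduced by commit "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
tested = []


def is_bad(commit):
    """Stand-in for building and testing one commit in isolation."""
    tested.append(commit)
    return history.index(commit) >= history.index("e5")


culprit = first_bad_commit(history, is_bad)
```

Here the culprit is found after testing three of the eight commits. If a single logical change is smeared across several commits, the intermediate commits may not build or may fail for unrelated reasons, the good/bad test stops being meaningful, and the search loses its footing.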

Do it so code reviews are easier

A single patch focused on a single set of change is much easier to review and comment on than a single change spread over a number of patches or a single patch that contains a large number of unrelated changes.

Knowing what reason generated a patch (a ticket, a story, a customer request) and being able to tie the change that accomplishes that reason to the reason itself goes a long way towards making your patch comprehensible to the reviewer.

Do it so that the change going on in the system is obvious

The other reason to factor your changes before they make it into production is simple hygiene: a person looking at change sets in linear temporal order should have a good idea of what is changing in each patch. That is, each patch should present a good self contained step in the march of code over time.

This, unfortunately, is much harder to justify than the first point. Much like refactoring, it’s one of those things that you can point to and say “This is a Good Idea”, while being unable to give hard and fast reasons for that. There is no way to say that a clean commit history is going to save you 20% of your time on maintenance, or that it’s going to help you get to your next milestone 5% more quickly.

In my opinion, it will help you in maintaining your code base, in understanding why a particular change occurred and what the steps leading up to that change were. Also, the discipline of patch hygiene will help you focus on the particular change that needs to occur. I consider all these useful results for very little cost.

What is One Unit of Change?

What exactly is one unit of change? The answer is, of course, it depends. The rule of thumb is that if it’s directly related to the change you are actively working on, that is, it can be tied directly to what you are holding in your head, then it’s probably part of the same change.

However, if you see a bug or refactoring opportunity while doing other things, that’s not part of the change. If you have two things in your head and are kind of working on them at the same time because they are somehow related in your mind those also are not part of the same change.

Much like with refactoring you will, over time, develop a sense for what a good patch should look like and what you should be publishing.

When to Practice Commit Hygiene

There is one right answer to when to practice commit hygiene, and that is always ‘before the code is published’. It’s very painful for your consumers if the git history changes after they have pulled from your repo. To avoid this, you want to make sure that you don’t refactor after you have pushed to the canonical repo.

One thing to be aware of is that the word publish can have several meanings. In the case where you have a canonical repo where people expect to find the latest released code, any time you publish there it locks you out of further refactoring. But let’s take a slightly more nebulous case. Let’s say that you and a peer are working together on a task, generating commits for the purpose of sharing code. Many of these commits are going to be temp commits, or commits that only exist so that the in-process code can be shared. In this case, you also don’t want to refactor while you are in the process of building. You have a consumer after all, your peer. However, once you and your peer have your work done, but before you publish your work to your canonical repo or your peer’s, it would be a very good idea for you and/or your peer to sit down and work on refactoring the commit stack, practicing a bit of commit hygiene before you publish your work. You should consider that the last step in your build process. As a note, once this is done you should do any future work on top of that newly refactored branch.

How to Practice Commit Hygiene

The key is the Interactive Rebase in git. This is the tool you will use to clean up and manipulate your git history. I won’t go into too much detail here, but I will refer you to several other good discussions on the subject.

  1. http://codeutopia.net/blog/2009/12/10/git-interactive-rebase-tips/
  2. http://help.github.com/rebase/
  3. http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html
  4. http://blogs.gnome.org/alexl/2009/10/12/the-gospel-of-git-interactive-rebase/

Conclusion

In the end the thing that must be kept in mind is that git is a developer’s tool more than anything else. Not an audit trail, not a place to stick all the crap that we produce. It is a tool, but you need to follow some guidelines for that tool to be as useful as it can be.

Enhancing Lisp Flavored Erlang

I have a serious personal investment in Erlang as a language and a platform. I have written a book on the subject and spent a fair amount of my professional and open source life working in it. I have done all of that because I believe it’s a system that simply works for much of the type of development that I do. However, I also have a secret long term love affair with Lisp. Unfortunately, I have never really been able to use it in anything but toy projects. Recently I happened upon LFE again. I remember seeing it when it was announced and thinking it wasn’t quite baked, but that seems to have changed quite a bit. It may very well scratch both my Erlang as an awesome platform and Lisp as an awesome language itches.

I decided that my very first project was going to be a Swank (Slime backend) server implementation for LFE. Think about it: you could code in Slime using Lisp on the Erlang platform. That’s pretty exciting to me. Unfortunately, I immediately ran into a problem: macros are not accessible after compilation. That is, they go away when the system is compiled and are no longer accessible. Well, that doesn’t work in a system like Slime, where there isn’t a lot of distinction between compilation time and run time (nor is it very acceptable in Lisp in general, as the authors acknowledge).

Macros are not First Class

Macros are not first class citizens of the LFE world. That is, they exist only at compile time and they are not evaluated by the Erlang virtual machine; they are evaluated by the LFE build system itself. This means that macro evaluation has different semantics than the evaluation of code compiled and evaluated by the virtual machine in the normal manner. This forces the developer to make a distinction in his mind between things that will be evaluated at compile time and things that will be evaluated at run time. This is probably best illustrated by the ‘eval-when-compile’ construct.

A related problem is that, because macros only exist at compile time, if you want to use macros defined in another file you must import the files that contain them. Those macros are then evaluated by the expander in the namespace provided (along with its other functions). The fact that macros are always evaluated in the namespace of the macro caller provides its own set of problems.

All of these problems can be solved by compiling macros and making them first class entities in LFE.

Making Macros First Class

In the end, macros are really just functions. They are functions that are evaluated at compile time, whose inputs are lists that represent the AST of the program, but they are functions just the same. So it makes sense that we could just compile them as functions and mark them somehow so that the system knows that they are macros.

So that is exactly what I am going to do. First and foremost, I am going to convert a specified macro to a function, recursively expanding the macro (of course) until a specific form is completely expanded, and then convert that form into a ‘define-function’ expression. So when code generation occurs, the macro gets compiled into a function. I have to mark this function as a macro, and the easiest way to do that is to keep a list of macros around for that particular namespace. We do this by generating a special function in LFE-compiled modules. I think we can call that function ‘macros’ (it’s simple enough after all); when called, it will return a list of all macro functions in that module. When a macro function is created, we add it to the list of names returned by that special function.

As we generate each ‘function’, be it an actual function or a macro as function, we will go through a complete generation process. That is, we will go from a sexpr form describing a ‘function’ to a compiled Erlang function in the correct namespace. In this manner the full module will be available at compile time for every function compiled before the current function being compiled and/or expanded. That should give us the flexibility to do just about anything we want, at the cost of additional compilation time. A nice fringe benefit is that we should be able to get rid of the eval-when-compile special form.

Implications of Compiling Macros

There are two big implications that compiling macros in this way entails. The first is that, if you have macros that depend on functions then those functions must be defined before the macro that calls them. It will become a very simple top down evaluation process, or at least, it will appear that way from the point of view of the developer.

The second is that macros can not be defined before the module definition, since in both Erlang and LFE the module definition must be the first thing in the file. In current LFE code it seems fairly common for macros to come at the head of the file. Because macros will now be generated and compiled into functions in that module’s namespace, they must be defined after the module definition. Macros can be called before the module definition (as long as they evaluate in such a way that the module definition is the first expanded form in the file), but macros called ahead of the module definition can only come from other modules.

Future Directions

This is the first of several planned changes to LFE. My next project will be to complete a Swank backend for the language. Once that’s done I am actually going to try to switch over to LFE as my primary Erlang language. I suspect this alone will generate some changes as well. One thing I would really love to see is some more Clojure goodness in the language. Not a complete Clojure implementation, but Clojure does have a ton of great syntax ideas that we can borrow. However, those things are for the future. First and foremost, I need to finish macro compilation.

Building an Application with ClojureScript and the Closure Library

Note: This was taken whole cloth from the Google Closure Tutorial and translated to (hopefully) idiomatic ClojureScript. I take credit as the translator. However, I can take no authorship credit. You can find the complete source for this tutorial on github.

This tutorial gives you hands-on experience using ClojureScript and the Google Closure Library by walking you through the construction of a simple application. To do this tutorial you should have some experience with JavaScript and Clojure. You should have already gone through the ClojureScript setup process, as described on the ClojureScript Site or as I described in a previous post. This tutorial explains the different parts of the source files step by step.

The Notes Application

This tutorial illustrates the process of building a simple application for displaying notes. The example:

  • creates a namespace for the application,

  • uses the Closure Library’s goog.dom.createDom() through the ClojureScript dom-helpers functions to create the Document Object Model (DOM) structure for the list,

  • and uses a Closure Library class through ClojureScript in the note list to allow the user to open and close items in the list.

Creating a Namespace

When you use JavaScript libraries from different sources, there’s always the chance that some JavaScript code will redefine a global variable or function name that you, yourself, have defined in your code, creating a nasty bug. Clojure doesn’t have this problem and, by extension, neither does ClojureScript, though it compiles to javascript.

In ClojureScript we define namespaces in exactly the same way we do in Clojure. For our notepad application we can define the tutorial.notepad.note namespace as follows:

(ns tutorial.notepad.note
  (:require [tutorial.dom-helpers :as dom]))

Once the tutorial.notepad.note namespace exists, the example creates the initialization function in the Note namespace.

(defn init
  [title content node-container]
  {:title title :content content :parent node-container})

The note init function is now in the tutorial.notepad.note namespace created with the ns macro.

Creating a DOM Structure with dom-helpers

First and foremost, go get the dom-helpers.cljs from the twitterbuzz sample app and stick it in your project. It has the twitterbuzz namespace and you probably won’t want to keep that. I changed it to the tutorial.dom-helpers namespace and refer to it as such below.

To display a Note in the HTML document, the example gives the note namespace the following method:

(defn make-note-dom
  [self]

  (let [header-element  (dom/build
                         [:div {:style "background-color:#EEE"}
                          (:title self)])
        content-element (dom/build
                         [:div (:content self)])
        new-note (dom/build
                  [:div header-element content-element])]

      (dom/append (:parent self)
              new-note)

      ;; Return an updated self object with the above declarations
      (-> self
          (assoc :header-element header-element)
          (assoc :content-element content-element))))

This function uses the dom-helpers function build. The following ‘require’ statement includes the code for this function:

(ns tutorial.notepad.note
  (:require [tutorial.dom-helpers :as dom]))

The require directive for the namespace macro in ClojureScript works exactly the same way as the require in Clojure. There is nothing further to worry about.

The function build in dom-helpers creates a new DOM element using a syntax very similar to that used in the Hiccup library. For example, the following statement from dom-helpers creates a new div element.

;; Remember we imported dom-helpers as dom
(dom/build
  [:div {:style "background-color:#EEE"} (:title self)])

The map in the vector that starts with the :div keyword specifies attributes to add to the element, and the vector is terminated by the child to add to the element (in this case a string). Both the map and the child specifiers are optional.

The make-note-dom function takes just a single note as its argument. To make a list of notes, the example includes a make-notes function that takes the note data, as a vector of maps, and instantiates a note for each item, calling the make-note-dom function for each one.

(defn make-notes
  [data node-container]
  (doseq [cont data]
    (let [self
          (init (:title cont) (:content cont) node-container)]
      (make-note-dom self))))

Using a Closure Library Class

With just a few lines of code the example makes each note a Zippy. A Zippy is an element that can be collapsed or expanded to hide or reveal content. First the example adds a new require element to the tutorial.notepad.note namespace. The namespace should now look like this:

(ns tutorial.notepad.note
  (:require [tutorial.dom-helpers :as dom]
            [goog.ui.Zippy :as zippy]))

Then it adds a line to the make-note-dom function:

(defn make-note-dom
  [self]

  (let [header-element  (dom/build
                         [:div {:style "background-color:#EEE"}
                          (:title self)])
        content-element (dom/build
                         [:div (:content self)])
        new-note (dom/build
                  [:div header-element content-element])

        ;; NEW LINE
        zippy (goog.ui.Zippy. header-element content-element)]

      (dom/append (:parent self)
              new-note)

      ;; Return an updated self object with the above declarations
      (-> self
          (assoc :header-element header-element)
          (assoc :content-element content-element)
          (assoc :zippy zippy))))

The constructor call (goog.ui.Zippy. header-element content-element) attaches a behavior to the note element that will toggle the visibility of content-element when the user clicks on header-element. For more information about the Zippy class, see the Zippy API documentation.

Using the Notepad in an HTML Document

Here is the complete ClojureScript code for this example application:

(ns tutorial.notepad.note
  (:require [tutorial.dom-helpers :as dom]
            [goog.ui.Zippy :as zippy]))

(defn init
  [title content node-container]

  {:title title :content content :parent node-container})

(defn make-note-dom
  [self]

  (let [header-element  (dom/build
                         [:div {:style "background-color:#EEE"}
                          (:title self)])
        content-element (dom/build
                         [:div (:content self)])
        new-note (dom/build
                  [:div header-element content-element])
        zippy (goog.ui.Zippy. header-element content-element)]

      (dom/append (:parent self)
              new-note)
      ;; Return an updated self object with the above declarations
      (-> self
          (assoc :header-element header-element)
          (assoc :content-element content-element)
          (assoc :zippy zippy))))

(defn make-notes
  [data node-container]
  (doseq [cont data]
    (let [self
          (init (:title cont) (:content cont) node-container)]
      (make-note-dom self))))

We can’t embed ClojureScript in html like we can with javascript. This is a very good thing, but I digress. This does mean that we need to do things just a bit differently. In our case we are going to create another file in the notepad namespace called core.cljs that will define a main function for us. It basically takes the data required and calls the various note functions.

(ns tutorial.notepad
  (:require [tutorial.notepad.note :as note]
            [tutorial.dom-helpers :as dom]))

(defn main
  []
  (let [note-data [{:title "Note 1" :content "Content of Note 1"}
                   {:title "Note 2" :content "Content of Note 2"}]
        note-container (dom/get-element :notes)]

    (note/make-notes note-data note-container)))

;; Take note of this, this is how we call main at startup!
(main)

Kicking off the Process with HTML

With Google Closure and ClojureScript, how we call the generated JavaScript depends on whether or not we compiled with advanced optimizations. If you compiled without advanced optimizations (see getting started), your HTML file should look like this:

<!doctype html>
<!--[if lt IE 7 ]> <html lang="en" class="no-js ie6"> <![endif]-->
<!--[if IE 7 ]>    <html lang="en" class="no-js ie7"> <![endif]-->
<!--[if IE 8 ]>    <html lang="en" class="no-js ie8"> <![endif]-->
<!--[if IE 9 ]>    <html lang="en" class="no-js ie9"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--> <html lang="en" class="no-js"> <!--<![endif]-->
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">

    <title>Google Closure Tutorial for Clojurescript</title>

    <meta name="description" content="Demo showing off the ClojureScript hotness">
    <!--[if lt IE 9]>
    <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->

</head>
<body>
    <div id="notes"></div>

    <script type="text/javascript" src="js/goog/base.js"></script>
    <script type="text/javascript" src="js/deps/tutorial.js"></script>
    <script type="text/javascript">
        goog.require('tutorial.notepad');
    </script>
</body>
</html>

If you compiled with advanced optimizations, the HTML changes in two ways: the first is that we no longer need to include the goog/base.js script, and the second is that we no longer need the explicit require in our HTML file.

Also note that we are including the script after the notes div has been defined. That’s rather important.

Putting It All Together

Go ahead and check out the project from github.

$ git clone https://github.com/ericbmerritt/google-closure-tutorial.git
$ cd google-closure-tutorial

The Unoptimized Build

From the project root (where you should be right now), run the following:

$ ./bin/compile.sh

This does the unoptimized build. You may inspect the compile.sh script to see the actual command that is run. There is a very simple web server included in the distro. From the root again, do the following:

$ ./bin/webserver.sh

Now you can point your browser to http://127.0.0.1:8000/index-unoptized.html to see the result.

The Optimized Build

Again from the project root, run the command:

$ ./bin/compile-optimized.sh
$ ./bin/webserver.sh

This will do the exact same thing as the unoptimized, just with optimization turned on. Again, you may inspect compile-optimized.sh to see the command details.

Notes

I came across a few realizations as I was working through this and thought I would share them with you. You should compile your ClojureScript with advanced options both on and off. Having advanced options on is going to help you catch a lot of errors at compile time that you would otherwise have to find at run time. However, the resulting code is absolutely impossible to debug. So during development you will mostly be compiling without advanced options. Make sure, though, that you compile with advanced options on a fairly frequent basis. You will be happy you did in the long run. At the very least, you are going to want to compile with advanced options before deploying.

Setting Up Clojurescript

Setting up Clojurescript is fully described over at the Clojurescript Quick Start. What I am doing here is just reorganizing and rewriting to make things a bit more clear to someone that thinks like I do. While I was setting up Clojurescript, I missed a couple of things as I was going along, so I thought it might be useful to go through the process for others. I work on Linux, and this method will work there. I suspect very strongly that it will work on OSX as well.

Download Clojurescript

First and foremost download Clojurescript as below.

$ git clone https://github.com/clojure/clojurescript

Yes, you need to go ahead and do the git clone. Clojurescript is moving forward rapidly and will be for the foreseeable future. You are going to be updating the repo on a regular basis. So go ahead and clone it. I suggest you put it in whatever working directory you use for projects. I tend to keep my projects in $HOME/workspace and that is where clojurescript lives on my box. This works out rather well because I expect to be contributing back. Hopefully, you will too.

Bootstrapping

You can bootstrap everything by running the bootstrap script in the clojurescript directory.

$ ./script/bootstrap
Fetching Clojure...
Fetching Google Closure Library...
Fetching Google Closure Compiler...
Building goog.jar...
Copying closure/compiler/compiler.jar to lib/...

It will pull down all the dependencies for clojurescript. Note that clojurescript relies on clojure 1.3.0beta1 (at the time of this writing). It’s very probable that you are thinking about using clojurescript as the front end to a project that uses clojure as a back end. If that is the case, it’s probably best to base your back end project on clojure 1.3.0beta1 or whatever happens to be the current version of clojure that clojurescript uses.

Setup

If everything went well, the next thing we need to do is to set up the CLOJURESCRIPT_HOME env variable. This should point to where you installed clojurescript. If you remember, I put my clojurescript in $HOME/workspace/clojurescript and that’s where my CLOJURESCRIPT_HOME points.

$ export CLOJURESCRIPT_HOME=$HOME/

Of course, replace the path with the actual location of your clojurescript repo.

Setting up the Paths

Finally we want to set up the paths. You can either set your system PATH env variable to include both $CLOJURESCRIPT_HOME/script and $CLOJURESCRIPT_HOME/bin, or you can symlink the cljsc and repljs scripts into a directory that is already in your PATH. So you might do

$ export PATH=$PATH:$CLOJURESCRIPT_HOME/bin:$CLOJURESCRIPT_HOME/script

for putting those two in your path. Or you might do

$ ln -s $CLOJURESCRIPT_HOME/bin/cljsc $HOME/bin/
$ ln -s $CLOJURESCRIPT_HOME/script/repljs $HOME/bin/

As you might imagine, I have $HOME/bin already in my path. Either one works just fine, so it’s up to you. What you do not want to do is copy those scripts into locations in your PATH. Remember, those scripts are going to change.
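
If you go the PATH route, one way to make it stick is a small snippet in your shell startup file. This is only a sketch: it assumes a POSIX-style shell and uses the example location $HOME/workspace/clojurescript as a default, which you should change to wherever your clone actually lives.

```shell
# Sketch for ~/.profile or similar. The default path is an assumption taken
# from the example location in this post; point it at your own clone.
CLOJURESCRIPT_HOME="${CLOJURESCRIPT_HOME:-$HOME/workspace/clojurescript}"
export CLOJURESCRIPT_HOME

# Append the two script directories only if they are not already on PATH,
# so sourcing the profile twice does not keep growing PATH.
for dir in "$CLOJURESCRIPT_HOME/bin" "$CLOJURESCRIPT_HOME/script"; do
  case ":$PATH:" in
    *":$dir:"*) ;;            # already present, nothing to do
    *) PATH="$PATH:$dir" ;;
  esac
done
export PATH
```

Because the directories are referenced in place rather than copied, updating the repo later picks up any changes to the scripts.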

Try it Out

So let’s try it out. Run repljs; you should see something that looks like this.

$ repljs
#'user/jse
"Type: " :cljs/quit " to quit"
ClojureScript:cljs.user>

If you do, then WOOT! you are golden. If you don’t then you probably forgot a step or did something wrong (or something fundamental has changed since I wrote this). Go back and take a look and see what went wrong.
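
If the repl does not start, a quick check of the pieces set up so far can narrow things down. The function below is a hypothetical helper, not something Clojurescript ships; it only looks at what this post configured, namely the CLOJURESCRIPT_HOME variable and whether the cljsc and repljs scripts can be found on the PATH.

```shell
# check_cljs_setup is a made-up helper name used for this sketch only.
check_cljs_setup() {
  status=0

  if [ -z "${CLOJURESCRIPT_HOME:-}" ]; then
    echo "CLOJURESCRIPT_HOME is not set"
    status=1
  elif [ ! -d "$CLOJURESCRIPT_HOME" ]; then
    echo "CLOJURESCRIPT_HOME does not point to a directory: $CLOJURESCRIPT_HOME"
    status=1
  fi

  # Both scripts should resolve through PATH (directly or via symlink).
  for tool in cljsc repljs; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not on PATH"
      status=1
    fi
  done

  return "$status"
}

if check_cljs_setup; then
  echo "setup looks complete"
else
  echo "setup is incomplete, see the messages above"
fi
```

It does not verify that the bootstrap step succeeded, so a clean result here still leaves that step to check by hand.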

Compiling Clojurescript

Having a repl is pretty awesome, and you are going to use it; but what you really want to do is integrate it into your project. At the moment how you do that is going to vary a lot depending on your project. However, in general there are two ways to do that. The first is to use the repl, the second is to run cljsc. I tend to use the cljsc script because I can stick it in a script.

I am building a project in Google App Engine using appengine-magic. I would love to actually have a compiler built into leiningen. However, there are no lein plugins for clojurescript yet. So I have a little shell script that runs in the root of the project. That shell script basically does the following.

$ cljsc ./src '{:optimizations :advanced :output-dir "war/js" :output-to "war/js/myprojectname.js"}'

That will do the whole program compilation over all your cljs files. As soon as we get some lein goodness I will be migrating to that. I like an integrated build system quite a lot.
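
A wrapper around the command above might look roughly like the following. Treat it as a sketch rather than my actual script: the cljsc_opts helper and the dev/advanced mode names are invented for illustration, the output paths are the ones from the command above, and it assumes cljsc accepts the same options map with the :optimizations key left out for a development build.

```shell
#!/bin/sh
# cljsc_opts is a hypothetical helper that prints the options map for a mode.
cljsc_opts() {
  out_dir="war/js"
  out_file="$out_dir/myprojectname.js"
  case "$1" in
    dev)
      printf '{:output-dir "%s" :output-to "%s"}\n' "$out_dir" "$out_file" ;;
    advanced)
      printf '{:optimizations :advanced :output-dir "%s" :output-to "%s"}\n' \
        "$out_dir" "$out_file" ;;
    *)
      echo "unknown mode: $1 (expected dev or advanced)" >&2
      return 2 ;;
  esac
}

# Default to the development build when no mode is given.
mode="${1:-dev}"
opts=$(cljsc_opts "$mode") || exit 2

if command -v cljsc >/dev/null 2>&1; then
  cljsc ./src "$opts"
else
  # Keep the sketch harmless on machines without Clojurescript set up.
  echo "cljsc not found on PATH; would run: cljsc ./src '$opts'"
fi
```

Saved under a name of your choosing and run from the project root, it takes the mode as its only argument, which makes it easy to run the advanced build regularly and not only right before deploying.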