How to compile Lisp Programs into executable Perl

The operation of the compiler is actually very straightforward, thanks to the simplicity of Lisp and the power of Lisp macros. The compiler compiles itself (i.e. it is self-hosting) and so is written in Lisp.

The main function involved is 'compile-form' which is defined in 'core/compiler.scm' (search for '(define (compile-form' in this file). This knows how to compile 'nil' (as undef), numbers, strings and symbols. It also compiles lists as follows:-

  • First, the list is macro expanded
  • After macro expansion, if the list is a '(perl ...)' form it calls 'compile-perl-form'
  • If the list is an '(eval-when-compiling ...)' form it calls 'compile-eval-when-compiling'

'compile-eval-when-compiling' just compiles its argument in the normal way, then evaluates the resulting Perl code and also returns that Perl code as its result. So if this form is encountered while compiling a Lisp file, the generated Perl code is both written to the compiled output and evaluated at compile time. This is needed in order to define macros and functions while compiling.
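
The compile, evaluate, return sequence described above can be modelled in a few lines. This is only a Python sketch of the control flow, not the compiler's own code: the function name and the two callables passed in ('compile_form' and 'evaluate') are stand-ins assumed for illustration.

```python
def compile_eval_when_compiling(form, compile_form, evaluate):
    """Model of the behaviour described for 'compile-eval-when-compiling'.

    compile_form -- hypothetical callable turning a form into generated code
    evaluate     -- hypothetical callable that runs generated code immediately
    """
    code = compile_form(form)  # compile the argument in the normal way
    evaluate(code)             # run the generated code at compile time
    return code                # the same code still goes to the compiled output


# Usage with placeholder callables: record what gets evaluated.
evaluated = []
output = compile_eval_when_compiling(
    "some-form",
    lambda form: "code for " + form,  # placeholder, not real generated Perl
    evaluated.append,
)
print(output)     # prints code for some-form
print(evaluated)  # prints ['code for some-form']
```

The point of the sketch is that the same generated code is used twice: once immediately, and once as the normal compiler output.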

Compilation of '(perl ...)' forms happens like this:-

(perl nil nil nil ("(" "+" ")") nil nil nil (1 2 3))

...compiles to

(1+2+3)

You can try this by executing the 'rep.pl' script then typing:-

(compile-form '(perl nil nil nil ("(" "+" ")") nil nil nil (1 2 3)))

This will return a list of 2 items, the first being a string containing Perl code. Ignoring all the extra 'nil's for a moment, the Perl code is generated from the 3 strings in the 4th argument and the items in the last argument: first the opening string "(" is output, then the items (1, 2 and 3) separated by the 2nd string "+", and finally the closing string ")".
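
The joining rule for a '(perl ...)' form can be modelled as follows. This is a minimal Python sketch rather than the compiler's own code: the name 'emit_perl_form' is hypothetical, the extra 'nil' arguments are ignored, and the items are assumed to be compiled already.

```python
def emit_perl_form(delims, items):
    """Model of the joining rule for a '(perl ...)' form.

    delims -- the three strings from the 4th argument: (open, separator, close)
    items  -- the items from the last argument, assumed already compiled
    """
    open_str, separator, close_str = delims
    return open_str + separator.join(str(item) for item in items) + close_str


print(emit_perl_form(("(", "+", ")"), (1, 2, 3)))  # prints (1+2+3)
```

Changing only the three strings gives other Perl constructs from the same rule, for example a comma-separated argument list.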

This mechanism is very straightforward and amounts to little more than having Lisp expressions of the form '(perl "print 'hi'")'. The result is a very simple compiler (it's slightly more involved because it keeps track of needed symbols, lexical variables and the difference between expression and statement context, but not by much). Ordinarily, a programming language consisting of only one expression for embedding bits of code from another language would be fairly useless - you might as well type the embedded language in directly. However, thanks to 'defmacro', we can use this very primitive facility, along with 'eval-when-compiling', to define the rest of the language and produce a fully working and useful compiler from this very simple beginning.

core/macros.scm contains a collection of macros used to define the rest of the language, and is a good starting point for knowing what is available. 'defmacro' is defined in here too (using itself)!

The macro expander also handles any list which does not name a macro and which is neither a 'perl' form nor an 'eval-when-compiling' form: it expands such a list using a special macro called '*FCALL*'. This means that you can change the way function calls are compiled by redefining this macro. See misc/lisp2.scm for an example of this.
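
The fallback to '*FCALL*' can be pictured with a small model. This Python sketch makes several assumptions that are not taken from the compiler source: forms are Python lists with string heads, macros are plain Python callables in a dictionary, and '*FCALL*' receives the function name followed by the arguments. The "call" output form is a placeholder only.

```python
SPECIAL_FORMS = ("perl", "eval-when-compiling")


def expand_once(form, macros):
    """Expand a list form by one step (a model, not the real expander)."""
    head = form[0]
    if isinstance(head, str) and head in macros:
        return macros[head](*form[1:])  # an ordinary macro call
    if head in SPECIAL_FORMS:
        return form                     # left for the compiler itself
    # Anything else is treated as a function call and handed to *FCALL*.
    return macros["*FCALL*"](*form)


# A hypothetical *FCALL* that wraps calls in a placeholder "call" form.
macros = {"*FCALL*": lambda fn, *args: ["call", fn, *args]}

print(expand_once(["foo", 1, 2], macros))    # prints ['call', 'foo', 1, 2]
print(expand_once(["perl", "x"], macros))    # prints ['perl', 'x']
```

Because every ordinary call goes through one dictionary entry in this model, replacing that entry changes how all function calls are expanded, which is the property the text describes.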

Datatypes

Lisp cons cells are represented as 2-element Perl array refs. Thus:-

(1 2 3 4)

... is represented as

[1, [2, [3, [4, undef]]]]

This gives the expected lisp semantics.
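
To see why nested 2-element cells give list semantics, here is a Python model of the same layout, with Python lists standing in for Perl array refs and 'None' standing in for 'undef'. The helper names ('cons', 'car', 'cdr', 'lisp_list') are the usual Lisp ones, written here for illustration and not taken from the compiler source.

```python
def cons(head, tail):
    """Build one cell: a 2-element list, like a 2-element Perl array ref."""
    return [head, tail]


def car(cell):
    return cell[0]


def cdr(cell):
    return cell[1]


def lisp_list(*items):
    """Build a proper list by consing from the right; None plays 'nil'."""
    result = None
    for item in reversed(items):
        result = cons(item, result)
    return result


numbers = lisp_list(1, 2, 3, 4)
print(numbers)            # prints [1, [2, [3, [4, None]]]]
print(car(cdr(numbers)))  # prints 2

# Consing onto an existing list shares its cells instead of copying them.
longer = cons(0, numbers)
print(cdr(longer) is numbers)  # prints True
```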

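Lisp strings are represented as normal Perl scalars. So are numbers, which means that the implementation can't properly differentiate between numbers and strings as most Lisps do. 'nil' is represented by 'undef'. The Perl concept of truth is used, which is different from the normal Lisp concept: in Perl, 'undef', 0, "0" and the empty string are all false, whereas most Lisps treat only 'nil' (or '#f' in Scheme) as false.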
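
The practical effect of using Perl truth can be checked with a small model. The Python function below ('perl_truthy', a name made up for this sketch) mimics Perl's truth test for scalar values, with 'None' standing in for 'undef'. It is an approximation for illustration, not code from the compiler.

```python
def perl_truthy(value):
    """Model of Perl's truth test for scalars; None stands in for undef."""
    if value is None:
        return False
    if isinstance(value, str):
        return value not in ("", "0")  # only these two strings are false
    return value != 0                  # numeric zero is false


for value in (None, 0, "", "0", "0.0", 1, "nil"):
    print(repr(value), perl_truthy(value))
```

Under this rule 0 and the empty string count as false in conditions, which most Lisps would treat as true, so tests such as "is this string non-nil" need care.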

Symbols behave properly - they are represented as references to scalars containing the symbol's name. They are stored in a symbol table so that 'some-symbol always refers to the same symbol.
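
The role of the symbol table can be shown with a short model. In this Python sketch a small object stands in for the Perl scalar reference, and the names 'Symbol', 'intern_symbol' and '_symbol_table' are hypothetical. It shows only the interning idea, not the compiler's actual data structures.

```python
class Symbol:
    """Stand-in for a reference to a scalar holding the symbol's name."""

    def __init__(self, name):
        self.name = name


_symbol_table = {}


def intern_symbol(name):
    """Return the one shared Symbol for this name, creating it if needed."""
    if name not in _symbol_table:
        _symbol_table[name] = Symbol(name)
    return _symbol_table[name]


first = intern_symbol("some-symbol")
second = intern_symbol("some-symbol")
print(first is second)                     # prints True
print(first is intern_symbol("other"))     # prints False
print(Symbol("some-symbol") is first)      # prints False
```

Without the table, two references built from the same name would be distinct objects, so comparing symbols by identity would fail; the last line shows that case.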