(car '(2 3))

Posted on 2011-05-07 01:10 by Idorobots

It's been a while since my last post, more than I'd wish for actually, and that is because I was hunting a man by the name Iain Buclaw on the IRC. I had to make sure he was ok with me using some of his work in this next project/post (and he obviously is, oh the kind gentleman he is).

Last time I shared some thoughts on latmab, a simple algebraic calculator, I have... commited and todays topic is somewhat related. It's a continuation I'd say, but let's go on with the story... There I was with latmab finished and this awful, awful feeling it was lacking. (Well, it's still lacking, and we're talking serious issues, not like "yeah man, it could use some random-feature", we're talking a fudge-ton of missing features.) Instead of doing the right thing to do and fixing all those issues I went "meh" and started a new project inspired by the man I've mentioned a few lines above.

It was a Lisp implementation. evilish music-laughter-thing Scheme to be exact, and this time it turned out quite well.

The funny part is I didn't have any previous experience with Scheme (or any other Lisp dialect for that matter), besides some general knowledge about it ("lol, a language made entirely of parens, that's dumb") and a bunch of myths I've heard ("lol, a language made entirely of parens, that's dumb"). With no experience nor proper time for tweaks (I had like five days to implement it before presentation) it was quite a challenge but Scheme turned out to be a simple, yet elegant and most of all pleasant to implement language. ~~You can get it here~~ (Page has been lost to the sands of graduation).

Let's take a look, shall we? Starting with the REPL:

//Interactive read-eval-write mode:
auto prompt = ">";

writeln(welcome, "Type '(?help)' for help.\n");
write(prompt, " ");

foreach(char[] input; stdin.byLine) {
    auto m = match(input.idup, regex("prompt\\s*=\\s*"));
    if(!m.empty && m.post != "")    prompt = m.post;
    else if(input == "quit")        break;
    else try {
        writeln("\t", interpreter.doString(input.idup));
    }
    catch(Exception e) {
        writeln("\t", e);
    }
    write(prompt, " ");
}
writeln("Wait! You forgot your parentheses!");

This time it's a little cleaner, I like it way more now. We have: The all usefull and tottaly justified feature - the prompt assignment. Next, the quit command. Could be done just like the other built in REPL commands, but meh, I found it nonelegant to exit violently out of a closure. Lastly, the eval-print part together with exception handling (pretty basic though, cause all it ever throws are syntactic/semantic error messages).

Speaking of builtins, this is how builtin REPL commands (and all the builtin functions) are added to the interpreter:

//A bunch of useful builtins:
void addFunction(string name, string stuff) {
    interpreter.add(name, delegate Cell(Cell[] args) {
        writeln(stuff);
        return interpreter.NIL;
    });
}

addFunction("?help", help);
addFunction("?usage", usage);
addFunction("?author", author~" ("~contact~")");
addFunction("?version", ver);
addFunction("?welcome", welcome);

interpreter.add("?list", delegate Cell(Cell[] args) {
    writeln(interpreter.symbols);
    return interpreter.NIL;
});

Using a local function and closures we add several simple, stuff-printing functions and one listing all available symbols. Looks great and there's not much code repetition thanks to the addFunction function.

This is how it looks in the Intepreter class:

/***********************************************************************************
 arithmeticOp
 Convinience template for creating new builtins.
 *********************/

Cell arithmeticOp(string op)() {
    return new Cell(delegate Cell (Cell[] args) {
        if(!args.length)
            throw new SemanticException("#[compiled-procedure "~op~"] needs at least 1 argument.");

        real result = to!real(args[ 0].value);
        foreach(arg; args[1 .. $]) {
            mixin("result "~op~"= to!real(arg.value);");
        }
        return new Cell(Cell.ATOM, to!string(result));
    });
}

global.add("+", arithmeticOp!("+"));
global.add("*", arithmeticOp!("*"));
global.add("/", arithmeticOp!("/"));
global.add("%", arithmeticOp!("%"));
global.add("^", arithmeticOp!("^^"));

/***********************************************************************************
 comparisonOp
 Convinience template for creating new builtins.
 *********************/

Cell comparisonOp(string op)() {
    return new Cell(delegate Cell (Cell[] args) {
        if(!args.length)
            throw new SemanticException("#[compiled-procedure "~op~"] needs at least 1 argument.");

        if(args.length == 1) return TRUE;

        real first = to!real(args[ 0].value);
        foreach(arg; args[1 .. $]) {
            if(!mixin("first "~op~" to!real(arg.value)")) return FALSE;
        }
        return TRUE;
    });
}

global.add("=", comparisonOp!("=="));
global.add("<", comparisonOp!("<"));
global.add(">", comparisonOp!(">"));
global.add("<=", comparisonOp!("<="));
global.add(">=", comparisonOp!(">="));

Since most of the arithmetic and comparison operators use the same pattern we create templetes, that make a use of this schema. [sic] Note that some of the operators listed here don't work entirely like they should and some of them aren't even supposed to be in the language. It's also worthy to note that it assumes real as the argument type as it's the only type supported - a major f*ck up on my side, deal with it. There are also name bugs in the power and equality operators. The templates use string mixins to mix the operators into the code, and... that's it. Pretty simple.

The Cell class represents a single lisp form, be it a list, a lambda or a number literal. The general idea for this class was borrowed from Iain Buclaws implementation. I wanted to go for a class hierarchy for the different form kinds (like expressions in matlab implementation), but this could have proven tricky in the evaluation part later on. This is the class:

class Cell {
    static const LIST       =   1;
    static const ATOM       =   2;
    static const SYMBOL     =   3;
    static const LAMBDA     =   4;
    static const BUILTIN    =   5;

    int type = 0;

    string value;
    Cell[] list;

    Builtin builtin;    //For builtins.
    Shell shell;        //For lambdas.

    ...
}

The type field determines the type of a cell, it's usually Cell.LIST for starters and mutates to other types when evaluating. This is the reason lisp is easy to implement - everything is a list, doesn't get simplier than that. The value field is the string representing the value of the sexp. It's usually a number literal, since there are no other datatypes implemented, or it's empty. Next, we have a builtin function closure used for builtins and a Shell reference for lambda functions defined during run time by the user. The shell is a reference to the scope in which the lambda was defined - the cell stores it so the local symbols are available for the lambda - it's supposed to be a closure after all.

class Shell {
    private Cell[string] symbols;
    private Shell parent;

    ...
}

The Shell class is fairly simple as well. It represents a scope and stores symbols paired with their definitions. It also stores a reference to the enclosing scope. Shell uses a couple of methods for the symbol access, most notably get():

public Cell get(string symbol) {
    if(symbol in symbols) return symbols[symbol];
    else if(parent) return parent.get(symbol);
    else throw new UndefinedSymbol(symbol);
}

If a symbol is absent in the scope of a shell the get call is redirected to the outter shell, or an =UndefinedSymbol= exception is thrown if there is no outter shell. Both of these classes have unittest blocks defined that use an updated version of the utils.testing module I presented a few posts ago. It's bundled up with the source.

The Interpreter class does both parsing and evaluating the scheme sexps. Strategy for parsing is similar both for files and the interactive REPL, it starts by statementizing the input leaving an array of statements, after that each statement is tokenized leaving an array of tokens and lastly the tokens are parsed to a Cell tree. All of those methods are rather boring, so I'll skip them.

Now that we have a lisp sexp parsed we want to evaluate it:

private Cell eval(Cell cell, Shell shell, uint depth = 1) {
    debug(25) writeln("eval: ", cell);

    if(depth == MAXDEPTH)
        throw new SemanticException("Aborting: maximum recursion depth exceeded ("~to!string(depth)~")!");

The eval method uses recursion to evaluate the sexp, and that's why it needs a depth protector. The little debug(25) statement is a great feature of D. The statements it encloses (can be a block statement) are compiled only if there was a -debug (-fdebug for GDC) passed to the compiler. Furthermore, the switch can be parameterized to pass a level of debugging which we wish to perform. For example, passing -debug=26 will compile all debug statements that are of a level 26 and lower. Neat.

    if(cell.type == Cell.ATOM) return cell;
    if(cell.type == Cell.SYMBOL) return shell.get(cell.value);
    if(!cell.list.length) return NIL;

The few simple cases are an ATOM, a SYMBOL and an empty list. The ATOM evaluates to itself, a SYMBOL has to be looked up in the shell first and the empty list is a NIL equivalent. (In many Lisp dialects, for example in Common Lisp nil, 'nil, () and '() all mean the same, what may seem confusing but is a well thought language feature, I'll write more about it some time.)

    if(cell.list[ 0].type == Cell.SYMBOL) {
        //Keywords:
        switch(cell.list[ 0].value) {
            case "begin":
                auto len = cell.list.length;
                foreach(exp; cell.list[1 .. len-1]) eval(exp, shell, depth + 1);
                return eval(cell.list[$-1], shell, depth + 1);
            case "define":
                auto symbol = cell.list[ 1].value;
                if(cell.list.length == 3) {
                    auto exp = eval(cell.list[ 2], shell, depth + 1);
                    shell.define(symbol, exp);
                }
                else shell.define(symbol, NIL);
                return cell.list[ 1];
            case "if":
                auto then = cell.list[ 2];
                auto els = (cell.list.length == 4) ? cell.list[3] : NIL;
                return eval((eval(cell.list[1], shell, depth + 1) == TRUE) ? then : els, shell, depth + 1);
            case "lambda":
                cell.type = Cell.LAMBDA;
                cell.shell = shell;
                return cell;
            case "quote":
            case "'":
            case "`":
                return cell.list[ 1];
            case "set!":
                auto symbol = cell.list[ 1].value;
                auto exp = eval(cell.list[ 2], shell, depth + 1);
                shell.set(symbol, exp);
                return cell.list[ 1];

            default: break; //Skip.
        }
    }

Next case is a list that contains at least one element. If that element is a symbol we need to dispatch it. There are just a few crucial Scheme keywords implemented, it lacks many, many more, and notably it lacks quasiquotation.

This branch uses a little feature of D that I like - switch statement allows strings in the case labels. It may seem small, it may seem not really usefull, but having so many little features, D takes programming to another level of comfort and elegance. I mean, look how elegant the idea behind this switch is (the code itself might not be, can't really tell), it's not cluttered with nested ifs and whatnot.

    Cell form = eval(cell.list[ 0], shell, depth + 1);
    Cell[] args;
    foreach(arg; cell.list[1 .. $])
        args ~= eval(arg, shell, depth + 1);

    if(form.type == Cell.LAMBDA) {
        auto inner = new Shell(form.shell);
        foreach(i, param; form.list[ 1].list) {
            inner.define(param.value, args[i]);
        }
        return eval(form.list[ 2], inner, depth + 1);
    }
    else if(form.type == Cell.BUILTIN) return form.builtin(args);

    throw new SemanticException("The expression "~cell.toString~" is not applicable.");
}

The rest of the function deals with lambdas and builtins. Lambdas need to be evaluated in their local scope with their enclosing scope kept in mind. Builtins are simply evaluated using their stored D closures.

The interpreter too uses unittests, in fact it makes quite a few assertions, but sadly I didn't have enough time to make it pass 100% of them. (Fookin 3.8% left...) The unittests were written by Iain Buclaw for his implementation, and I just borrowed them for mine, so all the credit goes to him. Thanks once again.

Test-run time:

Scheme interpreter v29.03.11 - a simple scheme interpreter.
Copyright (C) 2011 Kajetan Rzepecki nope.nope@nope.nope

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses.
Check README.txt and LICENSE.txt for more info.

Type '(?help)' for help.

> (define *PI* 3.14159265)
    *PI*
> (define *E* 2.71828183)
    *E*
> *PI*
    3.14159265
> (^ 2 (+ 2 (erf *E*)))
    Undefined symbol: erf in (^ 2 (+ 2 (erf *E*)))
> (define erf (lambda (x) (sqrt (- 1 (exp (* (- (* x x)) (/ (+ 1.27323981 (* 0.140012 x x)) (+ 1 (* 0.140012 x x)))))))))
    erf
> (^ 2 (+ 2 (erf *E*)))
    7.99933
> (?list)
Available symbols: [#f #t % * *E* *PI* + - / < <= = > >= ?author ?help ?list ?usage ?version ?welcome ^ append car cdr cons erf exp false length list nil null? sqrt true write ]
    nil
> quit
Wait! You forgot your parenthesis!

Next time on /Idorobots/: we learn why this Scheme implementation happened and... other stuff.

This Scheme implementation was based on an implementation by Iain Bucław. I didn't write it all by myself, blame it on my lack of knowledge, lazyness or no lisp experience at all. The next one will be ~~better~~ marginally better.

2016-02-20: Adjusted some links & tags.