(car '(1 2 3))

Posted on by Idorobots

So, I'm going to break the usual design-quirk rutine and actually show something fun this time and then we'll return to some OOD stuff I was planning to post.

It all started with an assignment me and my brethren freshmen got on our programming subject that would be loosely translated to /Programming languages and design methods/ I believe. We had to code a rather simple algebraics calculator that uses infix notation and supports several math functions such as factorial, power et cetera. The more advanced version of this assignment had to implement integrals and plotting so I thought "why the hell not?". Obviously I didn't aim for the second version as it implied a rather advanced framework that would support all kinds of goodies, because I can't stand writting a piece of crap that's only purpose is to work past the rating tests, and such a framework, the fun experience it might be, would take lot's and lot's of time to become of a satisfying elegance level. Yeah, I am picky, you don't even imagine. Anyway there it was, although I wasn't quite satisfied with it, keeping in mind all this "advanced" and "framework" stuff, it turned out quite... we'll see. I called it "latmab" for giggles and you can get it right here. (Nope. Chuck Graduatesta. I guess you're better off this way.)

And now some explanation: we normally use C++ but the profesor, a kind gentleman he is, allowed me to use D, hence latmab is written in D.

...

I liek D.

Latmab has a rather simple parser that makes a havy use of regexps and RPN. I also had some design issues so... Go read the README. That's it. NEXT!

No seriously, this code is ugly so I'll be skipping alot. We'll start with the REPL:

char[] input;

write(prompt);
while(stdin.readln(input)) {
    auto list = "Aviable operations: "~parser.getBinaryOps~"\n"
                "Aviable variables:  "~parser.getGlobalScope~"\n"
                "Aviable functions:  "~parser.getGlobalFunctions~"\n";

    auto m = match(input, regex("prompt\\s*=\\s*"));
    if(!m.empty) {
        prompt = m.post[0..$-1].idup~" ";
    }
    else if(input == "exit\n" || input == "x\n" ||
            input == "quit\n"  || input == "q\n")       break;
    else if(input == "help\n" || input == "h\n")        writeln(help);
    else if(input == "list\n" || input == "l\n")        writeln(list);
    else if(input == "welcome\n" || input == "w\n")     writeln(welcome);
    else try {
        writeln("\t", parser.doString(input.idup));
    }
    catch(Exception e) {
        writeln("\t", e);
    }
    write(prompt);
}

Short yet ugly. First, it creates a list of aviable symbols in the environment... Using three different functions. Next, it checks for prompt assignment, cause that's a must-have feature. (Yeah, I'll be like "no man, I had nothing to do with this crap, nu'uh" all the time.) Next, it searches for some REPL commands and evaluates them if neccessary. (Oh man, it could have been written so much better... An alist of thunks for instance.) After that, there has to be an actual piece of latmab code to be evaluated, and so it evaluates it using the Parser instance (sooo modular, it does parsing, evaluating, and it keeps track of the environment). It uses the try-catch block, since the parser throws up on chunks of malformed code, such as syntacticly incorrect code.

Now let's have a look under the hood, a very short though, because it's a pretty boring piece of software:

First of all, the grammar - latmab uses no grammar, just some obvious regexps:

/***********************************************************************************
 Regular Exressions
 RegExps used in this parser.
 *********************/

const string space              =   "[\\t\\r\\n ]+";                        //Anything between 1 .. * whitespaces.

const string digit              =   "[0-9]";                                //Decimal digits.
const string integerLiteral     =   "-?"~digit~"+";
const string realLiteral        =   integerLiteral~"\\."~digit~"+";
const string numberLiteral      =   "("~realLiteral~"|"~integerLiteral~")";

const string alnum              =   "[a-zA-Z0-9_]";                         //Alphanumeric characters.
const string identifier         =   "[a-zA-Z_]"~alnum~"*";                  //Can't start with a digit.
const string rvalue             =   "("~identifier~"|"~numberLiteral~")";
const string lvalue             =   identifier;

                                                                            //Builtin operators.
const string binaryOp           =   "(!=|==|>=|<=|\\*=|\\/=|\\+=|\\-=|[\\^\\*\\/\\%\\+\\-\\=\\>\\<\\,])";

const string statementSeparator =   ";";                                    //Various special characters.
const string tokenSeparator     =   " ";
const string lineStart          =   "^";
const string lineEnd            =   "$​";
const string bracketStart       =   "\\​(";
const string bracketEnd         =   "\\​)";

const string all                =   "(" ~space~"|"
                                        ~identifier~"|"
                                        ~numberLiteral~"|"
                                        ~binaryOp~"|"
                                        ~statementSeparator~"|"
                                        ~tokenSeparator~"|"
                                        ~bracketStart~"|"
                                        ~bracketEnd~"|"
                                        ~")";

In top-down order: whietespaces, number literals with all intermediate stages, symbols along with r/lvalues for improved readability later on, and the first design error - it treats binary operators as a separate entity from general symbols, what proved to cause much trouble in actual parsing. Next, there are some special symbols and a rarely used regexp for identifying unwanted characters.

This is how the grammar is represented in the AST:

/***********************************************************************************
 Expression classes:
 *********************/

abstract class Expression {
    real eval();
    real value() {
        return eval();
    }
}

class Value : Expression {
    private string name;

    this(string name) {...}

    real eval() {...}
}

class NullValue : Expression {
    real eval() {
        return LATNULL;
    }
}

class BinaryOperator : Expression {
    string name;
    Expression lvalue = null;
    Expression rvalue = null;

    this(string name, Expression l, Expression r) {...}

    real eval() {
        switch(name) {
            case "^": return lvalue.eval ^^ rvalue.eval;
            case "*": return lvalue.eval * rvalue.eval;
            case "/": return lvalue.eval / rvalue.eval;
            case "%": return lvalue.eval % rvalue.eval;
            case "+": return lvalue.eval + rvalue.eval;
            case "-": return lvalue.eval - rvalue.eval;
            case ">": return lvalue.eval > rvalue.eval;

            ...

        }
    }
}

alias real delegate(Expression[] args) functionDef;

class FunctionCall : Expression {
    functionDef func;
    Expression[] args;

    this(Expression[] args, functionDef func) {...}

    real eval() {
        return func(args.reverse);
    }
}

Going top to down there is: abstract Expression - a base for AST classes, Value - an atom - either a number or a symbol, NullValue (actually NaN, since latmabs only datatype is 'real'). And here be dragons... BinaryOperator class is just a missunderstanding, seriously, who wrote that piece of-

Next, we have something that I actually like. (Ah forget it. Allright, I did write this thing. In the other news this whole program took me only a single evening with no refactoring ever made.) The FunctionCall class uses a ready-made implememnation of a function and evaluates it using args passed from the latmab code. It uses funcionDef type, which is an alias to a delegate - D's closure, for the function definition. It features lazy evaluation, since the arguments are passed in an array to the builtin function as-is and then get evaluated in their implementation if needed.

The FunctionCall class allows for quite a few cool things, such as this:

parser.add("sq", 1, delegate real (Arg[] args) {
                        return args[ 0].value^^2;
                    });
parser.add("pow", 2, delegate real (Arg[] args) {
                        return args[ 1].value^^(args[ 0].value);
                    });
parser.add("sqrt", 1, delegate real (Arg[] args) {
                        return args[ 0].value^^(1./2.);
                    });
parser.add("root", 2, delegate real (Arg[] args) {
                        return args[ 1].value^^(1./args[ 0].value);
                    });

And this:

parser.add("if", 3, delegate real (Arg[] args) {
                        if(args[ 0].value) args[ 1].eval;
                        else args[ 2].eval;
                        return LATNULL;
                    });
parser.add("for", 4, delegate real (Arg[] args) {
                        args[ 0].eval;
                        while(args[ 1].value) {
                            args[ 3].eval;
                            args[ 2].eval;
                        }
                        return LATNULL;
                    });

AAAND even that:

parser.add("helloWorld", 1, delegate real (Arg[] args) {
                        writeln("Oh hi ", parser.getGlobalScope);
                        return LATNULL;
                    });

In the first chunk we add some simple oneliners, nothing special. In the second one though we have something interresting: Thanks to the lazy evaluation it's possible to implement both conditionals and loops. Latmab turns out to be a not-quite-there-yet-but-going-steadly programming language. At the time I was implementing this feature I didn't realise how cool could that have been, so a few terrible design decisions later it was too late to benefit from that. (Allright, I was too lazy to rewrite it. Fine. Guilty as charged). The third chunk uses the advantage of delegates in D - it prints the current list of the latmab global variables using D variable from the outter scope - an instance of the parser itself. The rest of the code is rather boring, so I'll skip it, just some simple regexp based tokenization and RPN. Nothing new there, you can download it and see for yourself.

Long story short, I ended up having a global variable scope, a global function scope, a builtin operators scope and not too complicated syntax, but due to some not so clever ideas latmab was at its peak although it wouldn't go anywhere. It's worthy to note here that it was my first project of this kind. Refactoring was badly needed, but I, beeing a lazy ass I am, didn't do it. Instead I've created a fodder for the next blog entry...

Lastly, a test-drive from the latest latmab version:

Latmab v13.03.11 Copyright (C) 2011 Kajetan Rzepecki nope.nope@nope.nope
This program comes with ABSOLUTELY NO WARRANTY; for details see README.txt.
This is free software, and you are welcome to redistribute it
under certain conditions; see README.txt for details.
Type 'help' for help.

> PI
        3.14159
> 2 ^ (2 + erf(E))
        7.99933
> list
Available operations: [%, *, *=, +, +=, ,, -, -=, /, /=, =, ^, <, <=, >, >=, ==, !=].
Available variables:  [E, LN10, LN2, LOG2E, PI, PI2, PI4].
Available functions:  [abs, cos, ctg, do, erf, exp, fact, for, gamma, helloWorld, if, log, log10, log2, pow, rand, root, sin, sq, sqrt, tan, while, write].

> quit

2016-02-20: Adjusted some links & tags.