Using a Rule-based System for type inference.

Posted on 2016-07-18 20:23 by Idorobots

Recently, I came across this post about implementing custom type systems using Prolog and it inspired me to revisit an old logic-programming-related project of mine.

About a year and a half ago I took a shot at using Prolog for type reconstruction (a.k.a. type inference), but that resulted in a load of not much for one reason or another. I did write this Rule-based system shortly thereafter, however I did not give much thought to using it for type reconstruction. Let's change it...

(assert! (: map (-> ((-> (?a) ?b) (list ?a)) (list ?b))))

Rule-based Systems

...but first some theory. A Rule-based System, or RBS for short, is a way of organizing and processing knowledge - any statement of fact about the domain you're modeling. As the name implies, RBSes use rules to modify a knowledge base consisting of facts. For the purpose of this post, a rule has the following form:

(whenever condition action)

That is, whenever a certain condition is met, an action will be performed. That's not very novel, it's just a function with some pattern matching. The clever bit is the fact, that rules are triggered by changes to the knowledge base, which they can themselves modify by asserting and retracting facts. Here's a rule that makes some true assertions about functions in the Scheme programming language (this is actual code):

(whenever (and (language-of ?f Scheme)
               (is-a ?f function))
  (?f) => (assert!* `(: ,?f (-> ?anything ?anything-else))))

(assert! (is-a factorial function))
(assert! (is-a map function))

;; factorial is now typed! Kind of...
(assert! (language-of factorial Scheme))

So, what's the difference between Prolog and RBSes? Mostly, the direction of logic inference - in Prolog you usually assert some facts and then query the knowledge base to find relations between them. This is called backward-chaining because the inference engine takes a look back at the data it already has:

mother(alice, bob).

father(chad, bob).
father(chad, daryl).
father(daryl, alice). % ¯\_(ツ)_/¯

parent(X, Y) :- father(X, Y); mother(X, Y).
sibling(X, Y) :- parent(Z, X), parent(Z, Y), X \= Y.

?- sibling(bob, X). % daryl

Rule-based Systems, however, use a different approach called forward-chaining where logic inference is performed live, going forward, whenever any new knowledge is available.

Of course, backward-chaining is more suitable for building automatic theorem provers, such as type reconstruction engines, because that's pretty much what it does - it proves theorems given some initial conditions. This doesn't necessarily mean that RBSes are utterly useless in this matter. The cool thing about the algorithm I used in my RBS implementation is that it can be slightly modified to emulate backward-chaining in addition to its normal forward-chaining operation - all you need to do is run a rule in a fresh environment giving it all currently known facts as inputs, which isn't as bad as it sounds. Trust me.

You can query the knowledge base in my RBS by invoking select. For instance, to get all Scheme function from the previous example try this:

(select (?f)
        (and (is-a ?f function)
             (language-of ?f Scheme)))

RBS-based type inference

So, let's infer a type for the factorial function defined as follows:

(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))

Let's assume that we know a thing or two about the types of the built-in functions used by fact:

(assert! (: = (-> (?any ?any) bool)))
(assert! (: - (-> (int int) int)))
(assert! (: * (-> (int int) int)))
(assert! (: if (-> (bool ?t ?t) ?t)))

Now, inferring the correct type given this knowledge amounts to constructing a clever knowledge base query:

(assert! (: fact (-> (?n) ?fact))) ;; Assume fact is a function.
(assert! (: n ?n))                 ;; Assume fact's argument has a type.

(select (?type)
        (and (: = (-> (?n int) ?eq))           ;; (= n 0)
             (: - (-> (?n int) ?n))            ;; (- n 1) - passed to fact.
             (: * (-> (?n ?fact) ?mult))       ;; (* n fact-result)
             (: if (-> (?eq int ?mult) ?fact)) ;; (if ...) - returned as the result.
             (: fact ?type)))

It properly reports a type of (-> (int) int), namely a function of one parameter of integer type returning a value of integer type. Here's a somewhat more involved example, the my-map function:

(define (my-map f l)
  (if (null? l)
      null
      (cons (f (car l))
            (my-map f (cdr l)))))

;; Some prior knowledge of the builtin functions:
(assert! (: null? (-> ((list ?a)) bool)))
(assert! (: car (-> ((list ?b)) ?b)))
(assert! (: cdr (-> ((list ?c)) (list ?c))))
(assert! (: cons (-> (?d (list ?d)) (list ?d))))
(assert! (: null (list ?e)))

;; Some assumptions as before:
(assert! (: my-map (-> (?f ?l) ?my-map)))
(assert! (: f ?f))
(assert! (: l ?l))

(select (?type)
        (and (: null? (-> (?l) ?null?))
             (: null ?null)
             (: f (-> (?fa) ?fb))
             (: car (-> (?l) ?fa))
             (: cdr (-> (?l) ?l))
             (: cons (-> (?fb ?my-map) ?cons))
             (: if (-> (?null? ?null ?cons) ?my-map))
             (: my-map ?type)))

This time it reports a type of (-> ((-> (?c) ?e) (list ?c)) (list ?e)), that is: a function taking two parameters, the first one being a function from a, uh, logical variable ?c to a logical variable ?e and the second being a list of ?c's, that returns a list of, uh, ?e's...

¯\_(ツ)_/¯

This is significant, because we just inferred a generic type for a function - my-map doesn't care about specific types of its arguments as long as they are consistent. This is a feature that can be found in Standard ML among others.

So, you could in principle use a Rule-based System to perform type reconstruction, but... Why would you do that anyway, when you have Prolog available? Enter forward-chaining, or in this case, interactive, incremental typing:

;; What's the type of result?
(define int-list (list 1 2 3 4 5))
(define result (my-map fact int-list))

(whenever (and (: fact ?f)
               (: int-list ?l)
               (: my-map (-> (?f ?l) ?result)))
          (?result) =>
          (display "Finally got all the info to infer the type of result!")
          (assert!* `(: result ,?result)))

(assert! (: fact (-> (int) int)))                            ;; Not enough knowledge.
(assert! (: my-map (-> ((-> (?a) ?b) (list ?a)) (list ?b)))) ;; Still nope.
(assert! (: int-list (list int)))                            ;; Now we're talking!

Here we infer the type of result as soon as we infer the types of all its dependencies (in this case, my-map, fact and int-list). I could see this used in an implementation of a incremental, interactive compiler allowing for tight integration with an IDE. Think Ensime-type deal.

Still not convinced? Well you know, you still have access to all the usual RBS perks in addition to the type system - be in event programming, decision making or plain-old knowledge mining. For instance, here's a rule that automatically registers callbacks whenever a function with a suitable type is found:

(define *loggers* (ref '()))

(define (register-callback! callback)
  (assign! *loggers* (cons callback (deref *loggers*))))

(define (log str)
  (map (lambda (f) (f str))
       (deref *loggers*)))

(whenever (: ?function (-> (string) unit))
          (?function) =>
          (display "Found a suitable logger!") (newline)
          (register-callback! ?function))

;; ...elswhere in the code:

(log "Hello world!") ;; Nothing happens.

(define (simple-logger str)
  (display "[LOG] ")
  (display str)
  (newline))

(assert!* `(: ,simple-logger (-> (string) unit)))

(log "Hello world!") ;; Simple logger runs.

(define (logstash-logger str)
  (send-to-logstash str))

(assert!* `(: ,logstash-logger (-> (string) unit)))

(log "Hello world!") ;; Both loggers run.

Pretty neat!