Common Lisp Extensions: UNIFICATION Dictionary
The notion of unification originated in the field of formal logic (e.g. Robinson, 1956) and has been used extensively in Computer Science and Programming Languages. Prolog obviously uses the full power of unification, however, unification is also at the core of type checking algorithms in the tradition of Milner's, and a limited form - pattern matching - is available to the user in languages of the ML and Haskell family.
The library presented in these pages provides a full blown unification framework for Common Lisp.
Writing a pattern matcher or a an unifier in Common Lisp is easy, as long as we limit ourselves to manipulate only ATOMs and CONSes.
Alas, it would be much nicer if we could manipulated arbitrary Common Lisp objects as the ML programmer can with arbitrary ML objects.
The library presented here is the first one (to the best of the author's knowledge) that is capable of manipulating arbitrary Common Lisp objects.
The hope is that this library could be incorporated directly in an implementation in order to provide better type checking.
This should not come as a surprise, as a compiler like CMUCL does include a type inference engine, which does very similar things.
Unification Basics
The unification process makes sure that two object descriptions containing some holes - i.e. variables - can be made equal (almost in the EQUALP sense) by assigning consistent values to the variables involved.
Suppose we had a function U performing unification and returning a set of variable assignments, often called a substitution. A very simple example involving the unification of two numbers could be
U(42, 42) ==> {}The two numbers are EQL, so no variable is involved and the empty substitution is returned.
U(42, 123) ==> <unification failure>The two numbers are not EQL, so the unification fails.
U(42, x) ==> {x -> 42}The only way to make the unification process to succeed is to bind the value 42 to the variable
x
.
The UNIFICATION library defines all the necessary functions and a unification sub-language to handle most of Common Lisp.
UNIFICATION Library
The UNIFICATION library has one main entry point, the generic function UNIFY, and a sub-language definition that allows us to talk about Common Lisp objects.
The UNIFY generic function has the following signature:
unify x y &optional substitutionWhere
x
and y
are either
arbitrary Common Lisp objects, variables,
or object templates. These items constitute the so-called
extended terms manipulated by the unification
machinery.
Variables are symbols with a #\?
as the
first character of the name. This is a rather traditional choice,
although a different one based on quoted symbols is possible.
Therefore, the following are examples of variables.
?A ?s ?qwe ?42z ?a-variable-with-a-very-long-name ?_There are two special variables,
?_
and _
,
which are used as anonymous place holders, they match anything, but
never appear in a substitution.
Hence, the above examples result in the following
cl-prompt> (unify 42 42) #<EMPTY ENVIRONMENT xxxxxx>
cl-prompt> (unify 42 123) ==> error: unification failure
cl-prompt> (unify 42 ?x) #<ENVIRONMENT xxxxxx> cl-prompt> (find-variable-value '?x *) 42Where FIND-VARIABLE-VALUE is the accessor used to find the value of a variable in a substitution.
As a more complicated example, consider the usual CONS based unification
cl-prompt> (unify '(foo (bar frobboz) ?baz) '(foo ?gnao 42)) #<ENVIRONMENT xxxxxx> cl-prompt> (find-variable-value '?gnao *) (BAR FROBBOZ) cl-prompt> (find-variable-value '?baz **) 42
Of course note the following behavior
cl-prompt> (unify '(foo ?x 42) '(foo 42 ?x)) #<ENVIRONMENT xxxxxx> cl-prompt> (unify '(foo ?x 42) '(foo baz ?x)) ==> error: unification failure
UNIFY works also on arrays and vectors. Strings are treated as atomic objects
cl-prompt> (unify #(1 2 3) #(1 ?x ?y)) #<ENVIRONMENT xxxxxx> cl-prompt> (find-variable-value '?x *) 2
cl-prompt> (unify #2A((1 2 3) (a s ?z)) #2A((1 ?x ?y) (a s d))) #<ENVIRONMENT xxxxxx> cl-prompt> (find-variable-value '?z *) D
So far so good, but how can you unify two structures? First of all there is no portable way (yet) to list all the slots of a given structure. Secondly, by allowing the unification of arbitrary CONSes, we have created a short-circuit in the unification machinery.
Dealing with object instances raises similar problems.
Suppose we have the following definition
(defstruct foo a s d)The straighforward
(unify #S(FOO A 42 S NIL D NIL) (make-foo :a 42))cannot be built portably, besides, we do not even have the equivalent of the
#S(...)
notation for regular CLOS objects. Moreover we want
to do other things with other data types.
A simple solution is to define a template sub-language to express unifications between structure objects or standard objects (and other data types as well.) Nevertheless, the definition of such sub-language cannot be CONS based, because in such case, we would conflate arbitrary CONSes and the expressions of the sub-language.
To circumvent this problem we resort to the usual trick a ML programmer uses to placate the type-checker: we introduce an "intermediate" data type. Essentially the following
(defclass template () ((spec :accessor template-spec :type (or cons symbol number) ...)))
The TEMPLATE class is accompanied by a reader macro
(#T
for template, or type, with an
abuse of language) and an
appropriate PRINT-OBJECT method. The #T
reader macro
expands as
#Tsomething ==> (make-instance 'template :spec something)With this infrastructure we can express the unification of the FOO instance as
(unify #(FOO A 42 S NIL D NIL) #T(foo foo-a 42))I.e. we use the actual structure accessor FOO-A to get to the the value of the slot A in a FOO. This is an example of the template language.
A more interesting example, which involves vectors is the following
cl-prompt> (unify #(1 2 3 4 5) #T(vector 1 ?x &rest ?rest)) ; You get the idea... #<ENVIRONMENT xxxxxxx> cl-prompt> (find-variable-value '?rest *) #(3 4 5)I.e. we have a DESTRUCTURING-BIND on steroids.
Note that separataing the templates is necessary if we want to do something like
cl-prompt> (unify '(1 2 3 4 5) #T(list 1 ?x &rest ?rest)) #<ENVIRONMENT xxxxxxx> cl-prompt> (find-variable-value '?rest *) (3 4 5)Without the template denoted by
#T(list ...)
the
unifier would have been utterly confused.
In the following the full extent of the UNIFICATION facility is described in its main components.
Unifying Substitutions
The Template Sub-language
Control Flow
The UNIFICATION Dictionary
Site Map
Enjoy!
Questions? Queries? Suggestions? Comments? Please direct them at me.