From ok@atlas.otago.ac.nz  Mon May 15 04:46:07 2000
Received: from atlas.otago.ac.nz (atlas.otago.ac.nz [139.80.32.250])
	by swi.psy.uva.nl (8.9.3/8.9.3) with ESMTP id EAA17935
	for <prolog@swi.psy.uva.nl>; Mon, 15 May 2000 04:46:05 +0200 (MET DST)
Received: (from ok@localhost)
	by atlas.otago.ac.nz (8.9.3/8.9.3) id OAA08987;
	Mon, 15 May 2000 14:46:15 +1200 (NZST)
Date: Mon, 15 May 2000 14:46:15 +1200 (NZST)
From: "Richard A. O'Keefe" <ok@atlas.otago.ac.nz>
Message-Id: <200005150246.OAA08987@atlas.otago.ac.nz>
To: jsevaj@corp.phone.com, prolog@swi.psy.uva.nl
Subject: Re:  Variable implementation

	I'm currently writing a (pseudo-)prolog interpreter. (It is a personal, =
	non-commercial project). I was wondering if anyone knows any good =
	resources on the internet or books on this subject. I particularly =
	interested in information related to variable implementation, how to =
	distinguish bound and unbound variables and how the binding actually is =
	performed.=20

Hassan Ait-Kaci's book on the WAM (MIT Press, I _think_) describes this
in some detail.  The WAM representation for data structures goes roughly
like this

	(bits << 2) + CONST_TAG		a constant
					another bit distinguishes (small)
					integers from atom numbers, which
					are indices into an expanding table;
					8 bits at the top of atoms are
					reserved for arity (0 in this case).
	(pointer)   + VAR_TAG		if the pointer points to the same
					cell, an unbound variable;
					if the pointer points to another
					cell, a variable that was bound
					to another variable
	(pointer)   + PAIR_TAG		represents [H|T]; the pointer
					points to the first word of a
					two-word block (which holds H;
					T is in the second word)
	(pointer)   + STRUCT_TAG	represents either a compound
					term f(T1,...,Tn) or a "boxed constant"
					For a compound term, the pointer
					points to an array of n+1 words;
					the first is the code for f with
					n in the top bits, the remaining
					ones represent the arguments.
					For a boxed constant, the pointer
					points to a block of words; the
					first is tagged as an integer and
					encodes the size and subtype.
					The remaining words are untagged.

Deep unification goes like this, where r1 and r2 represent registers
(so never point to themselves)

    unify(r1, r2) =
	loop					-- dereference r1
	    exit when tag(r1) /= VAR_TAG	-- stop for non-variable
	    t := r1[0]			
	    exit when t = r1			-- stop for unbound var
	    r1 := t				-- follow the binding
	end loop
	loop
	    exit when tar(r2) /= VAR_TAG
	    t := r2[0]
	    exit when t = r2
	    r2 := t
	end loop

	case (tag(r1), tag(r2)) is
	when (CONST_TAG, CONST_TAG) => if r1 /= r2 then FAIL end if
	when (PAIR_TAG, PAIR_TAG) =>
	    unify(r1[0], r2[0]);
	    unify(r1[1], r2[1]);
	when (STRUCT_TAG, STRUCT_TAG) =>
	    if r1[0] /= r2[0] then FAIL end if
	    if r1[0] represents an atom/arity pair then
	        for i from 1 to arity(r1[0]) loop
		    unify(r1[i], r2[i])
		end loop
	    else
	        for i from 1 to arity(r1[0]) loop
	            if r1[1] /= r2[i] then FAIL
		end loop
	    end if
	when (VAR_TAG, VAR_TAG) =>	-- both unbound
	    if r1 < r2 then
	        bind_and_trail(r2, r1)
	    elsif r1 > r2 then
	        bind_and_trail(r1, r2)
	    end if
	when (VAR_TAG, ?) =>		-- r1 unbound
	    bind_and_trail(r1, r2)
	when (?, VAR_TAG) =>		-- r2 unbound
	    bind_and_trail(r2, r1)
	when others => FAIL
	end case

where
    bind_and_trail(var, value) =
	if var < deterministic limit then
	    *trail++ := var
	end if
	*var := value

I've typed this in a bit of a rush, but that will give you the
idea how *one* kind of Prolog implementation works.  There are
several other ways to implement Prolog, and you should read
Paul Tarau's papers at the BinProlog web site.
	

