Elaborating on the State of the Proof State

The Epigram proof state is a collection of definitions in various states of repair. At any given time, it represents a snapshot in the process of elaborating a program in Epigram’s Source Language (which doesn’t really exist yet) into terms in Epigram’s Language Of Verifiable Evidence (perpetually in flux). In this post, I’ll try to sketch out the plan of the proof state and explain a bit about how elaboration sits on top.

Global Definitions

The basic idea is that the proof state is a sequence of global definitions, growing at the local end (a snoc-list, as it were), any of which may be a hole. Each definition gives a name either a ? or a known value (in the form of a term), together with a type.

name = (?|term) : type

To check that a definition is ok with respect to a bunch of prior definitions, check that the type is a type, that the term (if given) has that type, and that the name is fresh. E.g.,

S = ? : Set
T = ? : S → Set
F = Π S T : Set

relies on the existence (but not any particular value) of S to ensure that T has a type; F’s value relies on both S and T.
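
If a concrete rendering helps, here is a toy Haskell model of that snoc-list. Everything in it (Name, Term, Entry, ProofState, example) is invented for illustration; in particular, terms are just strings in this sketch, not real terms of the core language.

-- Toy rendering of the snoc-list model; terms are just strings here.
type Name = String
type Term = String

-- name = (? | term) : type
data Entry = Entry
  { entryName :: Name
  , entryVal  :: Maybe Term   -- Nothing is the '?'
  , entryType :: Term
  }

-- The proof state: a snoc-list of entries, growing at the local end.
data ProofState = Nil | ProofState :< Entry

-- The running example: S and T are holes; F's value mentions both.
example :: ProofState
example = Nil :< Entry "S" Nothing "Set"
              :< Entry "T" Nothing "S -> Set"
              :< Entry "F" (Just "Pi S T") "Set"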

Note that all definitions are in scope for subsequent definitions (but not for themselves — no recursion!) and that the equational theory of the new definition in scope is the obvious one: she equals herself and also her value if she has one. We know that the type F, above, admits λ-abstractions, because she is, by definition, equal to a Π-type. Evaluation (to weak-head-normal form) expands definitions in head positions. Full normalization leaves only the holes. To make evaluation easier, share the whole definition with every reference: it’s the parser that looks up definitions, so the evaluator can just use their cached values.
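
Again purely for orientation, here is a toy sketch of that evaluation strategy, with references carrying their definitions’ cached values so the evaluator never looks anything up. The term representation (Tm, whnf) is invented for this sketch, not the real one.

-- Toy term language: a reference shares its definition's cached value
-- (if any), so weak-head evaluation needs no environment lookup.
data Tm
  = Ref String (Maybe Tm)   -- a reference, carrying its cached value
  | Lam (Tm -> Tm)          -- higher-order encoding of binding, for brevity
  | App Tm Tm

-- Weak-head normalisation: expand definitions in head position only.
whnf :: Tm -> Tm
whnf (Ref _ (Just v)) = whnf v        -- a defined reference unfolds
whnf (App f a)        = case whnf f of
  Lam b -> whnf (b a)                 -- beta-reduce once the head is a lambda
  f'    -> App f' a                   -- stuck, e.g. a hole in head position
whnf t                = t             -- holes and lambdas are already whnf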

Got the model of the snoc-list of definitions? Good, because the team didn’t quite implement it that way…but it is a good reference model for what actually is implemented. Whenever we introduce more structure, let’s make sure we can always relate it back to that simple snoc-list model.

Development Trees

Firstly, we don’t just use a flat list. We structure the list into a tree, and we give the definitions long names which specify the path to each definition from the root of the tree. Nodes in the tree are called developments, and they may contain a bunch of subdevelopments,

F [ S = ? : Set
    T = ? : S → Set
  ] = Π S T : Set

where F.S and F.T are holes, and F herself is a construction made from those holes. That is, a development’s subdevelopments scope over her definition if she has one. A development can have subdevelopments but no definition herself, in which case we call her a module (although SML enthusiasts will be rightly underwhelmed by this use of the term). The purpose of this tree structure is to reflect the source language syntax of expressions and programs. Each node in the source syntax corresponds to a development in the proof state: the subdevelopment structure respects the subnode structure.
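
Here is a toy Haskell picture of such a tree. As before, all the names (Term, Development, devKids, devDef, exampleF) are mine, invented for illustration, and terms are just strings.

-- Toy rendering of development trees; terms are just strings here.
type Term = String
type LongName = [String]    -- path from the root, e.g. ["F","S"]

data Development = Dev
  { devName :: String                   -- the last component of her long name
  , devKids :: [Development]            -- subdevelopments, in order
  , devDef  :: Maybe (Maybe Term, Term) -- Nothing: a module;
  }                                     -- Just (Nothing, ty): a hole;
                                        -- Just (Just tm, ty): a definition

-- The running example: F [ S = ? : Set ; T = ? : S -> Set ] = Pi S T : Set
exampleF :: Development
exampleF = Dev "F"
  [ Dev "S" [] (Just (Nothing, "Set"))
  , Dev "T" [] (Just (Nothing, "S -> Set"))
  ]
  (Just (Just "Pi S T", "Set"))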

Our little ‘Cochon’ interface lets you zipper around in this tree, inserting new developments, turning modules into full definitions, and filling in holes with values. In Cochon interaction, but at no deeper level, naming is relative: e.g., the type of F.T refers to her sister F.S as S. However, the terms we actually build use the longnames everywhere. This is a bit of a change from chapter 2 of my thesis, where subdevelopments were always tentative ‘guesses’ with only local scope.
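
And here, reusing the illustrative Development type from the previous sketch, is roughly the sort of zipper that lets you move about such a tree. It is only a sketch of the navigation; the real thing carries rather more information.

-- A standard zipper: the focused development, plus the layers we unpacked
-- on the way down from the root.
data Layer = Layer
  { mother    :: Development     -- her own name, definition, and so on
  , eldersRev :: [Development]   -- elder sisters of the focus, nearest first
  , youngers  :: [Development]   -- younger sisters, in order
  }

type Zipper = ([Layer], Development)

goDown :: Zipper -> Maybe Zipper       -- focus on the first subdevelopment
goDown (ls, d) = case devKids d of
  []     -> Nothing
  k : ks -> Just (Layer d [] ks : ls, k)

goUp :: Zipper -> Maybe Zipper         -- zip the focus back into her mother
goUp ([], _)                 = Nothing
goUp (Layer m es ys : ls, d) = Just (ls, m { devKids = reverse es ++ d : ys })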

Elaboration Drives Construction

Cochon-driving is a low-level way to do business: it’s a crude command language wrapping the API we use for building higher-level elaboration processes. An elaboration process is a strategy, associated (by ←) with a hole, whose purpose is to deliver a value for that hole, possibly spawning subprocesses attached to subdevelopments. E.g., if we had the hole

F = ? : Set   ← check (x : A) -> B

the ‘check’ process might well elaborate thus

F [ S = ? : Set   ← check A
    T = ? : S → Set   ← check \ x -> B
  ] = Π S T : Set   ← check (x : A) -> B
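
Purely as a guess at the shape of such a strategy (the Source, Problem and checkPi names below are mine, not the implementation’s), the check process for a Π spawns two subproblems and delivers Π S T as the mother’s value:

-- Invented source syntax and problem tags, for illustration only.
data Source = SSet
            | SPi String Source Source    -- (x : A) -> B
            | SLam String Source          -- \ x -> b
            | SVar String

data Problem = Check Source               -- '<- check e'

-- Given the source term attached to a hole of type Set, either spawn
-- subholes (each with its own problem) plus a value template, or give up.
checkPi :: Source -> Maybe ([(String, Problem)], String)
checkPi (SPi x a b) = Just
  ( [ ("S", Check a)              -- S = ? : Set        <- check A
    , ("T", Check (SLam x b)) ]   -- T = ? : S -> Set   <- check \ x -> B
  , "Pi S T" )                    -- the mother becomes Pi S T : Set
checkPi _ = Nothing               -- other source forms want other strategies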

You can interpret such a tree of definitions as a snoc-list, just by a postorder traversal, so that each definition follows (and may thus have a value depending upon) her daughters. Just as in the snoc-list model, definitions never go out of scope once they have made their début. Daughters readily escape from mother. In fact, that’s crucial. Suppose, following our F above, we had

f = ? : F   ← check \ y -> t

we should need access to F.S to type y and F.T to check t. But how will we check t? I haven’t told you how to go under a binder!

Local Parameters

As well as definitions, the proof state also allows local parameters. Our new example problem might elaborate thus:

f [ λ y : F.S
    b = ? : F.T y   ← check t
  ] = b : F.T y   ← check \ y -> t

A parameter’s scope is local to his development: he scopes over his younger sisters and his mother’s value and type. Lifting over her parameters, f’s type is Π F.S (λ y → F.T y), which (thanks to η-rules) is only a superficial change from the original F. Meanwhile, from outside f, we really have a subdevelopment called f.b whose type is also lifted over y to give Π F.S (λ y → F.T y). The relative name b, used in f’s value, really means the application f.b y, and more generally, we suppress (in Cochon interaction) the application of a definition to the sequence of parameters shared between her definition site and any given usage site. The underlying term representation is fully explicit (and hence portable anywhere in the definition’s scope), with long names applied to spines of shared parameters.

I hope you can still see how to flatten our trees into snoc-lists of definitions. It’s just the same postorder traversal, but now you λ-lift each definition over the parameters on the path back to the root, breaking sharing like billy-o.
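
For the curious, here is a toy Haskell version of that flattening. Every name in it is invented, liftOver is a placeholder for the real λ-lifting, and terms are strings again; a development’s body interleaves parameters with subdevelopments, so a parameter scopes over the entries after it and over the mother’s value and type.

type Term = String

data Dev   = Dev String [BodyE] (Maybe (Maybe Term, Term))  -- module/hole/def
data BodyE = Param String Term            -- 'lambda x : S'
           | Child Dev                    -- a subdevelopment

type FlatDef = ([String], Maybe Term, Term)   -- long name, value, type

-- Placeholder for the real lambda-lifting: wrap a term in binders for ps.
liftOver :: [(String, Term)] -> Term -> Term
liftOver _ t = t

-- Postorder traversal: daughters first, then the mother, each lifted over
-- the parameters on the path back to the root.
flatten :: [(String, Term)] -> [String] -> Dev -> [FlatDef]
flatten outer path (Dev n body def) = go outer body ++ motherDef
  where
    here = path ++ [n]
    go _  []                 = []
    go ps (Param x s : rest) = go (ps ++ [(x, s)]) rest           -- extend scope
    go ps (Child d   : rest) = flatten ps here d ++ go ps rest    -- a daughter
    allPs     = outer ++ [ (x, s) | Param x s <- body ]
    motherDef = [ (here, fmap (liftOver allPs) mv, liftOver allPs ty)
                | Just (mv, ty) <- [def] ]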

‘Nixed’ Developments

In the elaboration of expressions, the parameters of definitions tend to correspond to variables bound in the source language. Bound variables scope over a particular subexpression and remain in scope for the nested subdevelopments involved in the elaboration of that subexpression. Our policy that parameters scope over their little sisters fits perfectly with that scenario. But it is not the only scenario: programming is a touch more complex.

Let’s have a concrete example. Suppose we have a type Nat : Set with constructors 'ze and 'su. (Constructors are a separate class of symbol, currently distinguished by a quote mark, and overloaded in a type-directed way.) Let us also have an induction principle, like a fancy if-then-else

induction : (n : Nat)(P : Nat → Set) →
            P 'ze → ((k : Nat) → P k → P ('su k)) → P n

If we’re implementing Boolean ‘eq’, say, we’ll end up (after a certain amount of setup) in a situation like this

g [ λ x : Nat
    λ y : Nat
  ] = ? : ⟨eq x y : Two⟩   ← refine <= induction x

where ⟨eq x y : Two⟩ is set up to be an isotope of Two, marked up with a left-hand-side involving pattern variables x y.

We’ll need x and y in scope to figure out what induction x means, but we don’t want useless cruft in the subgoals. We want two cases,

g [ λ x : Nat
    λ y : Nat
    z [ λ y : Nat
      ] = ? : ⟨eq 'ze y : Two⟩
    s [ λ x : Nat
        λ h : (y : Nat) → ⟨eq x y : Two⟩
        λ y : Nat
      ] = ? : ⟨eq ('su x) y : Two⟩
  ] = induction x (λ x → (y : Nat) → ⟨eq x y : Two⟩) z s y
    : ⟨eq x y : Two⟩  ← refine <= induction x

The subgoals give the left-hand-sides for the subprograms, and the inductive hypothesis handily tells you which recursive calls make sense. But there’s a nuisance.

As things stand, both the old pattern variables and the new pattern variables are in scope for the subgoals z and s, even though they’re redundant. (Moreover, they’re not shadowed! In Epigram, every name is a space of de Bruijn indices: if you repeat a name, you shift rather than shadowing. E.g., the type of the polymorphic identity function is (X : Set)(X : X) → X^1, where X^1 means skip one local X on the way to find the intended X.) We certainly don’t want them involved in any further refinements, which they easily could be: the original y was copied into the motive to ensure that the inductive hypothesis h could be invoked with any old second argument, but we don’t want to extend the same generosity to outdated rubbish that just happens to be lying around.
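
A toy model of that naming convention, with an invented representation: a reference is a string plus a count of how many more-local bindings of that same string to skip.

-- 'X^1' is ("X", 1): skip one local X on the way to the intended X.
type RelName = (String, Int)

-- Resolve a relative name against the local context, innermost binding
-- first; repeated names shift rather than shadow.
resolve :: [String] -> RelName -> Maybe Int    -- a de Bruijn index, say
resolve ctx (x, n) = go 0 n ctx
  where
    go _ _ []         = Nothing
    go i k (y : ys)
      | y == x, k == 0 = Just i                 -- found the intended binding
      | y == x         = go (i + 1) (k - 1) ys  -- skip one binding of x
      | otherwise      = go (i + 1) k ys

-- E.g. in (X : Set)(X : X) -> X^1, the body's context, innermost first, is
-- ["X","X"], and resolve ["X","X"] ("X",1) = Just 1 picks the Set-typed X.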

We still want to keep z and s where they are in the proof state, reflecting the subprogram structure, but we want to kick the old pattern variables out of scope. The team have been ignoring this problem for years, as it only shows up when elaboration starts to get interesting. The fix (as of last week) is to allow developments whose context of parameters has been ‘nixed’ — reset to empty. If we nix z and s, above, we get just the behaviour we want.

The translation from the tree-structure proof state to the snoc-list structure now λ-lifts over just the parameters on the path back to the nearest nixed development, and the root is nixed. Nixing affects only the scoping of parameters, not definitions. Cochon continues to use relative names and suppress shared parameter prefixes: if the path from definition site to use site passes into or out of a nixed development, there are no shared parameters.

Elimination with a Motive

With the proof state thus organised and nixed developments added, we’ve been able to implement the ‘elimination with a motive’ refinement step that’s central to Epigram program elaboration, pretty much the way we want it (one or two sneaky optimizations are missing, but all in good time). An eliminator, in the Epigram sense, is anything with a type of this shape, quantifying universally over its return type:

(Motive : … → Set) →
  (method1 : …) →
  …
  (methodn : …) →
    Motive target…

So, induction x, above, is an eliminator. To refine a goal by such an eliminator, one must synthesize a suitable Motive from that goal in the manner I learned from James McKinna (and about which I have written at great length elsewhere). The methods then become the subgoals.
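
Here, as a rough and entirely illustrative sketch, is what recognising that eliminator shape might look like; the Ty representation is made up, and the interesting part, synthesising the Motive itself, is not shown.

-- Invented type syntax, just enough to state the eliminator shape.
data Ty = Pi String Ty Ty      -- (x : S) -> T
        | V String             -- a variable
        | Ty :$ Ty             -- application
  deriving Show

-- Split an eliminator type into (motive binder, methods, targets), if it
-- has the right shape: the methods are what become subgoals once a
-- suitable Motive has been synthesised from the goal.
eliminatorShape :: Ty -> Maybe ((String, Ty), [(String, Ty)], [Ty])
eliminatorShape (Pi m mty rest) = go rest []
  where
    go (Pi x s t) ms = go t (ms ++ [(x, s)])      -- collect the methods
    go ret        ms = do ts <- targets ret       -- the return type must be
                          Just ((m, mty), ms, ts) -- Motive applied to targets
    targets (f :$ a) = fmap (++ [a]) (targets f)
    targets (V h) | h == m    = Just []           -- the head must be Motive
                  | otherwise = Nothing
    targets _        = Nothing
eliminatorShape _ = Nothing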

Nixing means we can keep the method subgoals where we want them in the tree, but manage the parameters in scope separately. Eliminations like the induction I showed above, and indeed, more complex constrained eliminations over indexed sets (GADT pattern matching, as it were), are now back in the system and at last working as they should.

EWAM (elimination with a motive) gives a programmable version of Agda’s C-c C-c case analysis manoeuvre. Elaboration of Epigram programs really is just construction by refinement in the underlying Language Of Verifiable Evidence, with careful use of type-based encodings (as in the eq example) to manage the source-level purpose of the underlying definitions in the proof state.

More Threads to Unravel

I haven’t said very much about how the consequences of hole-filling are pushed around the proof state. In the above example, we shall have much more chance of filling in what f.b is once F.S and F.T have acquired values. We can’t rely on that information turning up in any particular order (even if you have the whole program, you can’t rely on any particular order of constraint-solving). We need a way to transmit new information through the proof state in a way which impacts on elaboration processes, potentially unblocking those awaiting that information. The proof state is, in some sense, a model of threads running concurrently in an operating system. But that’s another story.
