No Representation without Taxation

I think we’re at a point where we need to question our term/value representation technology. That’s not to say we need to change it, but we should look at where things are getting a bit out of hand and ask if there’s anything we can do. The system is not so large that…actually, it’s small. However, before we do anything foolish, I thought I’d try to remember where we’ve been and why. Please join in. We need to figure out what breaks what. Most representation choices have pros and cons (a few just have cons).

Where are we at the moment? Terms come in two directions (In and Ex, according to whether their typechecking process inhales or exhales types), and two phases (TT and VV, for ‘term’ and ‘value’). What does that mean in practice? Phase affects the representation of binding and of redexes. Note that free variables are always references, carrying type and value information for ease of checking and evaluation.
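
Concretely, the direction split might be rendered in Haskell along these lines (constructor names invented for this post, not the actual Epigram datatypes):

    -- In-terms have their types pushed in; Ex-terms push types out, so
    -- going from Ex to In is free, while In to Ex costs an annotation.
    data InTm
      = Lam InTm          -- checked against a Pi type
      | Pair InTm InTm    -- checked against a Sig type
      | Ex ExTm           -- direction change: compare the exhaled type

    data ExTm
      = Var Int           -- bound variable; its type is looked up
      | App ExTm InTm     -- head exhales a Pi; the argument is checked
      | Ann InTm InTm     -- canonical form annotated with its type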

  • Terms may contain non-normal forms: unexpanded definitions, type-annotated canonical forms in elimination positions. Terms use de Bruijn indices for bound variables.
  • Values are beta-delta normal: they contain no opportunities for further computation, although they may admit eta-expansion; they use the Haskell function space to represent binding, which makes instantiation blessedly easy (see the sketch just below).
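
In the same invented notation, the value phase trades de Bruijn indices for Haskell’s own binder:

    -- Values in sketch form: beta-delta normal, binding by Haskell
    -- closure, so instantiation is just function application.
    data Value
      = VLam (Value -> Value)   -- binder as a Haskell function
      | VPair Value Value       -- canonical pair
      | VNe Neutral             -- stuck: nothing left to compute

    data Neutral
      = NFree Int               -- free variable (really a reference
                                -- carrying type and value; an Int here)
      | NApp Neutral Value      -- application blocked on its head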

We get from terms to values by evaluating (with respect to an environment which assigns values to bound variables: evaluating a binder yields a Haskell closure). We piggy-back on Haskell’s laziness: values are computed on demand. Demand consists of pattern matching, and it makes sense to pattern match on values for types without eta-laws (positive types, e.g. Set). Values in positive types have the ‘Geuvers property’: if a value is definitionally equal to a constructor form, then it already has that constructor form.
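
To make the piggy-backing concrete, here is a minimal evaluator over the sketched syntax; nothing under a VLam is computed until the closure is applied, and nothing in the environment is forced until somebody pattern matches on it:

    type Env = [Value]   -- values for de Bruijn-bound variables

    evalIn :: Env -> InTm -> Value
    evalIn g (Lam b)    = VLam (\v -> evalIn (v : g) b)  -- a Haskell closure
    evalIn g (Pair a b) = VPair (evalIn g a) (evalIn g b)
    evalIn g (Ex e)     = evalEx g e

    evalEx :: Env -> ExTm -> Value
    evalEx g (Var i)   = g !! i        -- demanded only by pattern matching
    evalEx g (App f a) = apply (evalEx g f) (evalIn g a)
    evalEx g (Ann t _) = evalIn g t    -- annotations evaporate in values

    -- apply is partial: the missing cases are ill-typed anyway
    apply :: Value -> Value -> Value
    apply (VLam f) v = f v               -- beta, courtesy of Haskell
    apply (VNe n)  v = VNe (NApp n v)    -- stuck: extend the spine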

We get from values to terms by quotation. Quotation always requires a name supply, so we can generate fresh variables when we go under a binder (applying the Haskell functions to something concrete but inert). If you want quotation to do any eta-expansion, you’d better supply a type (usually not hard to find). But it is possible to quote a value as a beta-delta normal form without supplying type information. Quotation is expensive, and it seriously breaks sharing.
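
In the same sketch, type-free quotation looks like this; the Int is the name supply, counting the binders we have gone under, so fresh variables are de Bruijn levels. No type appears, so no eta-expansion happens, and every call rebuilds structure, which is where the sharing goes:

    quote :: Int -> Value -> InTm
    quote n (VLam f)    = Lam (quote (n + 1) (f (VNe (NFree n))))
                          -- apply the closure to something inert
    quote n (VPair a b) = Pair (quote n a) (quote n b)
    quote n (VNe e)     = Ex (quoteNe n e)

    quoteNe :: Int -> Neutral -> ExTm
    quoteNe n (NFree k)  = Var (n - k - 1)   -- level back to index
    quoteNe n (NApp f a) = App (quoteNe n f) (quote n a)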

Programming with values is rather pleasant, because it’s programming directly with what things really are, using pattern matching, lambda-abstraction, and so on. In Epigram 1, I used values for as much as possible; terms were never persistent; quotation happened where necessary (to allow metavariable instantiation, for equality testing, for display). This made my life easier, but it made proof state update super-duper-wuper-slow. You remember? Ah, you’re still waiting…

In Epigram 2, we started out with an all-first-order representation of terms and values, achieved just by replacing the Haskell closure representation of binding with term-environment pairs. This was just fine for instantiation, but murder for abstraction: it became clear that there was no maintainable way to implement our repertoire of hardwired operations without yer actual lambda. So here we are, flipping between terms and values: whenever you want one, you’ve always got the other. Sharing is hard to hang onto. At least we ensure that the proof state holds a term representation which is easy to update, but if those terms are the output of quotation, they can be nasty.
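
For reference, the first-order style we started with looks something like this (again invented names, not the actual Epigram 2 code):

    data FTm = FV Int | FLam FTm | FApp FTm FTm

    data FVal
      = FClo [FVal] FTm   -- environment-body pair, in place of VLam
      | FNe Int [FVal]    -- stuck free variable with its spine

    whnf :: [FVal] -> FTm -> FVal
    whnf env (FV i)     = env !! i
    whnf env (FLam b)   = FClo env b             -- capture, don't compute
    whnf env (FApp f a) = case whnf env f of
      FClo env' b -> whnf (whnf env a : env') b  -- instantiation: a cons
      FNe k sp    -> FNe k (sp ++ [whnf env a])

    -- Instantiation is a cons; but every hardwired operator must be
    -- written against FClo rather than an honest lambda: murder for
    -- abstraction, as advertised.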

There are lots of things we might do to alleviate the situation. One clear win is to delay computing definitions until they are applied to as many arguments as they have lambdas. Crucially, that preserves the Geuvers property, as our non-delta-normal forms inhabit negative types (function types, in particular); eta-expansion saturates these applications and forces computation.
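
One way to picture the delay (a hypothetical representation, not a worked patch): a definition carries its arity and the arguments gathered so far, and unfolds only at saturation.

    data DVal
      = DLam (DVal -> DVal)   -- ordinary binder
      | DDef String Int [DVal] ([DVal] -> DVal)
          -- name, arity, arguments so far (reversed), unfolding
      | DNe String [DVal]     -- stuck free variable and spine

    applyD :: DVal -> DVal -> DVal
    applyD (DLam f)        v = f v
    applyD (DNe x sp)      v = DNe x (sp ++ [v])
    applyD (DDef x n as f) v
      | n == length as + 1 = f (reverse (v : as))  -- saturated: go delta
      | otherwise          = DDef x n (v : as) f   -- under-applied: stay folded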

But I’m contemplating the more radical prospect of returning to a first-order representation of values in weak-head-normal form (or maybe something which forces computation in positive positions, to allow matching). It would be good to have a constant time embedding back to terms, which would be possible if we allowed closures in terms. So I’m trying to remember: in what ways did first-order values make our lives miserable? I’m guessing the trouble was that it was awful writing closures instead of lambdas. If values embed cheaply in terms, can we ensure that we always read values and write terms?
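
What that might look like, wholly speculatively: admit closures as a term constructor, so that embedding a weak-head value back into terms is a single constructor application for binders (neutral spines rebuild as applications, but with no quotation and no name supply).

    data Tm
      = V Int           -- bound variable, de Bruijn index
      | F Int           -- free variable reference (simplified to Int)
      | L Tm            -- binder
      | A Tm Tm         -- application
      | Clo [Whnf] Tm   -- a value closure, now welcome inside terms

    data Whnf
      = WClo [Whnf] Tm  -- weak-head binder: environment-body pair
      | WNe Int [Whnf]  -- stuck variable with spine

    embed :: Whnf -> Tm
    embed (WClo env b) = Clo env b   -- constant time: no quotation
    embed (WNe k sp)   = foldl (\f a -> A f (embed a)) (F k) sp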

One thing’s for sure. We’ve been working too hard. We might work a bit smarter if we had a clearer picture of our requirements. What is it that we need to be able to do? What’s pushing us around? (If it were just typecheck and evaluate, then the values-everywhere approach would be just dandy. Sadly, there’s more to it than that.)

3 Responses to “No Representation without Taxation”

  1. Conor says:

    As a footnote, some info on what we care about when.

    • To typecheck a term, it must be beta-normal. Environment-body pairs cannot be typechecked without substituting them out (hereditarily).
    • To match on a term, it must be beta-delta-normal and of positive type. (Positive types are Set, Prop, enumerations, datatypes. Not Pi, Sig, proofs, codatatypes.)
    • To update a term, we care only that it is represented in a first-order way.
    • To construct a term, e.g. when implementing an operator, we like some sort of lambda (Haskell or de Bruijn), not environment-body pairs.
    • To go from functional binding or environment-body pairs to a first-order lambda requires a name supply.
    • To typecheck requires a name supply and bidirectional type information.
    • To eta-expand requires a name supply and bidirectional type information.
    • To beta-normalize or beta-delta-normalize requires only a name supply.
    • Whatever ‘values’ operators consume must be easily embedded anywhere in whatever they produce, without a name supply.

    These constraints admit a variety of solutions. Are there more constraints?

  2. Glenn Willen says:

    “We get from values to types by quotation.”

    This should presumably read “from values to terms”. I suspect your target audience won’t be confused by this, but it took me a minute or two to figure it out.

  3. Adam says:

    Thanks for pointing this out. I’ve fixed the typo.
