I thought I’d just collect together some disparate observations about W-types, and their use in encoding other recursive datatypes. My motivation, bluntly, is to point out the inevitable unsuitability of W-types as the basis for recursive datatypes in an implementation of decidable type theories. I don’t want to downplay the conceptual importance of W-types as a uniform basis for recursive structures in an extensional setting. I’m just saying that computers aren’t clever enough to cope all that well with W-type encodings: if we want to make our understanding go, we need to refine it. This is a long post, so if you want the bad news first, skip to the end.
What are W-types?
Martin-Löf identified a general notion of well-ordering — data with a well-founded notion of ‘structurally smaller’. The power of dependent types enabled him to express this parametrically, once for all, rather than by fiddling about with syntactic criteria for acceptable recursive definitions. Here goes (informal notation)
W (S : Set) (P : S → Set)  :  Set
(s : S) ⊲ (f : P s → W S P)  :  W S P
Martin-Löf chose ‘sup’ as the name of the constructor: each node is the least upper bound (supremum) of its subtrees. I’ve used an infix constructor to draw a picture of a tree growing rightward from a root labelled with a shape s : S, and subtrees attached by the function f to each available position in P s.
Encoding inductive types as W-types
Before we go any further, let’s get some examples nailed down. Let 0 be the empty set with eliminator Ψ yielding whatever you want, () : 1 the unit set, and tt,ff : 2 the Booleans. Let me write anonymous pattern-matching functions like this {p1 → e1 ; … ; pn → en}.
Nat := W 2 {tt → 1 ; ff → 0}  :  Set
zero := ff ⊲ Ψ  :  Nat
suc (n : Nat) := tt ⊲ {_ → n}  :  Nat
Natural numbers have a choice of two shapes: ff, the ‘zero shape’, has no positions for subnumbers; tt, the ‘successor shape’ has just one. Correspondingly, we can implement the constructors, using Ψ to assign subnumbers to all the positions in a zero, and a constant function for all the positions in a suc.
We can play a similar game for unlabelled binary trees.
Tree := W 2 {tt → 2 ; ff → 0}  :  Set
leaf := ff ⊲ Ψ  :  Tree
node (l : Tree)(r : Tree):= tt ⊲ {tt → l ; ff → r}  :  Tree
Note how finite collections of subobjects are represented as functions from a finite domain. X*X = X2.
If you want your data to have contents (or labels) as well as shape, you can follow this pattern, making use of Σ-types to create dependent record structures (s : S, t : T s) : Σ S T.
LW (S : Set)(LP : S → Set * Set) := W (Σ S (fst ⋅ LP)) (snd ⋅ LP ⋅ fst) : Set
I’ve factored the shape into the part S which determines the positions P, and the part which just labels the node without affecting its notion of substructure. I’ve then grouped label and position choices together into a single function LP. We get that cons has a label in X and one child, nil has a trivial label and no children.
List (X : Set) := LW 2 {tt → (X, 1) ; ff → (1, 0)}  :  Set
nil := (ff, ()) ⊲ Ψ  :  List X
cons (x : X)(xs : List X) := (tt, x) ⊲ {_ → xs}  :  List X
Note that the formulation of W-types does not require that the number of children be finite, just the depth. W-types thus capture transfinite well-orderings. W 2 {tt → Nat ; ff → 0} gives you infinitely wide, finitely deep trees.
Induction principles
There’s a generic induction principle for W-types. It’s a bit of an eyeful.
indW (S : Set)(P : S → Set)
(C : W S P → Set)     — some conclusion we hope holds
(c : (s : S)→      — given a shape…
(f : P s → W S P)→      — …and a bunch of kids…
(h : (p : P s)→ C (f p))→     — …and C for each kid in the bunch…
C (s ⊲ f)     — …does C hold for the node?
)   :   — If so,…
(x : W S P)→ C x      — … C always holds.
This thing computes recursively:
indW S P C c (s ⊲ f) = c s f {p → indW S P C c (f p)}
Now, an obvious desideratum is that the usual induction principles we expect for datatypes should also hold for their W-type encodings, as a consequence of the above general principle. They do, if you have a sufficiently extensional equality to hand. The good news (as witnessed in Observational Equality, Now!, by Thorsten Altenkirch, Wouter Swierstra, and me) is that we can have that much extensionality in a decidable setting. Let’s look at where the extensionality comes in.
Let’s try to show that our encoded Nat can be equipped with
indNat (C : Nat → Set)     — some conclusion we hope holds
(zc : C zero)      — does C hold for zero?
(sc : (n : Nat)→ C n → C (suc n))      —  does C respect suc?
:   — If so,…
(n : Nat)→ C n      — … C always holds.
Given zc and sc, let us try to construct
?c : (s : 2)→ (f : {…} 2) → …
OK, I’ve seen enough to know that the first move is case analysis on s — do we have zero or successor? Let me also expand the definition of what we know in each case:
make ?ctt : (f : 1 → Nat)→ ((x : 1)→ C (f x)) → C (tt ⊲ f)
from sc : (n : Nat)→ C n → C (tt ⊲ {_ → n})
make ?cff : (f : 0 → Nat)→ ((x : 0)→ C (f x)) → C (ff ⊲ f)
from zc : C (ff ⊲ Ψ)
I’ve done all the case analysis I can. It’s just a chase now.
In the successor case, I can take n = f (), make a proof of C n, and acquire a proof of C (tt ⊲ {_ → f ()}) but I need a proof of C (tt ⊲ f). Extensionally, f and {_ → f ()} agree at all canonical input values, as () is the only such value. Intensionally, is it mechanically obvious that these functions may be identified? You’ll find that Coq says no, but Agda says yes. That’s because Agda performs type-directed η-expansion for functions, transforming f to {x → f x}, and for unit, taking the latter to {_ → f ()}. Lucky for some!
In the zero case, I have a proof of C (ff ⊲ Ψ), but I need a proof of C (ff ⊲ f) for any old vacuous f. And here we hit trouble. How do I know Ψ and f coincide? Extensionally, that’s vacuously the case — both functions agree on all of the no possible inputs. Intensionally, we can apply η-expansion and ask if {x → Ψ x} is the same as {x → f x}, but even if we can identify all expressions in 0, we can’t make any further progress. Agda and Coq agree on the negative here. Later, I’ll explain why you should be grateful for this.
The problem gets worse for trees. In the node case, we have to show C (tt ⊲ f) for any function f : 2 → Tree, but we know C only for those f’s defined by case analysis: we have C (tt ⊲ {tt → f tt ; ff → f ff}). Extensionally, that’s just f; intensionally, it tends not to be.
The heart of the matter: too many implementations
Of course, the failure of my proof attempts should not convince you that yours are equally doomed. How can I convince you that you’re just as screwed? Let’s go back to the zero case of indNat. You know C zero, that is (ff ⊲ Ψ). Do you also know C (ff ⊲ {_ → zero})? Do you also know C (ff ⊲ {_ → suc zero})? There are infinitely many alternative implementations of ‘zero’, because there are lots of ways to write a function in 0 → Nat, even though there are only no ways to run them. Similarly, there are lots of ways to implement the same node of a tree.
In trying to derive induction principles for encoded datatypes from indW, we assume that our favourite implementations of the constructors preserve C, but we are trying to prove that every implementation of the constructors preserves C. If we view C as a property of an implementation of a function, we clearly don’t know enough. Indeed, if you have Martin-Löf’s intensional identity type, I, you’ll find that you can express favouritism in implementation. I(Nat, zero, ff ⊲ {_ → zero}) does not hold.
Observational Equality to the Rescue
I don’t want to go down the rabbit hole of explaining how observational equality works just now. I do want to say that we have a (heterogeneous) equality, ==, which is extensional for functions and substitutive in all contexts. Logically, this is what you get if you add an extensionality axiom to Coq or Agda, but we’ve cracked the problem that axioms gum up computation: axioms get stuck in the teeth of programs defined by pattern matching because they’re not canonical constructor forms. Our implementation of substitutivity
coerce (S : Set)(T : Set)(Q : (S:Set)==(T:Set)) : S -> T
inspects S and T, rather than Q. Edwin’s compiler erases coerce at run-time, so there’s no real despatching-on-types to worry about, (unless you’re into running erased open terms, of course, in which case you have plenty more to worry about, but that’s another (horror) story). Whenever S and T have compatible type constructors, they have the same canonical value constructors, so we can transport data from one to the other. The details are hairy but essentially tedious.
We disallow Martin-Löf’s intensional equality type, and any other method to form predicates which make non-observational distinctions on functions. Correspondingly, we are always in a position to explain how to transport data from C f to C g, whenever f and g take equal inputs to equal outputs. In particular, for the zero case of indNat, we can prove that (Ψ : 0 → Nat)==(f : 0 → Nat), and hence coerce maps C (ff ⊲ Ψ) → C (ff ⊲ f).
If you like, you can look at it this way: extensionality gives us case analysis principles for functions. Every function from 0 is Ψ; every function from 2 is {tt → t ; ff → f} for some t and f. We can thus show that a property holds for all implementations by showing that it holds for a covering of favoured implementations. We win.
Compubloodytation?
Just being able to show that induction principles for encoded types hold is not enough. For a drop-in replacement, we need to know that the computational properties of each induction principle remains the same. For natInd, we expect the following computational behaviour
natInd C zc sc zero = zc
natInd C zc sc (suc n) = sc n (natInd C zc sc n)
Do we get it? Close. We certainly get this far
natInd C zc sc zero =
coerce (C zero) (C zero) (resp {f → C (ff ⊲ f)} Ψ) zc
(where resp asserts that a function respects equality) and a similarly gummed up successor case. Can we do any better? Yes. Note that the above coerce mess only gets stuck there if zc is not a canonical term: if it were canonical, then (C zero) would be a canonical type, and coerce would fire away.
We can supplement evaluation with a second phase, tidying up computations which have got stuck provided we don’t unstick them in the process. (The separability of tidying-the-stuck is an idea which occurred to me as I walked home from work on my 34th birthday. It’s one of the best birthday presents I’ve ever had.) We may safely tidy
coerce X X Q x = x
in that phase (for stuck x). We get just enough oomph (1) to ensure that the signature for Martin-Löf’s identity type is inhabited and that its computation laws hold judgmentally (fewer things fail), and (2) to make the computation rules for W-encoded inductive types hold judgmentally. We win again.
Where is the bad?
Tidying-the-stuck only delivers the computation rules for our favourite implementations of the constructors. If we bump into data constructed by non-favourite means, we’re really relying on coercion to transport favourites-only constructions across the intensional gap. Non-favourite implementations of constructions are really easy to create with generic operations, such as this ‘doubling’ operation, which works for any W-type.
wubble (S : Set)(P : S -> Set) :=
indW S P {_ → W S P} {s f h → s ⊲ {p → s ⊲ {_ → h p}}}
: W S P → W S P
This function adds a layer of new output nodes between each layer of input nodes: the new nodes inherit their shape from their parent and have identical children, also computed with wubble. For Nat, this inserts a new suc after every old suc, doubling the number, but it also replaces zero by the rather weird zero-alike:
ff ⊲ {z → ff ⊲ {_ → wubble (Ψ z)}}
Now, that thing is provably equal to zero, but judgmentally distinct from it. Applying natInd to it leaves nasty coercions.
Similarly, wubbling a binary tree will introduce nodes with identical children, defined by nodes of form tt ⊲ {_ → x}, rather than our favourite tt ⊲ {tt → x ; ff → x}. Again, the tree induction principle will sometimes produce an ugly result for these unpopular kinds of node (whilst still yielding canonical output if there is any to be had). Promises of good computational behaviour hold only if data is implemented according to a specific pattern which is characterised more finely than by typing alone.
The too-many-implementations problem may not scupper the attempt to establish the induction principle, as was the case before we acquired an extensional propositional equality, but it still induces serious degradation of the judgmental equality. Infinitely many zeros, all different. But we’ve shimmed up saggy bits of the judgmental equality before. Can we do it again? Is this degradation inevitable?
You don’t need Equality Reflection to get Equality Collapse
Extensional variants of Martin-Löf’s Type Theory are delightful and disastrous for the same reason: the equality reflection rule, asserting that provable equations hold judgmentally. This has the notorious consequence of undecidable typechecking: to figure out whether an (s : S) can be used where a T is wanted, you must check whether S=T, and that now involves arbitrary proof search. Types cannot be checked; terms are not proof objects but computational realisers. Typing derivations can be checked, in mutual recursion with equality derivations. By contrast, Agda and Coq keep judgmental equality decidable, so terms really are proofs and need carry no witness to equations which can be checked by the machine.
But there is another, less notorious problem with equality reflection: equality collapse. Under a false hypothesis, all equations hold and every program is well-typed. As you cannot tell whether your current context contains a false hypothesis (what with the undecidability of propositions and all), it’s basically unsafe to execute open terms.
However, it’s quite easy to trigger equality collapse without going as far as equality reflection. If, for example, you merely decide that you want to identify all those implementations of zero, you’re in deep enough. Let’s see why. I can equip W-types with projections
shape (w : W S P) := indW S P {_ → S} {s f h → s} w  :  S
kids (w : W S P) := indW D P {x → P (shape x) → W S P} {s f h → f}
:  P (shape w) → W S P
Now consider two unpopular variants of zero
zero’ := ff ⊲ {_ → zero} : Nat zero” := ff ⊲ {_ → suc zero}
If we have, judgmentally, that zero’ = zero”, then in any (hypothetical) context where z : 0, we also have judgmentally that
shape (kids zero’ z) = shape (kids zero” z)
but having a hypothetical position allows me to project kids and expose the difference in implementation, with the above computing down to
shape zero = shape (suc zero)
also known as
ff = tt
If we can compute sets by case analysis on 2 (as we do in the encoding of Nat, for example), then we’ve just collapsed the judgmental equality.
Let me repeat that. By allowing the judgmental equality to identify these two closed extensionally equal values, we collapse the judgmental equality on terms under a false hypothesis. Bye bye typechecker.
Conclusion
W-types are a powerful conceptual tool, but they’re no basis for an implementation of recursive datatypes in decidable type theories. To encode a first-order structure by a function space is to throw away canonical choice of representation. When you care about what things are (because they show up in types), not just what they do (when you run them), that price isn’t worth paying.
Interesting stuff, thanks for taking the time to write it down. It’s a bit over my head, but I think I got most of the important bits.
There seems to be a punchline missing, though. The “Observational Equality, Now!” paper bases its Agda 2 implementation on a W-set constructor for trees. Is it the case (I’m only barely familiarizing myself with dependent types) that the very extensionality of W-types is what OTT tries to overcome by using observable equality? Or does this W-type non-canonicity issue have bad implications for OTT itself?
Sorry for slow response: I wonder if we can set this blog up so that pending comments get more attention.
But anyhow, the impact of this observation about W-types (infinitely many implementations of zero, all provably equal but definitionally distinct), is just that we need a slightly larger base set of types from which to work. Whatever the base set, we apply the same technique to compute the meaning of equations.
In Epigram 2, we don’t take W-types as the primitive source of inductive structure. Instead, we supply a universe of inductive types, giving first-order data a first-order representation. We’ve just written a paper about it: The Gentle Art of Levitation.
I don’t see how the last example is any worse than the issues that have already been dealt with in OTT. In a context with a hypothesis in 0 you will already be able to prove absurdities, but as long as these equalities over terms in the W-type are propositional you would still need to explicitly coerce by them, and if you are going to that trouble you can get up to worse mischief simply by induction on the hypothesis from 0.
Of course all _propositional_ equalities hold under an absurd hypothesis, but I think Conor’s point is rather that the _definitional_ (judgmental) equality also collapses if it identifies the different W-type implementations of zero. Definitional equality needs to make sense even on open terms, as otherwise computation on them can “go wrong” without a false hypothesis to blame.