Crude but Effective Stratification

James McKinna and I defined Epigram as a programming notation for the predicative fragment of Luo’s UTT, but I always dodged implementing the stratified hierarchy of universes, hacking in the (fun but) inconsistent Set : Set as a holding pattern. It’s not that I don’t care about consistency. It’s more that I’m unhappy with the pragmatics of many of the treatments I’ve seen and wanted to think a bit more about the problem. Now I think I have a little more to say.

The usual starting point is to consider a tower of ever-larger universes where each inhabits the next.

Set0 : Set1 : Set2 : …

We expect there to be some way ^ : Setn → Setn+1 of embedding all the types from each layer up into the next. Note that ^Set0 is not Set1, but rather the thing that Set0 is called in Set2. Some like to make ^ silent; some do not.
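To make the distinction concrete, here is a tiny Haskell sketch (Term and typeOf are my own illustrative names) in which ^ is an explicit constructor, so ^Set0 and Set1 are different terms even though both inhabit Set2.

  -- A toy fragment: universes and an explicit lift operator ^.
  data Term = Set Int      -- Set0, Set1, ...
            | Lift Term    -- ^t : the name, one level up, of t
    deriving (Show, Eq)

  -- Synthesise the universe a term inhabits, when it has one.
  typeOf :: Term -> Maybe Term
  typeOf (Set n)  = Just (Set (n + 1))   -- Setn : Setn+1
  typeOf (Lift t) = case typeOf t of
    Just (Set n) -> Just (Set (n + 1))   -- if t : Setn then ^t : Setn+1
    _            -> Nothing

  -- typeOf (Lift (Set 0)) and typeOf (Set 1) are both Just (Set 2),
  -- yet Lift (Set 0) /= Set 1: ^Set0 is Set0's name in Set2, not Set1.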

We expect that the native function spaces satisfy at least this uniform rule, which effectively says that every level has a function space.

  S : Setn   x:S |- T : Setn
——————————————————————————————
  (x : S) → T : Setn

Note that mixing levels is quite common. The type of the (a?) polymorphic identity function is, annoyingly, (X : Set0) → ^X → ^X : Set1.

Apart from these essentials, there are many variations. Agda gets around the annoying type for polymorphic identity by saying that a function type inhabits the smallest universe large enough for its domain and codomain.

  S : Setm   x:S |- T : Setn
——————————————————————————————
  (x : S) → T : Setm∨n

Note that we get Set0→Set1 : Set2 but not Set0→Set0 : Set2 by this rule, which makes some sense from a type synthesis perspective.
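The contrast between the two rules is easy to state as a (purely illustrative) Haskell sketch, with levels as bare integers:

  -- Universe levels as bare integers, just to contrast the two rules.
  type Level = Int

  -- Uniform rule: domain and codomain must sit at the same level.
  piUniform :: Level -> Level -> Maybe Level
  piUniform m n
    | m == n    = Just m
    | otherwise = Nothing   -- mismatched levels need explicit lifting first

  -- Agda-style 'maxy' rule: take the least upper bound of the levels.
  piMaxy :: Level -> Level -> Level
  piMaxy = max

  -- piMaxy 1 2 == 2 : Set0 → Set1 : Set2 (as Set0 : Set1 and Set1 : Set2)
  -- piMaxy 1 1 == 1 : Set0 → Set0 : Set1, not Set2, absent cumulativity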

Others introduce a preorder ≤ on types, cumulativity, generated by Setn ≤ Setn+1 and closed under definitional equality. Luo also adds a rule for function types which is covariant in the codomain but demands equality in the domain. We then extend the conversion rule to a kind of subsumption

  s : S   S ≤ T
—————————————————
  s : T

which certainly justifies the adoption of a silent ^ operation, and makes the following admissible, even with the uniform rule for the function space:

  S : Setm   x:S |- T : Setn
——————————————————————————————  (m, n ≤ p)
  (x : S) → T : Setp

Coq is cumulative, but Agda is not. If you want ^ in Agda, you need to define it: the ‘maxy’ function space rule does not eliminate every use for it, e.g. if you declare Nat : Set0, you don’t thereby have Nat : Set1.

Where Agda does help is with universe polymorphism, allowing us to declare constructions which work uniformly at multiple levels. You get to abstract over universe levels and write expressions with level variables, zero, successor and max. You can leave levels implicit: there’s a constraint-solver. The lack of cumulativity helps make constraint-solving tractable. However, the lack of cumulativity also means that you have to use maxy rather than uniform presentations when defining composite types, because the component types won’t move to the level of the compound, just as with the function space. The library is full of such complexities, which I consider regrettable.
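To give a flavour of the level algebra, here is a small Haskell sketch (my own illustration, not Agda’s actual solver) which normalises level expressions built from zero, successor, max and level variables into a canonical ‘constant versus per-variable offsets’ form, the sort of representation on which constraint solving can work:

  import qualified Data.Map as M
  import Data.Map (Map)

  -- Level expressions as in Agda: zero, successor, max, level variables.
  data Lvl = LZero | LSuc Lvl | LMax Lvl Lvl | LVar String
    deriving Show

  -- Canonical form: the max of a constant and, for each variable, the
  -- variable plus an offset. Two levels agree iff their forms agree.
  data NF = NF { constPart :: Int, varParts :: Map String Int }
    deriving (Show, Eq)

  norm :: Lvl -> NF
  norm LZero      = NF 0 M.empty
  norm (LVar x)   = NF 0 (M.singleton x 0)
  norm (LSuc l)   = let NF c vs = norm l
                    in NF (c + 1) (M.map (+ 1) vs)
  norm (LMax a b) = let NF c vs = norm a
                        NF d ws = norm b
                    in NF (max c d) (M.unionWith max vs ws)

  -- norm (LMax (LSuc (LVar "i")) (LVar "i")) == NF 1 (M.fromList [("i",1)])
  -- i.e. max (suc i) i normalises to suc i.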

Coq supports typical ambiguity: you can drop the level annotations on universe constants and the checker will try to ensure that they can be inferred without breaking constraints. Coq does not have universe polymorphism, but sometimes does something cleverly generative instead. Adam Chlipala knows the ins and outs.

What chance of cumulativity and universe polymorphism coexisting? We may need to put up with some slightly more explicit notation if we want to avoid underconstrained typing problems, but perhaps that’s ok. We might also get some help from adopting a bidirectional style, so here goes…

You can read ∋ as ‘accepts’ and ∈ as ‘has type’. As is my wont, I separate the syntactic classes to which each applies, with an explicit - marking where terms of inferrable type embed in type-checkable terms. The ⋅ symbol is head-normalising application, ensuring that we always compute types enough to see if they are canonical. I write the rules Prolog-style, for brevity, and I thread the context silently, remarking only to extend or access it.

Setj ∋ Seti   if   j > i
Setj ∋ Pi S T   if   Setj ∋ S   and   Pi S (\ _ -> Setj)   ∋ T
Pi S T ∋ \ x -> t   if   x:S |- T⋅x ∋ t
T ∋ e   if   e ∈ S   and   S ≤ T

x ∈ S   if   x:S is in the context
f s ∈ T⋅s   if   f ∈ Pi S T   and   S ∋ s

The pattern is that introduction forms are made sense of with respect to a known type, while variables (whose types come from the context) and their eliminations (as admitted by their type) have inferrable types. As such, we don’t have to give a unique type to Set3, or a rule for calculating the universe a Pi lives in from those of its pieces. We start from the universe level and ask if whatever-it-is is small enough to fit.

Bidirectional systems push definitional equality (as opposed to mere head-normalisation) to the change-of-direction rule, where a type inferred meets a type demanded. Here, rather than equality, I invoke the subtyping relation, ≤. I had better tell you what it is.

S ≤ T   if   S = T       includes definitional equality
Seti ≤ Setj   if   i ≤ j       cumulativity
Pi S T ≤ Pi S’ T’   if   S’ ≤ S   and   x:S’ |- T⋅x ≤ T’⋅x       contra/co

I could tell you what the definitional equality is (it’s βη), but that’s not hugely important right now.
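Before going further, it may help to see the rules above run. Here is a minimal Haskell sketch of the whole bidirectional apparatus; all names are mine, substitution is naive and capture-permitting (so only trust it on closed examples), and syntactic comparison stands in for full βη-equality.

  -- Terms of the little theory: universes, Pi, lambda, variables, application.
  data Tm
    = Set Int         -- Seti
    | Pi Tm Tm        -- Pi S T, where T is a \-abstraction (a family)
    | Lam String Tm   -- \ x -> t
    | Var String      -- x
    | App Tm Tm       -- f s
    deriving (Show, Eq)

  type Cxt = [(String, Tm)]   -- variables with their types

  -- Head-normalising application, the ⋅ of the rules: beta at the root.
  dot :: Tm -> Tm -> Tm
  dot (Lam x b) s = subst x s b
  dot f s         = App f s

  -- Naive, capture-permitting substitution: only safe on closed examples.
  subst :: String -> Tm -> Tm -> Tm
  subst x s = go where
    go (Var y) | y == x = s
    go (Lam y b) | y /= x = Lam y (go b)
    go (Pi a b)  = Pi (go a) (go b)
    go (App f a) = App (go f) (go a)
    go t         = t

  -- Checking:  T ∋ t
  check :: Cxt -> Tm -> Tm -> Bool
  check _   (Set j) (Set i)  = j > i                    -- Setj ∋ Seti if j > i
  check cxt (Set j) (Pi s t) = check cxt (Set j) s      -- Setj ∋ Pi S T
                            && check cxt (Pi s (Lam "_" (Set j))) t
  check cxt (Pi s t) (Lam x b) =
    check ((x, s) : cxt) (t `dot` Var x) b              -- Pi S T ∋ \ x -> t
  check cxt ty e = case infer cxt e of                  -- change of direction:
    Just s  -> sub cxt s ty                             -- T ∋ e if e ∈ S and S ≤ T
    Nothing -> False

  -- Synthesis:  e ∈ S
  infer :: Cxt -> Tm -> Maybe Tm
  infer cxt (Var x)   = lookup x cxt                    -- x ∈ S from the context
  infer cxt (App f s) = do
    Pi a b <- infer cxt f                               -- f ∈ Pi S T
    if check cxt a s then Just (b `dot` s) else Nothing -- f s ∈ T⋅s
  infer _ _ = Nothing

  -- Subtyping:  S ≤ T
  sub :: Cxt -> Tm -> Tm -> Bool
  sub _   (Set i)  (Set j)    = i <= j                  -- cumulativity
  sub cxt (Pi s t) (Pi s' t') = sub cxt s' s            -- contra in the domain
    && sub (("x", s') : cxt) (t `dot` Var "x") (t' `dot` Var "x")  -- co in the codomain
  sub _ s t = s == t   -- crude stand-in for definitional (βη-) equality

  -- With idTy = Pi (Set 0) (Lam "X" (Pi (Var "X") (Lam "x" (Var "X")))):
  --   check [] (Set 1) idTy                        == True
  --   check [] (Set 5) idTy                        == True   (cumulativity at work)
  --   check [] idTy (Lam "X" (Lam "x" (Var "x")))  == True   (the polymorphic id)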

Note that if people are kind enough to index their Set-levels, this system requires only inequality checking. That’s no surprise. The trouble starts when you let variables stand for levels and start trying to infer them from loose constraints. I’m just not going to do that. Tough.

But I am going to do something else. By construction, all judgments are invariant under uniform increment of levels: take a derivation, add n to all the superscripts (in context, terms, types, everywhere), and it still holds. What’s important is universe displacement. Let’s define an operation (-)^n which increments universe indices by n, acting structurally, except that Setm^n = Setm+n.

We have Γ^n |- S^n ∋ s^n if Γ |- S ∋ s, and so forth.

Let’s exploit that. What we can provide is an environment of global top-level definitions which you can invoke with a displacement.

[] ok
Δ; f=s:S ok   if   Δ ok   and   f ∉ Δ   and   Δ |- Seti ∋ S   and   Δ |- S ∋ s

with the invocation and evaluation rules

f^n ∈ S^n   if   f=s:S in Δ
f^n = s^n   if   f=s:S in Δ

and not forgetting (f^m)^n = f^(m+n).
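Operationally, displacement is just a traversal of the term. Here is a Haskell sketch (representation and names mine) of (-)^n and of invoking a global definition at a displacement:

  -- Terms, now with invocation of a global f at displacement n (f^n).
  data Tm
    = Set Int          -- Seti
    | Pi Tm Tm
    | Lam String Tm
    | Var String
    | App Tm Tm
    | Def String Int   -- f^n
    deriving (Show, Eq)

  -- The displacement operation (-)^n: structural, except at Set and Def.
  displace :: Int -> Tm -> Tm
  displace n = go where
    go (Set m)   = Set (m + n)     -- Setm^n = Setm+n
    go (Def f m) = Def f (m + n)   -- (f^m)^n = f^(m+n)
    go (Pi s t)  = Pi (go s) (go t)
    go (Lam x b) = Lam x (go b)
    go (App f s) = App (go f) (go s)
    go (Var x)   = Var x

  -- The global environment Δ: f = s : S, body and type checked at base level.
  type Globals = [(String, (Tm, Tm))]   -- name |-> (body s, type S)

  inferDef :: Globals -> String -> Int -> Maybe Tm   -- invocation: f^n ∈ S^n
  inferDef delta f n = fmap (displace n . snd) (lookup f delta)

  expandDef :: Globals -> String -> Int -> Maybe Tm  -- evaluation: f^n = s^n
  expandDef delta f n = fmap (displace n . fst) (lookup f delta)

  -- By construction, displace m . displace n == displace (m + n).

So a definition checked once at base level can be invoked at any displacement, which is exactly what the id example below exploits.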

What’s the deal, then? You build things starting at level 0 and going up to whatever constant level you need, but then you can lift the whole thing up any number of other levels. Let us allow the suppression of the trivial displacement ^0. E.g. define

id = \ X -> \ x -> x : Pi Set0 \ X -> Pi X \ x -> X

and you get that id is the polymorphic identity function for level 0, id^1 for level 1, and so on. You don’t need to repeat constructions to get ‘big’ versions of things, but you have to be explicit about how big you want them. Cumulativity is silent. Polymorphism amounts to abstracting level 0 in top-level definitions, with generalisation for free but explicit specialisation. It’s not fancy. It’s not complicated. I guess we’ll find out in time whether Agda’s extra freedom to abstract over more level variables than just the base (allowing you to stretch constructions upward internally as well as lifting them en bloc) buys you anything essential.

Loose ends, there are plenty. I had better do the metatheory for this lot. It’s reassuringly predicative, so a suitable model construction should just iterate the ‘next-level’ construction. I need to check that subsumption preserves typability and that evaluation preserves type.

There are potential sources of universe variables, and hence of constraint solving. If you look at that rule for adding a definition, you’ll see that the type must make sense ‘at some level’. Does that mean speculating an unknown x and checking at Setx? Not really. Let’s try to add judgments for being a set (or family) at some level and see what happens.

SET Seti
SET Pi S T   if   SET S   and   (S)SET T
(S)SET \ x -> T   if   x:S |- SET T
(S)SET E   if   E ∈ Pi S’ T   and   S ≤ S’   and   x:S |- T⋅x = Seti

It’s reassuringly syntax-directed. The last condition in the last rule can be checked by head-normalising T⋅x and matching for Seti without needing i in advance, so we can avoid representing unknown levels just to check that something is a set at some level. You get the same effect by allowing Setω and S → Setω as types you are allowed left of ∋ but nowhere else.
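As a quick Haskell sketch (mine, with synthesis and subtyping pared down to just what the last rule needs), the point is that the match for Seti happens after head normalisation, so no unknown level is ever represented:

  data Tm
    = Set Int
    | Pi Tm Tm
    | Lam String Tm
    | Var String
    | App Tm Tm
    deriving (Show, Eq)

  type Cxt = [(String, Tm)]

  -- Head normalisation: beta at the root, enough to expose Set or Pi.
  whnf :: Tm -> Tm
  whnf (App f s) = case whnf f of
    Lam x b -> whnf (subst x s b)
    f'      -> App f' s
  whnf t = t

  subst :: String -> Tm -> Tm -> Tm
  subst x s = go where
    go (Var y) | y == x = s
    go (Lam y b) | y /= x = Lam y (go b)
    go (Pi a b)  = Pi (go a) (go b)
    go (App f a) = App (go f) (go a)
    go t         = t

  -- Pared-down synthesis and subtyping, enough for the last rule.
  infer :: Cxt -> Tm -> Maybe Tm
  infer cxt (Var x) = lookup x cxt
  infer _   _       = Nothing

  sub :: Tm -> Tm -> Bool
  sub (Set i) (Set j) = i <= j   -- cumulativity
  sub s t             = s == t

  -- SET T : T is a set at some (unnamed) level.
  isSet :: Cxt -> Tm -> Bool
  isSet cxt t = case whnf t of
    Set _  -> True                           -- SET Seti, any i
    Pi s f -> isSet cxt s && isFam cxt s f   -- SET (Pi S T)
    _      -> False

  -- (S)SET F : F is a family of sets over S at some level.
  isFam :: Cxt -> Tm -> Tm -> Bool
  isFam cxt s f = case whnf f of
    Lam x t -> isSet ((x, s) : cxt) t        -- (S)SET (\ x -> T)
    e       -> case fmap whnf (infer cxt e) of
      Just (Pi s' t) ->                      -- E ∈ Pi S' T
        sub s s' &&                          -- S ≤ S'
        case whnf (App t (Var "x")) of       -- head-normalise T⋅x ...
          Set _ -> True                      -- ... and match for Seti: no i up front
          _     -> False
      _ -> False

  -- e.g. isFam [("F", Pi (Set 0) (Lam "X" (Set 0)))] (Set 0) (Var "F") == True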

The other place where universe variables might appear is once we start trying to do a bit of argument synthesis. How do we solve constraints like ?X ≤ Set1 in a minimal-commitment way? We’re sure that ?X is Seti for some i ≤ 1, but is it worth introducing level variables just to make that step? Here, I think we must wait to see what bites us.

What else? Well, I’ve only told a story for yer basic Pi types. I need to roll it out across the rest of the type theory, but the basic ingredients will still be displacement-invariance of judgments (to allow lifting of definitions) and structural subtyping with appropriate respect for variance. I’m resistant to universe variables and constraint-solving, but I accept that they may prove necessary. I hope you agree that my proposal is simple enough to grasp and flexible enough to avoid at least some of the repetition that lack of polymorphism usually forces upon us. I suggest that its power-to-weight ratio is large enough that it’s worth a try.

2 Responses to “Crude but Effective Stratification”

  1. Mathnerd314 says:

    Have you looked at implementing NFU?

  2. Conor says:

    No, but it has occurred to me that it might be interesting to think in that direction, not least because it would be an excellent excuse to visit my old teacher, Tom Forster. There’s nothing about these rules which suggests that there has to be a bottom universe. Things might shift downwards as easily as they shift up.

    That said, the usual inductive-recursive method for making models of predicative type theories might struggle without an inductive characterization of the hierarchy. I wonder what we’d do instead…
