Insertion Sort

I’ve just managed to beat insertion sort into the system. It grumbles a bit, but it works. This is not Epigram 2 source code: it’s a Cochon script. Cochon is a low-level command-line interface to the Epigram 2 proof-state. Basically, you’re zippering around inside an unfinished hierarchical development, locally tinkering with this and that, incrementally solving the missing bits. But some Cochon commands are beginning to resemble a programming language.

Yer natural numbers…

data Nat := (’zero : Nat) ; (’suc : (n : Nat) -> Nat);

Note that these tags, ‘zero and ’suc, are distinguished from ordinary identifiers by their initial quote. (I’m not convinced that the quote is the best way to make the distinction, but I am convinced that the distinction is important.) Tags are unscoped lumps of syntax whose role is to be distinguishable. Unlike variables, tags do not have any intrinsic type. Tags are meaningful only when a type is being pushed in: the type specifies which tags are acceptable and how they are used. This is bidirectional typechecking in yer face.

Next up, I define “less or equal” as a recursive predicate. We have a proof search mechanism which tries to prove propositions automatically, by obvious structural methods TT as a goal, FF as an hypothesis, or appeal to assumption. The choice of a recursive definition here helps to extract work from that machinery. The “root” command at the end just takes the Cochon cursor back to the top level, ready for the next definition.

let Le (x : Nat)(y : Nat) : Prop;
refine Le x y <= Nat.Ind x ;
refine   Le 'zero y = TT;
refine   Le ('suc x) y <= Nat.Ind y;
refine     Le ('suc x) 'zero = FF;
refine     Le ('suc x) ('suc y) = Le x y;

Given two numbers, they are ordered “one way or the other”. Let’s express that as an eliminator, explaining how to split a problem P into two subproblems, covering the possibilities by adding an extra assumption in each case. Note that propositions are not types. Rather, the :- operator maps a proposition to the type of its proofs. The indirection helps us spot when a type is a type of proofs. The typechecker considers all proofs of a given proposition to be equal, which is how come it’s ok to let machines just go fetch: one proof is as good as another, a boon which does not apply for types like Bool.

let owotoNat (x : Nat)(y : Nat)(P : Set)
      (ple : (:- Le x y) -> P)
      (pge : (:- Le y x) -> P)
      : P ;
refine owotoNat x y P ple pge <= Nat.Ind x;
refine   owotoNat 'zero y P ple pge = ple _;
refine   owotoNat ('suc x) y P ple pge <= Nat.Ind y;
refine     owotoNat ('suc x) 'zero P ple pge = pge _;
refine     owotoNat ('suc x) ('suc y) P ple pge <= owotoNat x y;
refine       owotoNat ('suc x) ('suc y) P ple pge = ple _;
refine       owotoNat ('suc x) ('suc y) P ple pge = pge _;

As in Epigram 1, we can program with anything eliminator-like. The appearance of pattern matching is an illusion. We really run those higher-order functions marked with <= (pronounced "by"), generating their arguments from the solutions to the subproblems.

In the recursive case, we find out which way x and y are ordered. In honest Epigram 2 source code, it might be nice to document the appearance of the extra assumptions, as the program looks kind of confusing without them. Of course, such code is developed interactively: the machine will tell you what extra assumptions you have -- no need to type extra text. The above script relies on the subproblems being solved in the order specified by the type of the eliminator, so at least what's going on is documented somewhere.

Now, I show how to bolt bottom and top elements to a type, and extend a given ordering.

data BT (X : Set) := (’bot : BT X) ; (’el : X -> BT X) ; (’top : BT X) ;

let BTLe (X : Set)(LE : X -> X -> Prop)(x : BT X)(y : BT X) : Prop ;
refine BTLe X LE x y <= BT.Ind X x ;
refine   BTLe X LE 'bot y = TT ;
refine   BTLe X LE ('el x) y <= BT.Ind X y ;
refine     BTLe X LE ('el x) 'bot = FF ;
refine     BTLe X LE ('el x) ('el y) = LE x y ;
refine     BTLe X LE ('el x) 'top = TT ;
refine   BTLe X LE 'top y <= BT.Ind X y ;
refine     BTLe X LE 'top 'bot = FF ;
refine     BTLe X LE 'top ('el y) = FF ;
refine     BTLe X LE 'top 'top = TT ;

I’ll extend Nat in this way, to use in specifying loose intervals. I specify what it means for a Nat to be bounded by such an interval.

let Bounds : Set
refine Bounds = Sig (l : BT Nat ; u : BT Nat ; ) ;

let Bdd (x : Nat)(lu : Bounds) : Prop ;
refine Bdd x [l u] = BTLe Nat Le l (’el x) && BTLe Nat Le (’el x) u ;

Ok, now it gets a bit more manual than I’d like. Peter’s idata tactic generates traditional “Catholic” families with indices constrained by equations (relying on the miracle of transubstantiation). These ordered lists in an open interval are more naturally “Presbyterian”. It’s kind of like Haskell’s GADT-style versus traditional-style distinction. To get the definition I want, I have descend to the level of coding datatype descriptions by hand. At least I have that option.

let OListD (lu : Bounds) : IDesc Bounds ;
refine OListD [l u] = ‘fsigmaD ['nil 'cons] [? ?];
  give ‘constD (:- BTLe Nat Le l u) ;
  give ’sigmaD Nat \ x -> ‘prodD ‘xs (’varD [('el x) u]) (’constD (:- BTLe Nat Le l (’el x))) ;

let OList (lu : Bounds) : Set ;
refine OList [l u] = IMu Bounds OListD [l u] ;

What does that mean? In a made up informal notation, it’s this (note that [l u] means a somewhat lispy right-nested tuple of bounds, not a singleton list of l applied to u; all this notation is under review).

OList [l u] = ‘nil {:- l ≤ u} | ‘cons (x : Nat)(xs : OList [x u]){:- l ≤ x}

The elaborator likes nodes of datatypes to be right-nested tuples with a proposition in the tail. Proofs of these propositions are handled implicitly, being unpacked into the context when we decompose data, and searched for when we build data. For ordered lists, the ‘nil case demands that the bounds are in order; the ‘cons case requires that the head is above the lower bound — the head then becomes the lower bound for the tail. In effect, the constraint on ‘nil will ensure that the last element (if there is one) is below the upper bound.

Now, it’s loose intervals for the win! If I have a list in an interval and a new element in the same interval, then I can insert the element, preserving the interval. This is a lot less fiddly than working with exact bounds, where inserting an extreme element can change them. The exact bounds versus loose bounds design issue is a serious Jagger/Richards morality tale: nailing the invariant is not automatically good design; there’s more to the ergonomics of such definitions. Translation: even if you’re a serious Haskell-fiend, don’t expect your current skill-set to be all you need. I’m sure I’ll get around to telling this story at greater length, another time.

But I digress. Here’s insertion. I’ve used iinduction, the generic induction principle for indexed datatypes, only because my datatype wasn’t built by Peter’s tactic. The end result is the same: structural recursion on ordered lists.

The good bits, though, are the proofs. If we had implicit syntax in decent shape, I’d be able to hide the (lu : Bounds) and the (p : :- Bdd a lu) completely, within the program text. Meanwhile, the elaborator jolly well is hiding all the proofs unpacked from and packed into the actual ordered lists. And lo, barketh not the dog!

let insert (lu : Bounds)(a : Nat)(as : OList lu)(p : :- Bdd a lu) : OList lu ;
refine insert [l u] a as _ <= iinduction Bounds OListD [l u] as ;
refine   insert [l u] a 'nil _ = 'cons a 'nil ;
refine   insert [l u] a ('cons b bs) _ <= owotoNat a b ;
refine     insert [l u] a ('cons b bs) _ = 'cons a ('cons b bs) ;
refine     insert [l u] a ('cons b bs) _ = 'cons b (insert [('el b) u] a bs _) ;

What is it that’s hidden here? In the base case, the fact that a’s in [l u] allows the ‘cons because l ≤ a and the ‘nil because a ≤ u. In the step, we start off knowing that a’s in [l u], l ≤ b, bs is in [('el b) u]. We learn the ordering of a and b. If a ≤ b, we know that (’cons b bs) can have tighter bounds [('el a) u]. If b ≤ a, we know we can insert a into bs and remain between b and u.

That’s the hard work done. Now the last hurrah. Here are yer ordinary lists (reusing those ‘nil and ‘cons tags, natch).

data List (X : Set) := (’nil : List X) ; (’cons : X -> List X -> List X) ;

And here’s yer actual insertion sort, building up an ordered list on ['bot 'top] by repeated insertion.

let isort (xs : List Nat) : OList ['bot 'top] ;
refine isort xs <= List.Ind Nat xs ;
refine   isort 'nil = 'nil ;
refine   isort ('cons a as) = insert ['bot 'top] a (isort as) _ ;

I even ran it. The “elab” command elaborates and then evaluates a term.

elab isort (’cons (’suc ‘zero) (’cons ‘zero (’cons (’suc (’suc (’suc ‘zero))) (’cons (’suc (’suc ‘zero)) ‘nil)))) ;

It worked, yielding.

‘cons ‘zero (’cons (’suc ‘zero) (’cons (’suc (’suc ‘zero)) (’cons (’suc (’suc (’suc ‘zero))) ‘nil)))

We’ve got a long long way to go, but we’re getting there.

3 Responses to “Insertion Sort”

  1. PROOF WANTER says:

    Please Sir, can I have some more epigram code?

  2. Jonas says:

    I think there’s a typo in the “made up informal notation”: where it says x ≤ u, it should say l ≤ x.

    Also, wouldn’t it be enough to just extend the naturals with ‘bot and just always let the upper bound in OList be infinity?

  3. Conor says:

    Well spotted. I’ve fixed the informal notation to reflect the actual code.

    Meanwhile, I chose to add ‘bot and ‘top, and to bound the lists at both ends, because these ordered lists are but one instance of a general scheme for constructing ordered data structures. It’s certainly possible to define ordered cons-lists with only a lower bound, but binary search trees require both bounds.

    The grammar F ::= 1 | F+F | F[]F | rec determines a class of functors whose fixpoints are readily orderable, given some notion of element X with order < =

    [[_]] : F -> (X -> X -> Set) -> (X -> X -> Set)
    [[ 1 ]] R l u = :- l <= u
    [[ F + F' ]] R l u = [[ F ]] R l u + [[ F' ]] R l u
    [[ F [] F’ ]] R l u = (x : X) * [[ F ]] R l x * [[ F' ]] R x u
    [[ Rec ]] R l u = R l u

    The idea is that everywhere there’s a pair, there’s a pivot. Ordered lists are given by the code 1 + (1 [] Rec). Binary search trees are given by the code 1 + (Rec [] Rec).

Leave a Reply