Design: Lay(ab)out

Where functional programs from my Dad’s thesis to ML and Haskell are given by sequences of equations and progress is made by matching patterns to determine which equation applies, Epigram is rather more explicit about the tree structure of problem refinement. (We may well get around to allowing the flattening of tree regions which just do ordinary case analysis, but the tree structure will not entirely disappear.) Correspondingly, a document is, morally, a block of components, where each component consists of one header line followed by a (possibly empty) subordinate block. It seems quite natural to allow (but not demand) that subordination be indicated by indentation.

Would that the world were wide, then layout would be easy. If the head line of a component is one physical line indented by n spaces, it should capture for its block either (i) the following {la; la; la} delimited by braces and semicolons, or (ii) all the contiguously subsequent lines indented by more than n spaces (blank lines are indented to ω). Let me show you what I mean, writing . for space and indicating subordination by longnames:

A
….A.A
……A.A.A
……A.A.B
..A.B
….A.B.A
….A.B.B
……A.B.B.A
…A.B.C
..A.C
B

Note that subordinate header lines can slide a little to the left as we go down: they need to be indented enough to be subordinate to the parent, but not so much that they become subordinate to an elder sibling. When header lines all occupy one physical line, the subordination is reassuringly geometric: you just trace upward from the left edge of the line’s text until you hit the interior of another line, and there’s your guv’nor.

The problem, of course, is that the world is not so wide. I’m a bit of an 80-column fogey, myself. If a header line is rather long, it’s nice to split it up into several physical lines, indenting all but the first. But that raises the key question of layout: how do you tell the difference between indented continuations of the header and the beginning of the block?

A Haskelly sort of solution is to designate layout keywords (e.g., where) to signal the end of a header line when a block is expected. Haskell acts as if there’s a vertical line feed (with no carriage return) just after a layout keyword, allowing the rest of that line to be subordinate. (Haskell also forbids the leftward slide I described above: perhaps the argument is that leftward slide is not so neat, and more likely to be a genuine mistake.) I don’t like indentation being controlled by anything other than indentation, so I tend to write where, etc, at the end of a line, so that the new indentation level is set entirely by spaces. That way, alpha-conversion can’t change the semantics of my program by altering the length of a line with stuff after a layout keyword.

In any case, the Haskell way won’t quite cut it for Epigram, if we want to make sense of stuff like

let   x, y : Nat
  ———————
    x + y : Nat

  x + y <= induction x
    zero + y = y
    suc x + y = suc (x + y)

There’s no obvious keyword to trigger the indentation. Except that, somehow, once you start refining, you’ve had it with your header.

So here’s my alternative plan. A fixed set of symbols, being the construction keywords (currently data, let, lemma), the rule (—, at least three minuses) are deemed to be logical lines unto themselves, and <= (‘by’) will be deemed to occur only at the start of a logical line. If they occur within a physical line, we treat them as if a new logical line has started, with indentation ω. So the above really means (using dots again)

let
..x, y : Nat
..———————
….x + y : Nat
..
..x + y
….<= induction x
....zero + y = y
....suc x + y = suc (x + y)

Each explanation has a problem, then a nonempty bunch of refinements, then a possibly empty bunch of sub-explanations, so it’s ok to keep refinements and sub-explanations at the same level of subordination.

Nearly done. A whole document is a block at indentation level ω. All forms of bracket contain whole documents. A semicolon terminates an old line and starts a new one at the same notional level of indentation.

So

  • non-indenting always means end of line
  • indenting means more of the same header line…
  • …until you bump into a designated-beginning-of-line token, so that’s the first token on the first line of the subordinate block.
  • construction keywords and the rule are lines to themselves and take a subordinate block
  • <= just starts a line
  • semicolon just ends a line, with no subordinate block

At no point is the width of text relevant to indentation, only the width of leading spaces.

Just one gotcha. You can’t write

lhs <= big long
  thing
  lhs'

because ‘thing’ is taken for a new line in the block introduced by <=. You'd better write

lhs
  <= big long
    thing
  lhs'

instead.

Leave a Reply