Design: Framing Source Code

What should Epigram programs look like, as documents? When it comes to designing a source language of expressions and definitions, what choices must we make? Sooner or later, we need to choose a provisional syntax and get on with elaborating it.

This post is a collection of thoughts and worries, recording the state of my contemplation.

Everything happens in the document

Your source-level programs and proofs are in a document which you’re currently editing. You’re using your favourite editor, not something I’ve chosen for you then banjaxed specially. The Epigram system is your annoying coauthor. It’s a straightforward file transducer, whose primary mode is source-in-source-out. Error messages, information requested and program refinements emerge as annotations and new code in the document. The output document is generated by reporting the status at the end of elaborating the input document. (A useful secondary mode is to keep the system live and maintain an elaboration state in sync with the document: it is then sufficient to trade patches.) The transducer must be idempotent: if fed its own output, it should have nothing further to add. An easy way to achieve this would be to ensure that advisory elaborator responses happen in unelaborated places (comments, perhaps), but we should give some thought to ensuring that elaborator responses can be rechecked.
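
To pin that last requirement down, here is a minimal Haskell sketch, with elaborate standing in (hypothetically) for the real transducer: idempotence just says that a second pass adds nothing.

-- A hypothetical transducer interface: source in, source out. The
-- real thing would add annotations and refinements; the identity
-- function stands in for it here.
elaborate :: String -> String
elaborate = id

-- The idempotence requirement: fed its own output, the transducer
-- has nothing further to add.
idempotentOn :: String -> Bool
idempotentOn doc = elaborate (elaborate doc) == elaborate doc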

Literate Programming

The glory of lhs2TeX has taught us that it is important to embed programs and program fragments in other documents, especially LaTeX files, allowing the very code that is rendered to be checked. It’s not clear how much to worry about this in the first instance, except to ensure that it’s easy to yank the stuff for elaboration out of a document and paste it back in afterwards. I’d like to make it easy to check expressions embedded in running text. It should also be possible to mark a definition as something to be elaborated but not brought into scope: we should be able to discuss candidate definitions for something and check that those candidates make sense without necessarily committing to them.

Scoped Comments

Of course, we have our own notion of non-code content, namely text in comments. I’d like it to be clear that every comment sits in a scope, so that escaped expressions, probably written |expression|, can be checked. Moreover, in such a comment, it should be possible to (i) hypothesize new free variables, (ii) define parameters with test data, (iii) assert equations which are checked by the elaborator. How are comments delimited, anyway? As | is now special, I’m minded to make {| .. |} the comment brackets, with || signalling end-of-line comments.

So, we might have comments like

{|I wonder what |2 + 2| is.|}

or like this

{|Just checking that |2 + 2 = 4|.|}

Consequence: if | is to delimit expressions in comments (and literate source, perhaps), we must ensure that | occurs only in regions of expressions that are themselves delimited at both ends (e.g., inside brackets of some sort).
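
To see the trouble concretely, here is a toy Haskell scanner for comment bodies (Piece and scanComment are my names, and this is not the real lexer): it cuts each |..| escape at the next |, which is exactly why a | inside an escaped expression must itself sit between brackets.

data Piece = Prose String | Escape String
  deriving Show

-- Split a comment body into prose and |..| escapes. Everything up
-- to the next | is taken as the escaped expression, so an
-- undelimited | inside that expression would end it prematurely.
scanComment :: String -> [Piece]
scanComment s = case break (== '|') s of
  (txt, [])       -> [Prose txt | not (null txt)]
  (txt, _ : rest) ->
    let (expr, rest') = break (== '|') rest
    in  [Prose txt | not (null txt)]
          ++ Escape expr : scanComment (drop 1 rest')

On the body of the first comment above, this yields Prose "I wonder what ", Escape "2 + 2", Prose " is.".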

Three Kinds of Unknown

Standing in places where an expression can go, we have three ways a programmer can leave the expression unknown. A shed (perhaps [|..|]) is both a placeholder and a workspace which remains unelaborated. We’ll need a way to ask the elaborator for helpful type or context information inside a shed. It’s possible that we might have [|unelaborated expression | help requests/information |]. A shed effectively asserts ‘I’m deciding what goes here, but later.’ The elaborator should respond to an erroneous term by erecting a shed around it, with suitable help information (a.k.a. abuse).

We might also consider (|..|..|) as a notation for expressions which are intended for elaboration, but about which we none the less seek information. Given a shed, turn the outer [|..|] into (|..|) to submit its contents for elaboration; then, if happy, rub out the scaffolding.

We should also consider the possibility of asking the elaborator to fill stuff in for us. There should be no choice (up to definitional equality) about what that stuff is. I can see two modes being useful. We should be able to write _ for a boring unknown that we’d like the elaborator to compute and suppress: a right-hand-side ‘don’t care’ as in Agda. We should be able to write ? for an interesting unknown that we’d like the elaborator to compute and show us. For ?, we need to prettyprint whatever term the elaborator fills in. If there are any unknown components (the whole thing, perhaps), they should be rendered post-elaboration as empty sheds. We might use ? in scoped comments, or in the information section of a shed, to ask for helpful data, e.g., values, types, context.

So we could write

{|Just checking that |2 + 2 = ?|.|}

and expect the elaborator to replace ? by 4. We don’t have a read-eval-print-loop. Everything’s in the document.
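
Summing up, here is a Haskell sketch of how an elaborator might represent the unknowns a document can contain; the constructor names are mine and nothing here is settled.

-- The ways a programmer can leave an expression unknown.
data Unknown
  = Shed String          -- [| workspace, left unelaborated |]
  | Probe String String  -- (| expression | information sought |)
  | DontCare             -- _  solved by the elaborator, suppressed
  | ShowMe               -- ?  solved by the elaborator, displayed
  deriving Show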

Constructions

An Epigram source file is a bunch of constructions. Elaborating a construction gives rise to a definition in the proof state. Each construction has a declaration and an explanation. The declaration is introduced by a keyword and explains what is being constructed. So far, it looks like we have three kinds of construction: datatypes (keyword ‘data’), functions (keyword ‘let’), and proofs (keyword ‘lemma’).

data Nat : Set

let (x : Nat) + (y : Nat) : Nat

lemma x, y, z : Nat — (x + y) + z == x + (y + z)

That — is the rule-mark of natural deduction: you can make it longer and give it its own line if you’re nostalgic for Epigram 1 (and you could write the short version in Epigram 1, anyway). Other construction kinds (records? postulates? who knows?) may yet emerge.

The explanation then records the dialogue which shows how the construction is (being) done. An explanation consists of a problem, then a refinement, then a collection of zero or more sub-explanations, addressing sub-problems arising from the refinement. One form of refinement is the where-clause, declaring local constructions, whose explanations then become sub-problems.
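
As a rough Haskell sketch of that structure (the names are mine, and the String payloads are placeholders for real syntax):

type Declaration = String  -- e.g., 'data Nat : Set'
type Problem     = String  -- a family instance, a left-hand-side, a proposition
type Refinement  = String  -- a grammar, '<= ...', '= ...', '-: ...', 'where ...'

-- A construction pairs a declaration with its explanation; an
-- explanation is a problem, a refinement, and the sub-explanations
-- addressing the subproblems the refinement gives rise to.
data Construction = Construction Declaration Explanation
data Explanation  = Explain Problem Refinement [Explanation]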

A datatype problem is an instance of a datatype family. The refinement might be a grammar. (This is an example, not an exhaustive treatment.)

data Nat : Set
  Nat ::= zero | suc (n : Nat)

A programming problem is a pattern-matching left-hand-side. The refinement might do an elimination (<=), leaving subproblems which might then be solvable directly (=).

let (x : Nat) + (y : Nat) : Nat
  x + y <= induction x
    zero + y = y
    suc x + y = suc (x + y)

A proof problem is a proposition. A refinement could well be an elimination, just as in programming, or a ‘because’ (perhaps -:).

lemma x, y, z : Nat — (x + y) + z == x + (y + z)
  (x + y) + z == x + (y + z) <= induction x
    (zero + y) + z == zero + (y + z) -:
    (suc x + y) + z == suc x + (y + z)
      -: (x + y) + z == x + (y + z)

(In fact, I’d expect both of those subgoals to be knocked off automatically. Now is not the time for a digression on the design of declarative proof languages. I’m just trying to give an impression of a source language for proofs where one expresses proof strategies and readily verifiable consequences, rather than proof terms or commands to modify a proof state.)

If elaboration finds subproblems unaccounted for, it inserts them. Note that the elaborator must be able both to generate and to recognize problems.

Despite the fact that elaboration must (to what extent?) respect dependency order, I’m keen to avoid imposing any particular textual order on the constructions in a file. For that matter, I’d also like to permit the separation of a declaration from its explanation. A simple topological sort should be sufficient to yield a suitable elaboration order, but that is the moral ‘should’, requiring some effort on our part to make it so. The key question is how to detect usage: naming has a role to play. We need to avoid silent usage — dependency without explicit mention of name. We need to avoid false positives — mistaking a locally binding occurrence of a name for a more global definition. These issues can be rather subtle, especially given that it is not always obvious what is a binding occurrence in a problem statement.
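
Assuming usage detection is solved, the sort itself is easy. A Haskell sketch using Data.Graph, where each construction is keyed by the name it defines and lists the names it mentions:

import Data.Graph (graphFromEdges, topSort)

type Name = String

-- Order constructions so that every definition is elaborated before
-- anything that mentions it. topSort puts each vertex before its
-- out-neighbours (here, the names it uses), so we reverse.
elaborationOrder :: [(Name, [Name])] -> [Name]
elaborationOrder cs = reverse [ keyOf v | v <- topSort graph ]
  where
    (graph, fromVertex, _) =
      graphFromEdges [ (n, n, uses) | (n, uses) <- cs ]
    keyOf v = let (_, k, _) = fromVertex v in k

For instance, elaborationOrder [("assoc", ["plus"]), ("plus", ["Nat"]), ("Nat", [])] yields ["Nat", "plus", "assoc"].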

Proofs and Names

Proof-irrelevance makes it rather dull to name proofs. Of course, it still makes sense to name propositions. We could permit the syntax

lemma x, y, z : Nat — PlusAssoc = (x + y) + z == x + (y + z)

to define the proposition PlusAssoc at the same time as introducing its proof. That also gives us a hook to hang a topological sort on. Clearly, a proof must be elaborated after every definition it mentions. Moreover, we should adopt the convention that proofs are elaborated just after the definitions they depend on, so that they’re available as ‘knowledge’ as soon as possible. In particular, the theory associated with a function is automatically available when constructing further functions depending on it. You get the theory of addition when you’re working on multiplication, etc. We might want to investigate lighter ways than naming to induce the dependency order for lemmas within a theory.

Spotting Binding Occurrences

Haskell has an easy time making the binding occurrence of a variable detectable. In a left-hand-side, it’s easy to tell the defined symbol from a Constructor from a pattern variable. It’s not so easy for us, because it makes sense to write defined symbols in problems. Proof problems can mention defined things like +. Programming problems can mention defined symbols as a result of dependent case analysis or just some good old hand-rolled eliminator. I propose the following brutal policy: names defined in a module are always uses of that definition. So, you can mess up a binding occurrence of a variable by defining it elsewhere in the module. Oh well.
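
As a Haskell sketch of the policy, assuming the set of module-level names is to hand:

import qualified Data.Set as Set

data Occurrence = Use String | Binder String
  deriving Show

-- The brutal policy: a name occurring in a problem is a use if it
-- is defined anywhere in the module, a binding occurrence otherwise.
classify :: Set.Set String -> String -> Occurrence
classify defined n
  | n `Set.member` defined = Use n
  | otherwise              = Binder n

So with + defined, in ‘x + y <= induction x’ the + is a use and x and y bind, unless some joker defines x elsewhere in the module.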

Not that I want to get into module system design just now, but that would suggest using longnames by default for stuff in imported modules, with a manual override allowing imports to name the things which should be usable by short names. That would serve to ensure that every defined thing unavailable as a pattern variable is made so explicitly. Of course, you could always ask the elaborator to fill out a default shortname-import list, based on what the module exports. (How do we manage exports?)

Bonus? It should be possible to get the elaborator to perform alpha-conversion, yea verily unto expressions embedded within scoped comments.

Decapitation

Where a construction is named, it should be possible to separate its explanation from its declaration. We should be able to write a declaration without its explanation. If we don’t give an explanation elsewhere, the elaborator will fill in the opening problem for us at that point in the file. However, we should be free to move the explanation elsewhere, reintroducing it in the relevant construction style, but with only a name where the declaration would go.

keyword name
  explanation
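
In a Haskell sketch (constructor names mine), a file then holds three shapes of item:

type Name        = String
type Declaration = String
type Explanation = String  -- stands for the structure sketched earlier

data Item
  = Whole    Declaration Explanation  -- declaration with its explanation
  | Headless Declaration              -- declaration alone; the elaborator
                                      --   fills in the opening problem
  | Body     Name Explanation         -- keyword and name, explanation only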

It’s not unusual to state a theorem, then discuss informally how the argument goes, stating relevant lemmas in the process, before proving the theorem from the lemmas and stuffing the proofs of the lemmas into an appendix (traditionally called ‘Exercises’).

Plenty of Details for the Devil

I haven’t said terribly much about how to write the declaration of a function, or about the syntax of expressions, or which refinements are possible. That’s for another time. But I hope I’ve sketched out the shape of Epigram documents in a way that doesn’t restrict those design choices too much. I’ve only used weird brackets so far! I haven’t said anything about layout yet, but my examples somehow presume there’ll be some: the key question is how you know when refinements stop and subproblems start. I haven’t really explored rule-style declarations or implicit quantification.

Devilish details are also interesting to study in the context of elaboration. It should always be possible to turn a shed into an auxiliary local construction by invoking the construction and asking the elaborator to generate its declaration (with suitable extra information as required). In effect, it should be possible to elaborate a construction sketch and generate a separable cluster of constructions required to complete that sketch. Formalism should not come at the cost of motivation.

2 Responses to “Design: Framing Source Code”

  1. pozorvlak says:

    Interesting stuff. Here are my comments, for whatever they’re worth:

    You’re using your favourite editor, not something I’ve chosen for you then banjaxed specially… It’s a straightforward file transducer, whose primary mode is source-in-source-out… We don’t have a read-eval-print-loop.

    So your basic cycle is “save, run transducer, reload file”, perhaps with changes highlighted by your editor in some way? I think a scratch evaluator would still be useful (I found the normaliser very useful in your Agda course), but maybe not – in the same way that a saved, re-runnable test suite is better than evaluating expressions at a REPL.

    A useful secondary mode is to keep the system live and maintain an elaboration state in sync with the document: it is then sufficient to trade patches.

    Are you thinking of just re-running the transducer from scratch on the patched document, or of maintaining persistent structures and altering them as needed? The latter would have obvious advantages for programming in the large, but strikes me as hard to get right – I know the Factor team had to do a lot of work to get this to work properly.

    The glory of lhs2TeX has taught us that it is important to embed programs and program fragments in other documents, especially LaTeX files, allowing the very code that is rendered to be checked.

    I think K&R got there first :-)

    It should also be possible to mark a definition as something to be elaborated but not brought into scope: we should be able to discuss candidate definitions for something and check that those candidates make sense without necessarily committing to them.

    See also: usage examples embedded in documentation. There’s even a Python testing library based around this idea.

    every comment sits in a scope, so that escaped expressions (probably) written |expression| can be checked.

    Whoa. Cool.

    with || signalling end-of-line comments.

    You just freaked out every C programmer on the planet :-)

    So far, it looks like we have three kinds of construction: datatypes (keyword ‘data’), functions (keyword ‘let’), and proofs (keyword ‘lemma’).

    I like this. Simple and clean.

    I’m just trying to give an impression of a source language for proofs where one expresses proof strategies and readily verifiable consequences

    Looks good to me. I guess you’d need more complicated examples to nail the syntax down, though.

    A simple topological sort should be sufficient to yield a suitable elaboration order

    This is an implementation detail (one could also imagine, say, bottom-up elaboration in parallel): the hard part is detecting the dependencies (or rather, making them unambiguously detectable). I like your brutal solution, precisely because it’s brutal: simple, memorable rules are generally easier to work with than subtle heuristics that work 99.9% of the time but go wrong in mind-bending edge cases.

    Not that I want to get into module system design just now

    Don’t put it off for too long: the more I program, the more convinced I become that providing a good-enough module system design is among the most important tasks of a language designer (he says, with precisely zero language designs under his belt). Leave it too late and you end up with #include. You want to provide enough of a module system that programmers feel no need to roll their own, otherwise your community will either fracture or never develop a culture of code-sharing. IMHO, you want to have a Comprehensive Epigram Archive Network up and running as soon as possible (could you piggyback one on Hackage? If not, let me offer my services as a hosting provider).

    Plus, I think there’s probably scope to do really interesting things with modules in dependently-typed languages. OCaml’s functors become simple functions from types to tuples of types, but is it possible to write pragmas in user code?

    that would suggest using longnames by default for stuff in imported modules, with a manual override allowing imports to name the things which should be usable by short names.

    This is considered best practice in the Perl community: it seems to work pretty well. Apart from anything else, it allows you to find where a name comes from by simply searching the current file with your editor.

    How do we manage exports?

    I’m not sure I understand the problem. Perl modules are expected to provide two arrays, @EXPORT and @EXPORT_OK: the former is all the symbols that are exported by default (best practice is for this to be empty), and the latter is all the symbols that can be exported if requested by client code (this mechanism is actually implemented in userspace code). I guess this mechanism doesn’t make so much sense for a statically-typed language, but the general idea of having two lists should work.

    I haven’t said anything about layout yet

    Again, I’d suggest – no, beg – that you keep this brutally simple.

  2. Conor says:

    Thanks for taking the trouble to write this.

    I’m expecting that most reasonably customizable editors would allow us to spawn an elaborator on the current buffer contents in the background, then merge the changes made by the elaborator with the changes you’ve made in the meantime, marking conflicts. Just like the elaborator is your annoying coauthor!

    Reloading from scratch each time is the easiest thing to get right, but we have been known to make “local undo” work. The logic’s quite tricky: definitions are either Unchanged, Improved, Degraded or Deleted, in increasing order of badness. The worse the news, the harder you have to scrutinize downward dependencies. That way, you can start from the textual damage to the source file and compute how much of a hole it blows in the proof state. It’s worth noting that a construction always amounts to a pair of definitions (T : Set, t : T), which can’t be knocked out by bad news from elsewhere (because Set is always ok, and if T is a Set, it’s reasonable to define a t : T). At a crude level, then, the worst that happens is that we figure out which constructions are downstream of stuff that has changed and we re-elaborate only them. (But is it worth the trouble?)
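
    In Haskell, one plausible sketch of that lattice (the Monoid instance is my reading of “the worse the news, the harder you scrutinize”):

    data News = Unchanged | Improved | Degraded | Deleted
      deriving (Show, Eq, Ord)  -- derived Ord: increasing order of badness

    -- Combining news about several dependencies keeps the worst of it.
    instance Semigroup News where (<>) = max
    instance Monoid News where mempty = Unchanged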

    Elaboration in parallel is very much on the thought-agenda. The elaborator is a kind of multithreaded operating system: it could very well run multithreadedly. A crucial property (sometimes hard to ensure, and not true in Agda or Coq) is that refinement of solutions respects refinement of problems. You can kick off an elaboration problem that isn’t entirely defined and be sure that whatever progress you can make on that basis will not be regretted when the problem becomes more defined. Representing incomplete definitions gives us the basis of a non-blocking read.

    Modules, at least in the weak sense of file-and-naming structure, will not be long away. The proof state has a modular structure; the source language will piggyback on that. As for first-class modules, we’re in the fabulous position that (i) all constructions are definitions (ii) (needs implementation, but the result is due to Nicolas Oury) we can always turn a definition into an abstraction with a propositional equation. Representation abstraction needs more thought…

    Exports… In dependently typed languages, definitions don’t go out of scope while anything depends on them. While it should be up to the importing module to decide which imported definitions get local short names, everything’s available in principle, so it may be helpful to prepare a candidate import list that isn’t full of incidental crap.

    More source language syntax coming soon, and the punctuation details are always up for grabs, but I’m working from the outside in. I have a prejudice about layout (which forces a certain simplicity): it should visibly work in a proportionally spaced font. The question is what cues indentation.

    It is, of course, difficult to please. When things are new and strange, it’s a mistake to choose an ill-fitting but familiar-looking notation. Better to be clear that something odd is happening. I expect there’ll be a bit of that.
