2017-05-19 A well-typed suspension calculus

When implementing interpreters for languages based on the lambda-calculus one quickly moves away from naively substituting variables when evaluating function application. This is mostly due to the fact that terms can grow very big very fast, since when substituting variables for some term said term will be duplicated everywhere the variable is used.

In other words, if we have some term

(\x -> foo x x) SomeBigTerm

SomeBigTerm will be duplicated when substituting x for SomeBigTerm.

The solution is to not substitute eagerly. In most languages it is quite easy to do so, for example using a CEK machine.

However, things get a bit tricky when one needs to compute with free variables.1 This is the case in many dependently typed languages, such as Agda or Coq. Agda perfomance in particular has long suffered also because it substitutes terms too eagerly.

This problem is also relevant in other languages that need to implement higher-order unification, such as lambda-Prolog. The implementors of lambda-Prolog solved this problem by developing a “suspension calculus”, which allows substitutions to be delayed until we really need them.2

The substitution calculus works with de Bruijn indices. As everybody who implemented algorithms involving de Bruijn indices knows, it is extremely easy to make mistakes and break the invariants required for the indices to be well-formed. The suspension calculus is no exception, and up to now I had trouble justifying its rewrite rules.

However, it is possible to encode what scope we’re working on at the type level, thus making operations involving de Bruijn indices much safer. This short article is about implementing the suspension calculus using the same techniques, so that we can implement its rewrite rules with much more confidence.

The article is a literate Haskell file, you can save it and load it using

$ stack --resolver lts-8.11 ghci
Prelude> :l suspension.lhs

You can also see the file on GitHub.

Let’s get started.

Boring preamble

First of all, a few boring LANGUAGE extensions and import. Most of the imports are needed for the pretty printing and parsing, which are not really relevant to the article.

Variables and expressions

As I have mentioned, we will have our variables to be well-typed, in the sense that the “depth” of the scope of terms will be tracked at the type level. This approach goes back at least to de Bruijn notation as a nested datatype by Bird and Paterson. You can refer to that paper for details, but the core idea is quite simple.

We first define the type of variables for our terms:

This looks an awful lot like Maybe, but we define our own type for nicer naming. The role of B and F will hopefully become more clear as we define expressions, but intuitively B refers to the most recently bound variable (like 0 if we were using normal de Bruijn indices), and F refers to a variable in the scope without the most recently bound variable.

Then we define Syntax and Exp, in tandem. Syntax specifies the constructs of our lambda calculus: variables, applications, lambda functions, and let bindings:

We will define Exp shortly, but for now you can pretend it’s Syntax.

Note that the argument of Syntax indicates the scope of the term. For example if we have a term of type

Syntax (Var (Var Text))

we know that it’s a term with two bound variables (the two Var) and some free variables represented by the top-level Text. In such a term, we might have

F (F "someTopLevelDefinition")
F B -- The variable bound by the second most recent lambda or let
B -- The variable bound by the most recent lambda or let

Also note that we store a piece of Text in Lam and Let to easily pretty print terms.

That said, Exp is either a piece of Syntax, or a suspended piece of Syntax:

Env b a is some data structure containing information to turn a term with scope b into a term in scope a. For example, something of type

Env (Var (Var Text)) (Var Text)

contains information on how to remove one free variable out of a term. We will define Env shortly, but first let’s define a couple of shortcuts to form Exps quickly:


We can now get to the tricky part: defining environments that let us delay substitution. We first define a GADT to specify an increase in scope depth:

If we have Weaken from to, to it’s going to be of the form Var (Var ... (Var from)): it specifies an increase of scope depth from from.

Then, a canonical environment is either just a weakening, or an existing environment added with an expression containing the value for a bound varible:

Similarly to Lam and Let, we store the name of the variable that EnvCons is referring to for easy pretty printing.

We call this form of environments canonical because we will reduce all environments to this form. However, we can also form environments by composition:

To recap:

EnvNil wk weakens terms by the amount specified by wk. For example, applying

EnvCanonical (EnvNil (WeakenSucc WeakenZero)) :: Env a (Var a)

to a term of type Exp a will result in a term of type Exp (Var a), which will be implemented by applying all variables in the term to F.

EnvCons n e env will replace the first bound variable with e, and then apply the environment env. For example, applying

EnvCanonical (EnvCons n e (EnvCanonical (EnvNil WeakeZero))) :: Env (Var a) a

to a term of type Exp (Var a) will result in a term of type Exp a, where the first bound variable is replaced with e.

EnvComp env1 env2 will apply first environment env1 and then environment env2.

Let’s define some shortcuts to construct environments and suspensions:

Then, we can define a function to remove delayed composition of environments. Although our environments differ quite significantly from the ones in the original lambda-Prolog paper, this is the trickiest part where the well-typed scope are a really big help.

evalEnv :: Env from to -> CanonicalEnv from to
evalEnv = \case
  -- If the environment is already canonical, stop
  EnvCanonical env -> env
  -- If we have a composition, evaluate the left
  -- hand side and then compose...
  EnvComp env1 env2 -> goComp (evalEnv env1) env2
    -- Compose a canonical environment with normal environment
    goComp :: CanonicalEnv a b -> Env b c -> CanonicalEnv a c
    -- If the LHS is just a weakening, push the weakening
    -- through.
    goComp (EnvNil wk) env2 = goCompWeaken wk env2
    -- If the LHS is a delayed substitution, return it
    -- applying the RHS to the expression and composing
    -- the rest of the environment.
    goComp (EnvCons n e env1) env2 = EnvCons n (susp env2 e) (EnvComp env1 env2)

    -- Composes a weakening and an environment.
    goCompWeaken :: Weaken a b -> Env b c -> CanonicalEnv a c
    goCompWeaken wk = \case
      -- If the RHS is a weakening, just compose the two
      -- weakenings
      EnvCanonical (EnvNil wk') -> EnvNil (compWeaken wk wk')
      -- If the RHS is a delayed substitution, drop it
      -- if the weakening is non-zero, return the RHS
      -- as-is otherwise.
      EnvCanonical (EnvCons v e env) -> case wk of
        WeakenZero -> EnvCons v e env
        WeakenSucc wk' -> goCompWeaken wk' env
      -- If the RHS is a composition, apply the weakening to the
      -- LHS of the composition and then compose again.
      EnvComp env1 env2 -> goComp (goCompWeaken wk env1) env2

    -- Compose two weakenings -- same as addition
    compWeaken :: Weaken a b -> Weaken b c -> Weaken a c
    compWeaken wk1 WeakenZero = wk1
    compWeaken wk1 (WeakenSucc wk2) = WeakenSucc (compWeaken wk1 wk2)

The only surprising rule is the one dropping the substitution when a weakening is composed with an environment – the intuition is that if we’ve just weakened a term it surely can’t refer to the first bound variable.

Note that while these rules feel quite natural (and in fact are pretty much forced by the types), they are natural because of how we formulated environments, and that formulation was also guided by the types. When pushing more property of the code in the types, this often happens: data structures are largely driven by making your programs type check more easily. As Conor McBride put it, types are lamps, not lifebuoys.

Once we have this function “evaluating” environments, we can write a function taking a variable and looking up into an environment:


Finally, we can evaluate expressions to their head normal form (if they have one).

Normalized terms are either a lambda or a free variable applied to some terms (a neutral term):

Then we define a function to push environments substitution down the expression, giving us back a Syntax:

Then, evaluating a term is just a matter of creating the right suspension and using removeSusp:

And we’re done! In the rest of the article I implement a parser and pretty printer, which means that you can do

$ stack --resolver lts-8.11 ghci
Prelude> :l suspension.lhs
Main> repl
>>> \x -> x
Evaluated expression:
  \x -> x
Evaluated expression (no suspensions):
  \x -> x
>>> (\x -> x) foo
Evaluated expression:
Evaluated expression (no suspensions):
>>> let x = foo; x
Evaluated expression:
Evaluated expression (no suspensions):

Things get interesting when we evaluate expressions that contain delayed substitutions:

>>> (\a b -> a) foo
Evaluated expression:
  \b ->
        (b_1 := b)
        ($comp ($cons (a := foo) ($nil 0)) ($nil 1)))
Evaluated expression (no suspensions):
  \b -> foo
>>> :{
  | let x = \y -> x y;
  | x foo
  | }:
Evaluated expression:
        (y :=
          $susp ($cons (x := \y -> x y) ($nil 0)) foo)
        ($nil 0))
Evaluated expression (no suspensions):
  x foo

The first printed expression shows the full suspension including the environment, where $susp env a represents a Susp, $cons (n := e) env an EnvCons n e env, and $nil i a EnvNil wk where wk weakens by i.

We also print the expression with all the suspensions removed.

Node that this approach can be paired with other performance improvements, such as graph reduction (sharing common subexpressions and evaluating them all at once), or more eager evaluation of function arguments.

Thanks for listening, comments on reddit.


Pretty printing

evalToSyntax :: Eval a -> Syntax a
evalToSyntax = \case
  EvalLam n body -> Lam n body
  EvalNeutral v args -> foldl' (\e -> App (Syntax e)) (Var v) args

removeAllSusps :: Exp a -> Exp a
removeAllSusps e = Syntax $ case removeSusp e of
  Var v -> Var v
  Lam n body -> Lam n (removeAllSusps body)
  Let n e1 e2 -> Let n (removeAllSusps e1) (removeAllSusps e2)
  App fun arg -> App (removeAllSusps fun) (removeAllSusps arg)

data PrettyEnv a = PrettyEnv
  { peCounters :: HashMap Text Int
  , pePrettyNames :: PrettyNames a
data PrettyNames a where
  PrettyNamesNil :: PrettyNames Text
  PrettyNamesCons :: Text -> PrettyNames a -> PrettyNames (Var a)

newPrettyEnv :: PrettyEnv Text
newPrettyEnv = PrettyEnv mempty PrettyNamesNil

weakenPrettyEnv :: PrettyEnv a -> Text -> (Text, PrettyEnv (Var a))
weakenPrettyEnv vn n = let
  counter = case HMS.lookup n (peCounters vn) of
    Nothing -> 0
    Just c -> c
  counters = HMS.insert n (counter+1) (peCounters vn)
  n' = if counter == 0 then n else n <> "_" <> T.pack (show counter)
    ( n'
    , vn
        { peCounters = counters
        , pePrettyNames = PrettyNamesCons n' (pePrettyNames vn)

strengthenPrettyEnv :: PrettyEnv (Var a) -> PrettyEnv a
strengthenPrettyEnv vn = vn
  { pePrettyNames = case pePrettyNames vn of
      PrettyNamesCons _ ns -> ns

peLookup :: PrettyEnv a -> a -> Text
peLookup vn = go (pePrettyNames vn)
      go :: PrettyNames a -> a -> Text
      go ns0 v = case ns0 of
        PrettyNamesNil -> v
        PrettyNamesCons txt ns -> case v of
          B -> txt
          F v' -> go ns v'

data Position
  = PosNormal
  | PosArg

posParens :: Position -> PP.Doc -> PP.Doc
posParens = \case
  PosNormal -> id
  PosArg -> PP.parens

hang :: PP.Doc -> PP.Doc
hang = PP.group . PP.nest 2

prettyExp :: Position -> PrettyEnv a -> Exp a -> PP.Doc
prettyExp pos vn = \case
  Syntax e -> prettySyntax pos vn e
  Susp env e -> prettySusp pos vn env (Syntax e)

prettySyntax :: Position -> PrettyEnv a -> Syntax a -> PP.Doc
prettySyntax pos vn = \case
  Var v -> PP.text (T.unpack (peLookup vn v))
  e@App{} -> posParens pos (hang (prettyApp (Syntax e)))
  e@Lam{} -> posParens pos (hang ("\\" PP.<> prettyLam vn (Syntax e)))
  Let n e1 e2 -> posParens pos $ let
    (n', vn') = weakenPrettyEnv vn n
      PP.text (T.unpack n') PP.<+> "=" PP.<+>
      prettyExp PosNormal vn e1 PP.<> ";" PP.<$>
      prettyExp PosNormal vn' e2
    prettyApp = \case
      Syntax (App fun arg) -> prettyApp fun PP.<$> prettyExp PosArg vn arg
      e -> prettyExp PosArg vn e

    prettyLam :: PrettyEnv a -> Exp a -> PP.Doc
    prettyLam vn' = \case
      Syntax (Lam n body) -> let
        (n', vn'') = weakenPrettyEnv vn' n
        in PP.text (T.unpack n') PP.<+> prettyLam vn'' body
      e -> "->" PP.<$> prettyExp PosNormal vn' e

prettySusp :: Position -> PrettyEnv b -> Env a b -> Exp a -> PP.Doc
prettySusp pos vn env e = let
  (envDoc, vn') = prettyEnv PosArg vn env
  in posParens pos (hang (PP.vsep ["$susp", envDoc, prettyExp PosArg vn' e]))

prettyEnv :: Position -> PrettyEnv b -> Env a b -> (PP.Doc, PrettyEnv a)
prettyEnv pos vn = \case
  EnvComp env1 env2 -> let
    (env2Doc, vn') = prettyEnv PosArg vn env2
    (env1Doc, vn'') = prettyEnv PosArg vn' env1
    in (posParens pos (hang (PP.vsep ["$comp", env1Doc, env2Doc])), vn'')
  EnvCanonical (EnvNil wk) -> let
    (wkDoc, vn') = prettyWeaken vn wk
    in (posParens pos ("$nil" PP.<+> wkDoc), vn')
  EnvCanonical (EnvCons n0 e env') -> let
    (env'Doc, vn') = prettyEnv PosArg vn env'
    (n, vn'') = weakenPrettyEnv vn' n0
      ( posParens pos $ hang $ PP.vsep
          [ "$cons"
          , PP.parens (hang (PP.text (T.unpack n) PP.<+> ":=" PP.<$> prettyExp PosNormal vn e))
          , env'Doc
      , vn''

prettyWeaken :: PrettyEnv b -> Weaken a b -> (PP.Doc, PrettyEnv a)
prettyWeaken vn0 wk0 = let
  (wkNum, vn) = go vn0 wk0
  in (PP.integer wkNum, vn)
    go :: PrettyEnv b -> Weaken a b -> (Integer, PrettyEnv a)
    go vn = \case
      WeakenZero -> (0, vn)
      WeakenSucc wk -> let
        (c, vn') = go (strengthenPrettyEnv vn) wk
        in (c+1, vn')


type ParseMonad = Tri.Parser

data ParseEnv a where
  ParseEnvNil :: ParseEnv Text
  ParseEnvCons :: Text -> ParseEnv a -> ParseEnv (Var a)

parseEnvLookup :: ParseEnv a -> Text -> a
parseEnvLookup env0 txt = case env0 of
  ParseEnvNil -> txt
  ParseEnvCons txt' env -> if txt == txt'
    then B
    else F (parseEnvLookup env txt)

parseExp :: Position -> ParseEnv a -> ParseMonad (Exp a)
parseExp pos env = Syntax <$> parseSyntax pos env

parseSyntax :: Position -> ParseEnv a -> ParseMonad (Syntax a)
parseSyntax pos env = (case pos of
  PosNormal -> asum
    [ parseLam env
    , parseLet env
    , do
        head <- parseSyntax PosArg env
        args <- many (parseSyntax PosArg env)
        return (foldl' (\fun arg -> App (Syntax fun) (Syntax arg)) head args)
  PosArg -> asum
    [ Var <$> parseVar env
    , Parse.parens (parseSyntax PosNormal env)
    ]) Parse.<?> "expression"

parseVar :: ParseEnv a -> ParseMonad a
parseVar env = do
  n <- Parse.ident parseIdentStyle
  return (parseEnvLookup env n)

parseIdentStyle :: Parse.IdentifierStyle ParseMonad
parseIdentStyle = Parse.IdentifierStyle
  { Parse._styleName = "identifier"
  , Parse._styleStart = Parse.letter <|> Parse.char '_'
  , Parse._styleLetter = Parse.alphaNum <|> Parse.char '_'
  , Parse._styleReserved = HS.fromList ["let"]
  , Parse._styleHighlight = Parse.Identifier
  , Parse._styleReservedHighlight = Parse.ReservedIdentifier

parseLam :: ParseEnv a -> ParseMonad (Syntax a)
parseLam env0 = (do
  void (Parse.symbolic '\\')
  n <- Parse.ident parseIdentStyle
  Lam n <$> go (ParseEnvCons n env0)) Parse.<?> "lambda"
    go :: ParseEnv a -> ParseMonad (Exp a)
    go env = asum
      [ do
          void (Parse.symbol "->")
          parseExp PosNormal env
      , do
          n <- Parse.ident parseIdentStyle
          lam n <$> go (ParseEnvCons n env)

parseLet :: ParseEnv a -> ParseMonad (Syntax a)
parseLet env = (do
  void (Parse.reserve parseIdentStyle "let")
  n <- Parse.ident parseIdentStyle
  void (Parse.symbolic '=')
  e1 <- parseExp PosNormal env
  void (Parse.symbolic ';')
  e2 <- parseExp PosNormal (ParseEnvCons n env)
  return (Let n e1 e2)) Parse.<?> "let"


repl :: IO ()
repl = Haskeline.runInputT
    { Haskeline.historyFile = Just ".lambda" }
    render x = PP.displayS (PP.renderSmart 0.6 80 x) ""
    outputDoc = Haskeline.outputStrLn . render

    parseInput = parseExp PosNormal ParseEnvNil <* Tri.eof

    runParser fp (line :: Int) (col :: Int) s p = let
      delta = Tri.Directed (T.encodeUtf8 (T.pack fp)) (fromIntegral line - 1) (fromIntegral col - 1) 0 0
      in case Tri.parseByteString p delta s of
        Tri.Success a -> Right a
        Tri.Failure err -> Left (Tri._errDoc err)

    loop :: Haskeline.InputT IO ()
    loop = do
      mbInput <- Haskeline.getInputLine ">>> "
      let processAndLoop s = do
            let mbX = runParser "repl" 0 0 (T.encodeUtf8 (T.pack s)) parseInput
            case mbX of
              Left err -> do
                outputDoc ("Error while parsing" PP.<$> err)
              Right x -> do
                outputDoc ("Parsed expression:" PP.<$> PP.indent 2 (prettyExp PosNormal newPrettyEnv x))
                let whnf = Syntax (evalToSyntax (eval (removeSusp x)))
                outputDoc $
                  "Evaluated expression:" PP.<$>
                  PP.indent 2 (prettyExp PosNormal newPrettyEnv whnf)
                outputDoc $
                  "Evaluated expression (no suspensions):" PP.<$>
                  PP.indent 2 (prettyExp PosNormal newPrettyEnv (removeAllSusps whnf))
      case mbInput of
        Nothing -> return ()
        Just (':' : cmd) -> case cmd of
            "q" -> return ()
            "{" -> do
              let collect :: [String] -> Haskeline.InputT IO ()
                  collect chunks = do
                    mbInput' <- Haskeline.getInputLine "  | "
                    case mbInput' of
                      Nothing -> return ()
                      Just "}:" -> processAndLoop (concat (intersperse "\n" (reverse chunks)))
                      Just chunk -> collect (chunk : chunks)
              collect []
            'l' : ' ' : file -> processAndLoop =<< liftIO (readFile file)
            _ -> do
              outputDoc ("Unrecognized command" PP.<+> PP.text cmd)
        Just input -> processAndLoop input

  1. Or “with open terms” or “under binders”, depending to who you are talking to.

  2. For more about the suspension calculus as described by the authors of lambda-Prolog, see A Simplified Suspension Calculus and its Relationship to Other Explicit Substitution Calculi, by Andrew Gacek and Gopalan Nadathur.