Whenever I write “language” I implicitly mean “programming langauge”.
Dependent Types
Metaprogramming
Metaprogramming is the design of programs that generate code meant to be used
at the same of a similarly high level of abstraction as the metaprogram’s
language. For example, the popular languages Python and Javascript each provide
a built-in function called eval
that takes as input a string and then tries to
interpret that string as a piece of code in the respective language (here,
“interpret” refers to immediately executing the code a high level rather than
first compiling that code to a low-level representation, usually C or
Assembly).
Metaprogramming is the ultimate abstraction, since (almost) any pattern of code at all can be generated by a metaprogram of the same or smaller size (asymptotically, at the very least). However, metaprogramming can also yield some severe side-effects.
- Generated code is not directly visible while programming, so there is an additional layer of obfuscation between programming and feedback.
- Metaprograms usually only have limited access to static information, which
inferred by the compiler, about the programs to be generated. On top of this,
metaprograms are not usually checked to make sure they generate valid
programs. Python and Javascript’s
eval
function know that their argument is a string and nothing more!
For example, Template Haskell provides a monad Q :: * -> *
which is the type
of computations that produce Haskell code. Here is a metaprogram that, at
compile-time, generates a function that sums n
arguments. Without
metaprogramming, there is no basic functionality for variable-arity functions in
Haskell.
addN :: Int -> Q Exp
addN n = do
xs <- replicateM n (newName "x")
lamE (varP <$> xs) (foldM (\e v -> [| $e + $v |]) [| 0 |] xs)
n :: Int
n = $(addN 2) 1 2
-- n == 1 + 2
-- n == 3
Embedded Languages
The definition of a language consists of its syntax and semantics. The syntax of a language is the specification for well-formed programs in the language. The semantics of a language is a mapping from it’s syntax to computational behaviors.
Usually, the semantics of a language are represented as a function from the
syntax of a higher-level language to the syntax of a lower-level langauge. For
example, a compiled language (e.g. C++, Haskell) has a corresponding compiler
which encodes semantics with a function, compile
, from the language syntax to
assembly code. Then the assembly code is compiled into executable bytecode. And
finally hte executable bytecode is “compiled” into basic computational
operations. Beyond this is an descent into the physical implementation of the
computer, and further into the physical processes that make up the physical
execution.
Metaprogramming slighly confuses this setup, however. A metaprogram is a program
that can actually refer to the syntax of itself by representing the program as a
string, and it can refer to the semantics of itself by calling the eval
function. So, the hierarchy described in the previous section can have cycles.
Even without metaprogramming though, we can interface directly with this hierarchy via an embedded language. An embedded language is a language where its syntax and semantics are defined entirely within a higher-level language. A compiled language is almost like an embedded language within its compiler’s language, but it is not because the compiler’s language is not a (usually) a lower-level language than the compiled language. So, embedded languages are sort of like reverse compiled languages.
As a very simple example, here is a way of embedding the untyped lambda calculus within Haskell:
data Term = Var String | Lam String Term | App Term Term
deriving (Show)
type Ctx = String -> Maybe Term
eval :: Ctx -> Term -> Maybe Term
eval ctx (Var x) = ctx x
eval ctx (Lam x b) = Just (Lam x b)
eval ctx (App (Lam x b) a) = eval (\y -> if x == y then b else ctx y) b
eval ctx (App _ _) = Nothing
example1 :: Maybe Term
example1 = eval (\_ -> Nothing) (App (Lam "x" (Var "x")) (Var "y"))
-- example1 = eval ((λ x . x) y)
-- example1 == Just (Var "y")
example2 :: Maybe Term
example2 = eval (\_ -> Nothing) (App (Var "x") (Var "y"))
-- example2 = eval (x y)
-- example2 == Nothing
example3 :: Maybe Term
example3 = eval (\_ -> Nothing) (App (Lam "x" (Var "y")) (Var "z"))
-- example3 = eval ((λ x . y) z)
-- example3 == Nothing
The datatype Term
defined the syntax of the language, and the function eval
defined the syntax of the language. Note that, since eval
returns a
Maybe Term
, it can be considered partial on the entire syntax domain. That is,
some syntactically-valid programs are not meaningful i.e. they don’t have
semantics. For example, example2
tries to apply a non-lambda, and example3
refers to an unbound variable.
So, what’s the point of embedded languages? Two interesting use cases are the following:
- A language (or, at least, parts of it) can be embedded within itself to
facilitate metaprogramming capabilities. Rather than just attempt to parse
unstrucutred strings like JavaScript and Python’s
eval
, languages like Haskell (via Template Haskell) and C++ (via templates) offer a metaprogramming interface for manipulate structured representations of the language’s syntax. This makes writing and reasoning about metaprograms somewhat easier. - A language of interest can be embedded within a proof assistant language, such as Agda or Coq, in order for the programmer to write proofs about the syntax and semantics of programs in the embedded language. This is useful for verifying properties of important programs in the embedded language, such as compilers.
In the following section, I will introduce another use case, mesaprogramming, which is a combination of these two use cases.
Mesaprogramming
The usual technique of metaprogramming (“above-programming”) is to provide a special interface for invoking code generation (e.g. templates as a built-in or library feature). The main issue with this is that it requires a new metaprogramming language to be built on top of the base language, which comes with complicated language features and usually little to none of the safety features of the original language. This is a serious issue because writing correct metaprograms is very difficult, especially when their output is not easily accessible.
Mesaprogramming (“below-programming”) is an alternative technique that can be implemented in any language, and is most powerful in proof assistant languages. Rather than write a metaprogramming language on top of the base language, the technique of mesaprogramming is to define an embedded mesaprogramming language inside of the base language as a formalization of a fragment of the base language (defining the entire language within itself comes with many issues: TODO).
For example, variable arity functions in Haskell:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}
data Nat = Zero | Suc Nat
data SNat :: Nat -> * where
SZero :: SNat Zero
SSuc :: SNat n -> SNat (Suc n)
n0 :: SNat Zero
n0 = SZero
n1 :: SNat (Suc Zero)
n1 = SSuc SZero
n2 :: SNat (Suc (Suc Zero))
n2 = SSuc (SSuc SZero)
n3 :: SNat (Suc (Suc (Suc Zero)))
n3 = SSuc (SSuc (SSuc SZero))
type family EndoN (a :: *) (n :: Nat) :: * where
EndoN a Zero = a
EndoN a (Suc n) = a -> EndoN a n
-- given a (singleton for) natural number `n`, is an `n`-ary function that adds its inputs
addN :: SNat n -> EndoN Int n
addN SZero = 0
addN (SSuc SZero) = \x -> x
addN (SSuc (SSuc n)) = \x y -> addN (SSuc n) (x + y)
For example, TODO: example of metaprogramming in python