The Syntax of Ginger

Oh man, this got real fun real quick.

Finally I have a syntax for ginger that I'm happy with. This has actually been a huge roadblock for me up till this point. There's a bit of a chicken-and-the-egg problem with the syntax: without pinning down the structures underlying the syntax it's difficult to develop one, but without an idea of syntax it's difficult to know what structures will be ergonomic to use.

I've been focusing on the structures so far, and have only now pinned down the syntax. Let's see what it looks like.

Preface: Conditionals

I've so far written two posts regarding conditionals in ginger. After more reflection, I think I'm going to stick with my *original* gut, which was to only have value and tuple vertices (no forks), and to use a function which accepts both a boolean and two input edges: the first being the one to take if the boolean is true, and the second being the one to take if it's false.

two

posts

Aka, the very first proposal in the first post. It's hard to justify up-front, but I think once you see it in action with a clean syntax you'll agree it just kind of works.

first post

Designing a Syntax

Ginger is a bit of a strange language. It uses strange datastructures in strange ways. But approaching the building of a syntax for any language is actually straightforward: you're designing a serialization protocol.

To pull back a bit, consider a list of words. How would you encode this list in order to write it to a file? To answer this, let's flip the question: how would you design a sequence of characters (ie the contents of the file) such that the reader could reconstruct the list?

Well, constructing the list from a sequence of characters requires being able to construct it *at all*, so in what ways is the list constructed? For this list, let's say there's only an append operation, which accepts a list and a value to append to it, and returns the result.

If we say that append is encoded via wrapping parenthesis around its two arguments, and that `()` encodes the empty list, then we get a syntax like...

(((() foo) bar) baz)

...which, in this instance, decodes to a list containing the words, "foo", "bar", and "baz", in that order.

It's not a pretty syntax, but it demonstrates the method. If you know how the datastructure is constructed via code, you know what capabilities the syntax must have and how it needs to fit together.

gg

All of this amounted to me needing to implement the ginger graph in some other language, in order to see what features the syntax must have.

A few years ago I had begun an implementation of a graph datastructure in go, to use as the base (or at least a reference) for ginger. I had called this implementation `gg` (ginger graph), with the intention that this would also be the file extension used to hold ginger code (how clever).

The basic qualities I wanted in a graph datastructure for ginger were, and still are:

Immutability, ie all operations which modify the structure should return a copy, leaving the original intact.
Support for tuples.
The property that it should be impossible to construct an invalid graph. An invalid graph might be, for example, one where there is a single node with no edges.
Well tested, and reasonably performant.

Coming back to all this after a few years I had expected to have a graph datastructure implemented, possibly with immutability, but lacking in tuples and tests. As it turns out I completely underestimated my past self, because as far as I can tell I had already finished the damn thing, tuples, tests and all.

It looks like that's the point where I stopped, probably for being unsure about some other aspect of the language, and my motivation fell off. The fact that I've come back to ginger, after all these years, and essentially rederived the same language all over again, gives me a lot of confidence that I'm on the right track (and a lot of respect for my past self for having done all this work!)

The basic API I came up with for building ginger graphs (ggs) looks like this:

package gg

// OpenEdge represents an edge with a source value but no destination value,
// with an optional value on it. On its own an OpenEdge has no meaning, but is
// used as a building block for making Graphs.
type OpenEdge struct{ ... }

// TupleOut constructs an OpenEdge leading from a tuple, which is comprised of
// the given OpenEdges leading into it, with an optional edge value.
func TupleOut(ins []OpenEdge, edgeVal Value) OpenEdge

// ValueOut constructs an OpenEdge leading from a non-tuple value, with an
// optional edge value.
func ValueOut(val, edgeVal Value) OpenEdge

// ZeroGraph is an empty Graph, from which all Graphs are constructed via calls
// to AddValueIn.
var ZeroGraph = &Graph{ ... }

// Graph is an immutable graph structure, formed from a collection of edges
// between values and tuples.
type Graph struct{ ... }

// AddValueIn returns a new Graph which is a copy of the original, with the
// addition of a new edge. The new edge's source and edge value come from the
// given OpenEdge, and the edge's destination value is the given value.
func (g *Graph) AddValueIn(oe OpenEdge, val Value) *Graph

The actual API is larger than this, and includes methods to remove edges, iterate over edges and values, and perform unions and disjoins of ggs. But the above are the elements which are required only for *making* ggs, which is all that a syntax is concerned with.

As a demonstration, here is how one would construct the `min` operation, which takes two numbers and returns the smaller, using the `gg` package:

// a, b, in, out, if, etc.. are Values which represent the respective symbol.

// a is the result of passing in to the 0 operation, ie a is the 0th element of
// the in tuple.
min := gg.ZeroGraph.AddValueIn(gg.ValueOut(in, 0), a)

// b is the 1st element of the in tuple
min = min.AddValueIn(gg.ValueOut(in, 1), b)

// out is the result of an if which compares a and b together, and which returns
// the lesser.
min = min.AddValueIn(out, gg.TupleOut([]gg.OpenEdge{
    gg.TupleOut([]gg.OpenEdge{a, b}, lt),
    a,
    b,
}, if)

And here's a demonstration of how this `min` would be used:

// out is the result of passing 1 and 5 to the min operation.
gg.ZeroGraph.AddValueIn(gg.TupleOut([]gg.OpenEdge{1, 5}, min), out)

Make it Nice

Technically* we're done. We have an implementation of the language's underlying structure, and a syntax which encodes it (ie the ugly ass go syntax above). But obviously I'm not proposing anyone actually use that.

Another thing I found when digging around in the old ginger repo was a text file, tucked away in a directory called "sandbox", which had a primitive syntax which *almost* worked. I won't copy it here, but you can find it if you care to. But with that as a foundation I came up with a crude, rough draft spec, which maps the go syntax to the new syntax.

ValueOut(val, edgeVal)    : -edgeVal-val
ValueOut(val, null)       : -val
TupleOut([]val, edgeVal)  : -edgeVal-(val, ...)
TupleOut([]val, null)     : -(val, ...)
Graph(openEdge->val, ...) : { val openEdge, ... }

A couple things to note about this spec:

`null` is used to indicate absence of value on an edge. The details of `null` are yet to be worked out, but we can use this placeholder for now.
`Graph` is cheating a bit. In the original `gg` implementation a Graph gains its OpenEdge/Value pairs via successive calls to `AddValueIn`. However, such a pattern doesn't translate well to text, and since we're dealing purely with constructing an entire Graph at once we can instead have our Graph syntax declare all OpenEdge/Value pairs at once.
It's backwards! Eg where the go syntax does `ValueOut(val, edgeVal)`, the proposed spec puts the values in the opposite order: `-edgeVal-val`. The former results in code which is read from input to output, while the latter results in the opposite: output to input.

This was a tip I picked up from the old text file I found, and the result is code which is more familiar to an existing programmer. I *think* (but am not sure) that it's also more in line with how programming is done mentally, ie we start with a result and work backwards to figure out what it takes to get there.

It's possible, though, that I'm wrong, so at this end of this post I'm going to put some examples of the same code both "forwards" and "backwards" and see how I feel about it.

With all that said, let's see it in action! Here's `min` implemented in our shiny new syntax:

min -{
    a -0-in,
    b -1-in,
    out -if-(
        -lt-(-a,-b),
        -a,
        -b
    )
}

and then here's it being used:

out -min-(-1,-5)

Make it Nicer

The most striking feature of this rough draft spec is all the prefix dashes, such as in the `-min-(-1,-5)` statement. These dashes were included as they make sense in the context of what the intended human interpretation of the structure is: two values, `1`, and `5`, are being *piped into* the two slots of a 2-tuple, and that 2-tuple is being *piped into* the `min` operation, the output of which is being *piped into* something `out`.

The "piping into" is what the dash represents, which is why the top level values in the graph, `a`, `b`, and `out`, don't have a preceding dash; they are the ultimate destinations of the pipes leading to them. But these pipes are ultimately ugly, and also introduce odd questions like "how do we represent -1?", so they need to go.

So I've made a second draft, which is only a few changes away from the rough, but oh man do those changes make a world of difference. Here's the cleaned up spec:

ValueOut(val, edgeVal)    : edgeVal(val)
ValueOut(val, null)       : val
TupleOut([]val, edgeVal)  : edgeVal(val, ...)
TupleOut([]val, null)     : (val, ...)
Graph(openEdge->val, ...) : { val = openEdge, ... }

The dashes were simply removed, and the `edgeVal` and `val` concatted together. For `ValueOut(val, edgeVal)` wrapping parenthesis were put around `val`, to delineate it and `edgeVal`. This conflicts with the syntax for `TupleOut([]val, edgeVal)`, but that conflict is easy to remedy: when parenthesis wrap only a single `val` then that is a `ValueOut`, otherwise it's a `TupleOut`.

Another change is to add an `=` between the `val` and `openEdge` in the `Graph` constructor. This is a purely aesthetic change, but as you'll see it works well.

So let's see it! `min` implemented with this cleaned up syntax:

min = {
    a = 0(in),
    b = 1(in),
    out = if(
        lt(a,b),
        a,
        b
    )
}

And then its use:

min(1,5)

Well well well, look what we have here: a conventional programming language syntax! `{`/`}` wrap a scope, and `(`/`)` wrap function arguments and (optionally) single values. It's a lot clearer now that `0` and `1` are being used as operations themselves when instantiating `a` and `b`, and `if` is much more readable.

I was extremely surprised at how well this actually worked out. Despite having drastically different underpinnings than most languages it ends up looking both familiar and obvious. How cool!

Examples Examples Examples

Here's a collection of example programs written in this new syntax. The base structure of these are borrowed from previous posts, I'm merely translating that structure into a new form:

// decr outputs one less than the input.
decr = { out = add(in, -1) }

// fib accepts a number i, and outputs the ith fibonacci number.
fib = {

    inner = {
        n = 0(in),
        a = 1(in),
        b = 2(in),

        out = if(zero?(n),
            a,
            inner(decr(n), b, add(a,b))
        )

    },

    out = inner(in, 0, 1)
}

// map accepts a sequence and a function, and returns a sequence consisting of
// the result of applying the function to each of the elements in the given
// sequence.
map = {
    inner = {
        mapped-seq = 0(in),
        orig-seq = 1(in),
        op = 2(in),

        i = len(mapped-seq),

        // graphs provide an inherent laziness to the language. Just because
        // next-el is _defined_ here doesn't mean it's evaluated here at runtime.
        // In reality it will only be evaluated if/when evaluating out requires
        // evaluating next-el.
        next-el = op(i(orig-seq)),
        next-mapped-seq = append(mapped-seq, next-el),

        out = if(
            eq(len(mapped-seq), len(orig-seq)),
            mapped-seq,
            inner(next-mapped-seq, orig-seq, op)
        )
    }

    // zero-seq returns an empty sequence
    out = inner(zero-seq(), 0(in), 1(in))
}

Selpmexa Selpmexa Selpmexa

Our syntax encodes a graph, and a graph doesn't really care if the syntax was encoded in an input-to-output vs an output-to-input direction. So, as promised, here's all the above examples, but "backwards":

// min returns the lesser of the two numbers it is given
{
    (in)0 = a,
    (in)1 = b,

    (
        (a,b)lt,
        a,
        b
    )if = out

} = min

// decr outputs one less than the input.
{ (in, -1)add = out } = decr

// fib accepts a number i, and outputs the ith fibonacci number.
{
    {
        (in)0 = n,
        (in)1 = a,
        (in)2 = b,

        (
            (n)zero?
            a,
            ((n)decr, b, (a,b)add)inner
        )if = out

    } = inner,

    (in, 0, 1)inner = out

} = fib

// map accepts a sequence and a function, and returns a sequence consisting of
// the result of applying the function to each of the elements in the given
// sequence.
{
    {
        (in)0 = mapped-seq,
        (in)1 = orig-seq,
        (in)2 = op,

        (mapped-seq)len = i,

        ((orig-seq)i)op = next-el,
        (mapped-seq, next-el)append = next-mapped-seq,

        (
            ((mapped-seq)len, (orig-seq)len)eq,
            mapped-seq,
            (next-mapped-seq, orig-seq, op)inner
        )if = out

    } = inner,

    (()zero-seq, (in)0, (in)1)inner = out
} = map

Do these make you itchy? They kind of make me itchy. But... parts of them also appeal to me.

The obvious reason why these feel wrong to me is the placement of `if`:

(
    (a,b)lt,
    a,
    b
)if = out

The tuple which is being passed to `if` here is confusing unless you already know that it's going to be passed to `if`. But on your first readthrough you won't know that till you get to the end, so you'll be in the dark until then. For more complex programs I'm sure this problem will compound.

On the other hand, pretty much everything else looks *better*, imo. For example:

// copied and slightly modified from the original to make it even more complex

(mapped-seq, ((orig-seq)i)op)append = next-mapped-seq

Something like this reads very clearly to me, and requires a lot less mental backtracking to comprehend. The main difficulty I have is tracking the parenthesis, but the overall "flow" of data and the order of events is plain to read.

Next Steps

The syntax here is not done yet, not by a long shot. If my record with past posts about ginger (wherein I've "decided" on something and then completely backtracked in later posts every single time) is any indication then this syntax won't even look remotely familiar in a very short while. But it's a great starting point, I think, and raises a lot of good questions.

Can I make parenthesis chains, a la the last example, more palatable in some way?
Should I go with the "backwards" syntax afterall? In a functional style of programming `if` statements *should* be in the minority, and so the syntax which better represents the flow of data in that style might be the way.
Destructuring of tuples seems to be wanted, as evidenced by all the `a = 0(in)` lines. Should this be reflected in the structure or solely be syntactical sugar?
Should the commas be replaced with any whitespace (and make commas count as whitespace, as clojure has done)? If this is possible then I think they should be, but I won't know for sure until I begin implementing the parser.

And, surely, many more! I've felt a bit lost with ginger for a *long* time, but seeing a real, usable syntax emerge has really invigorated me, and I'll be tackling it again in earnest soon (fingers crossed).

-----

Published 2021-08-26

This site is a mirror of my gemini capsule. The equivalent gemini page can be found here, and you can learn more about gemini at my 🚀 What is Gemini? page.