.%*. .-. .%@@@+. .--=%@@@- =@@@@@@- :--+@@@@@@@@@* *@%@@@@@%: :--=#@@@@@@@@@@@@#@@+ @@::%@@@@@@-: :::-=%@@@@@@@@@@@@@@%#*- :@* .@@ =@@@@@@@@@+-:::::::-=*@@@@@@@@@@@@@@@@@@@%##- @@: #@# +%@@@@@@@@@@@@@@@@@@@@@@@@@@@@@%%#*: -@@ #@- *%@@@@@@@@@@@@@@@@@%%%#= %@= @@: -=+++== .. @@: @@ . -@@@ +@@ #@% =@@@@- +@@@@# *+ %@# #@- -@@@@@@@. +@@@@@% .%@@@@= @@: @@: +@@@@@@@@@ +@+ +@@@ %@@@@@@@@@ -@@ @@ *@%@* :@@@ *@+ -@@@ %@@@@@@@@@@@# @@@ #@% %@ @% %@@ *@* .@@@ =@@@*@@ :@@@@ @@: %@:## *@+ :@@ :@@ @@@ %@@: @# .@@@ .@@ @@:. @@: .@@ @@ @@@ %@: @@ @@% @@% @@ -@@. =@@ @@. @@@ %@. #@- @@# @@. %@% +@@. @@. %@- @@@ #@ @@. :@@ -@@ %@: +@@+ .@% #@= @@@.@ *@@ +@+ @@* @@: +@@%. .%% *@+ @@@ %@@ -@# @@ @@ :@@@@@@% :@@ @@% %@@ @# *@@ %@@ #@@@@* @@ @@% %@@. %@. @@: @@: ===: @@. @@% %@@: =@%. .@@ @@. #@= @@% %@@%+%@#. @@% @@ +@* @@# -@@@@@* .@@ %@@ :@@ @@# %@@#: =@@ @@: .@@ @@# : @@= @@. @@. @@# :@@ @@ #@= @@% %@% #@@ +@+ #@% .@@ @@: =@# +@% =@% @@ :@% -@@ @@+ @@ .@@ -@: -@@ -@@ @% .: *@# @@* # @@. @@ =@@ .@@ @@* .@@ .@@ *@@ =@# @@* @@* @@ -@@ .@@ +@* .@@ @@- -@@ =@@ %@% #@* @@ @@: @@ +@@ .@@ %@+ :@% @@. :@% *@% -@. %@= %@ @@. @@ *@% @@ %@= @% @@: :@% +@@ :@* =+- #@+ :@: .%@@@. @@- -@ =@@@@@# :@@ =@ *@#.=@@% #@% *@ #@- -@@% %@- @@ #@: :@@% @@: @@ #% .@@% .#@@* -@@ @% #% .@@% +@@@@@- %@% @# %@ :@@% .@@@@@@@@ %@: @# %@ :@@% +@@% .@@@@ @@: @# %@ :@@% #@@: :@@@ -@@ @# @@. .@@@ .@@% .@@@= %@% @# @@. .@@@: %@@- @@@% %@: @# @@ @@@@ %@@@ *@@@ @@: @# .@@ @@@@@@@@@@= :@@@ @@ * @@@ :@@@@@@@@. :@@@ *@@ .@@@@@% -@@@@@. .@@@ @@= @@@@@: .. @@@ @@. -@%: @@@ @@. @@@ @@ -@@ =@@ .@@ @@+ .@@:@@. @@#@- @@@. +@- -mediocregopher's lil web corner
- The new best language for computing fibonacci numbers.
As a kind of Christmas present to myself I took a whole week off of work specifically to dedicate myself to working on ginger.
My concrete goal was to be able to run a ginger program to compute any Nth fibonacci number, a goal I chose because it would require the implementation of conditionals, some kind of looping or recursion, and basic addition/subtraction. In other words, it would require all the elements which comprise a Turing complete language.
And you know what? I actually succeeded!
The implementation can be found here. At this point ginger is an interpreted language running in a golang-based VM. The dream is for it to be self-hosted on LLVM (and other platforms after), but as an intermediate step to that I decided on sticking to what I know (golang) rather than having to learn two things at once.
In this post I'm going to describe the components of this VM at a high level, show a quick demo of it working, and finally talk about the roadmap going forward.
The core package of the whole project is the graph
package. This
package implements a generic directed graph datastructure.
The generic part is worth noting; I was able to take advantage of go's new generics which are currently in beta. I'd read quite a bit on how the generic system would work even before the beta was announced, so I was able to hit the ground running and start using them without much issue.
Ginger's unique graph datastructure has been discussed in previous posts in this series quite a bit, and this latest implementation doesn't deviate much at a high level. Below are the most up-to-date core datatypes and functions which are used to construct ginger graphs:
// Value is any value which can be stored within a Graph. Values should be
// considered immutable, ie once used with the graph package their internal
// value does not change.
type Value interface {
Equal(Value) bool
String() string
}
// OpenEdge consists of the edge value (E) and source vertex value (V) of an
// edge in a Graph. When passed into the AddValueIn method a full edge is
// created. An OpenEdge can also be sourced from a tuple vertex, whose value is
// an ordered set of OpenEdges of this same type.
type OpenEdge[E, V Value] struct { ... }
// ValueOut creates a OpenEdge which, when used to construct a Graph, represents
// an edge (with edgeVal attached to it) coming from the vertex containing val.
func ValueOut[E, V Value](edgeVal E, val V) *OpenEdge[E, V]
// TupleOut creates an OpenEdge which, when used to construct a Graph,
// represents an edge (with edgeVal attached to it) coming from the vertex
// comprised of the given ordered-set of input edges.
func TupleOut[E, V Value](edgeVal E, ins ...*OpenEdge[E, V]) *OpenEdge[E, V]
// Graph is an immutable container of a set of vertices. The Graph keeps track
// of all Values which terminate an OpenEdge. E indicates the type of edge
// values, while V indicates the type of vertex values.
type Graph[E, V Value] struct { ... }
// AddValueIn takes a OpenEdge and connects it to the Value vertex containing
// val, returning the new Graph which reflects that connection.
func (*Graph[E, V]) AddValueIn(val V, oe *OpenEdge[E, V]) *Graph[E, V]
// ValueIns returns, if any, all OpenEdges which lead to the given Value in the
// Graph (ie, all those added via AddValueIn).
func (*Graph[E, V]) ValueIns(val Value) []*OpenEdge[E, V]
The current Graph
implementation is incredibly inefficient, it does a lot of
copying, looping, and equality checks which could be optimized out one day.
That's going to be a recurring theme of this post, as I had to perform a
balancing act between actually reaching my goal for the week while not incurring
too much tech debt for myself.
There's a final operation I implemented as part of the graph
package:
MapReduce. It's a difficult operation to describe, but I'm going to
do my best in this section for those who are interested. If you don't understand
it, or don't care, just know that MapReduce
is a generic tool for transforming
graphs.
For a description of MapReduce
we need to present an example graph:
+<--b---
+ \
X <--a--+<--c----+<--f-- A
+ /
+ +<---g---
+<--d--+
+<---h---
\
Y <---------e----------- B
Plus signs indicate tuples, and lowercase letters are edge values while upper case letters are vertex values. The pseudo-code to construct this graph in go might look like:
g := new(Graph)
fA := ValueOut("f", "A")
g = g.AddValueIn(
"X",
TupleOut(
"a",
TupleOut("b", fA),
TupleOut("c", fA),
TupleOut(
"d",
ValueOut("g", "A"),
ValueOut("h", "B"),
),
),
)
g = g.AddValueIn("e", "B")
As can be seen in the code, MapReduce
's first argument is an
OpenEdge
, not a Graph
. Fundamentally MapReduce
is a reduction of the
dependencies of a particular value into a new value; to reduce the
dependencies of multiple values at the same time would be equivalent to looping
over those values and calling MapReduce
on each individually. Having
MapReduce
only deal with one edge at a time is more flexible.
So let's focus on a particular OpenEdge
, the one leading into X
(returned by
TupleOut("a", etc...)
. MapReduce
is going to descend into this OpenEdge
recursively, in order to first find all value vertices (ie the leaf vertices,
those without any children of their own).
At this point MapReduce
will use its second argument, the mapVal
function,
which accepts a value of one type and returns a value of another type. This
function is called on each value from every value vertex encountered. In this
case both A
and B
are connectable from X
, so mapVal
will be called on
each only once. This is the case even though A
is connected to multiple
times (once with an edge value of f
, another with an edge value of b
).
mapVal
only gets called once per vertex, not per connection.
With all values mapped, MapReduce
will begin reducing. For each edge leaving
each value vertex, the reduceEdge
function is called. reduceEdge
accepts as
arguments the edge value of the edge and the mapped value (not the original
value) of the vertex, and returns a new value of the same type that mapVal
returned. Like mapVal
, reduceEdge
will only be called once per edge. In our
example, <--f--A
is used twice (b
and c
), but reduceEdge
will only be
called on it once.
With each value vertex edge having been reduced, reduceEdge
is called again on
each edge leaving those edges, which must be tuple edges. An array of the
values returned from the previous reduceEdge
calls for each of the tuples'
input edges is used as the value argument in the next call. This is done until
the OpenEdge
is fully reduced into a single value.
To flesh out our example, let's imagine a mapVal
which returns the input
string repeated twice, and a reduceEdge
which returns the input values joined
with the edge value, and then wrapped with the edge value (eg reduceEdge(a, [B,
C]) -> aBaCa
).
Calling MapReduce
on the edge leading into X
will then give us the following
calls:
# Map the value vertices
mapVal(A) -> AA
mapVal(B) -> BB
# Reduce the value vertex edges
reduceEdge(f, [AA]) -> fAAf
reduceEdge(g, [AA]) -> gAAg
reduceEdge(h, [BB]) -> hBBh
# Reduce tuple vertex edges
reduceEdge(b, [fAAf]) -> bfAAfb
reduceEdge(c, [fAAf]) -> cfAAfc
reduceEdge(d, [gAAg, hBBh]) -> dgAAgdhBBhd
reduceEdge(a, [bfAAfb, cfAAfc, dgAAgdhBBhd]) -> abfAAfbacfAAfcadgAAgdhBBhda
Beautiful, exactly what we wanted.
MapReduce
will prove extremely useful when it comes time for the VM to execute
the graph. It enables the VM to evaluate only the values which are needed to
produce an output, and to only evaluate each value once no matter how many times
it's used. MapReduce
also takes care of the recursive traversal of the
Graph
, which simplifies the VM code significantly.
With a generic graph implementation out of the way, it was then required to define a specific implementation which could be parsed from a file and later used for execution in the VM.
The file extension used for ginger code is .gg
, as in "ginger graph" (of
course). The package name for decoding this file format is, therefore, also
called gg
.
The core datatype for the gg
package is the Value
, since the
graph
package takes care of essentially everything else in the realm of graph
construction and manipulation. The type definition is:
// Value represents a value which can be serialized by the gg text format.
type Value struct {
// Only one of these fields may be set
Name *string
Number *int64
Graph *Graph
// Optional fields indicating the token which was used to construct this
// Value, if any.
LexerToken *LexerToken
}
type Graph = graph.Graph[Value, Value] // type alias for convenience
Note that it's currently only possible to describe three different types in a
gg
file, and one of them is the Graph
! These are the only ones needed to
implement a fibonacci function, so they're all I implemented.
The lexing/parsing of gg
files is not super interesting, you can check out the
package code for more details. The only other thing worth noting is that, for
now, all statements are required to end with a ;
. I had originally wanted to
be less strict with this, and allow newlines and other tokens to indicate the
end of statements, but it was complicating the code and I wanted to move on.
Another small thing worth noting is that I decided to make each entire .gg
file implicitly define a graph. So you can imagine each file's contents wrapped
in curly braces.
With the gg
package out of the way I was able to finally parse ginger
programs! The following is the actual, real-life implementation of the fibonacci
function (though at this point it didn't actually work, because the VM was still
not implemented:
out = {
decr = { out = add < (in; -1;); };
n = tupEl < (in; 0;);
a = tupEl < (in; 1;);
b = tupEl < (in; 2;);
out = if < (
isZero < n;
a;
recur < (
decr < n;
b;
add < (a;b;);
);
);
} < (in; 0; 1;);
Finally, the meat of all this. If the graph
and gg
packages are the sturdy,
well constructed foundations of a tall building, then the vm
package is the
extremely long, flimsy stick someone propped up vertically so they could say
they built a structure of impressive height.
In other words, it's very likely that the current iteration of the VM will not be long for this world, and so I won't waste time describing it in super detail.
What I will say about it is that within the vm
package I've defined a new
Value
type, which extends the one defined in gg
. The necessity of
this was that there are types which cannot be represented syntactically in a
.gg
file, but which can be used as values within a program being run.
The first of these is the Operation
, which is essentially a first-class
function. The VM will automatically interpret a graph as an Operation
when it
is used as an edge value, as has been discussed in previous posts, but there are
also built-in operations (like if
and recur
) which cannot be represented as
datastructures, and so it was necessary to introduce a new in-memory type to
properly represent operations.
The second is the Tuple
type. This may seem strange, as ginger graphs already
have a concept of a tuple. But the ginger graph tuple is a vertex type, not a
value type. The distinction is small, but important. Essentially the graph tuple
is a structural element which describes how to create a tuple value, but it is
not yet that value. So we need a new Value type to hold the tuple once it has
been created during runtime.
Another thing worth describing about the vm
package, even though I think they
might change drastically, are Thunk
s:
// Thunk is returned from the performance of an Operation. When called it will
// return the result of that Operation having been called with the particular
// arguments which were passed in.
type Thunk func() (Value, error)
The term "thunk" is borrowed from Haskell, which I don't actually know so I'm probably using it wrong, but anyway...
A thunk is essentially a value which has yet to be evaluated; the VM knows
exactly how to evaluate it, but it hasn't done so yet. The primary reason for
their existence within ginger is to account for conditionals, ie the if
operation. The VM can't evaluate each of an if
's arguments all at once, it
must only evaluate the first argument (to obtain a boolean), and then based on
that evaluate the second or third argument.
This is where graph.MapReduce
comes in. The VM uses graph.MapReduce
to
reduce each edge in a graph to a Thunk
, where the Thunk
's value is based on
the operation (the edge's value) and the inputs to the edge (which will
themselves be Thunk
s). Because each Thunk
represents a potential value, not
an actual one, the VM is able to completely parse the program to be executed
(using graph.MapReduce
) while allowing conditionals to still work correctly.
EvaluateEdge is where all that happens, if you're interested, but be warned that the code is a hot mess right now and it's probably not worth spending a ton of time understanding it as it will change a lot.
A final thing I'll mention is that the recur
operation is, I think, broken. Or
probably more accurately, the entire VM is broken in a way which prevents
recur
from working correctly. It does produce the correct output, so I
haven't prioritized debugging it, but for any large number of iterations it
takes a very long time to run.
Finally, to show it off! I put together a super stupid eval
binary which takes
two arguments: a graph to be used as an operation, and a value to be used as an
argument to that operation. It doesn't even read the code from a file, you have
to cat
it in.
The README documents how to run the demo, so if you'd like to do so then please clone the repo and give it a shot! It should look like this when you do:
# go run ./cmd/eval/main.go "$(cat examples/fib.gg)" 8
21
You can put any number you like instead of 8
, but as mentioned, recur
is
broken so it can take a while for larger numbers.
The following are all the things I'd like to address the next time I work on ginger:
gg
Allow for newlines (and )
and }
) to terminate statements, not just
;
.
Allow names to have punctuation characters in them (maybe?).
Don't read all tokens into memory prior to parsing.
vm
Fix recur
.
Implement tail call optimization.
General
A bit of polish on the eval
tool.
Expose graph creation, traversal, and transformation functions as builtins.
Create plan (if not actually implement it yet) for how code will be imported from one file to another. Namespacing in general will fall into this bucket.
Create plan (if not actually implement it yet) for how users can extend/replace the lexer/parser.
I don't know when I'll get to work on these next, ginger will come back up in my rotation of projects eventually. It could be a few months. In the meantime I hope you're as excited about this progress as I am, and if you have any feedback I'd love to hear it.
Thanks for reading!
Published 2021-12-31
This post is part of a series.
Previously: The Syntax of Ginger
Next: Ginger Names
This site can also be accessed via the gemini protocol: gemini://mediocregopher.com/