How to Program Good - Chapter 2: Data and Interfaces

Chapter 1 discussed the standard by which we will be judging Good Programs: reliability. Reliable programs, given the same inputs, do the same thing every time. In order for an overall program to be considered reliable, it must be created using pieces which are themselves reliable. It's certainly possible to create an unreliable program from reliable parts, but the reverse is completely impossible.

There are many different kinds of parts we could choose as our fundamental building blocks, but (if you couldn't guess from the chapter title) we will be choosing these: data and interfaces. Why these? Because they work well for our purpose, and they are always available no matter the language or circumstance.

Data - The Inert Material

In the context of a program we generally think in terms of "variables", which name some value in a certain scope, like `A = 6` or `F = open("some file.txt")`. We're going to separate these values into two categories: data and non-data. Our definition of data might be different from another that you've seen, but in the context of this series we will stick with this:

Data is any value which doesn't lose meaning by being moved.

The integer 6 is always 6. No matter if it's moved from stack to heap, memory to network socket, my machine to yours, today to tomorrow; it's always just 6. The same can be said for strings, floats, decimals, booleans, and possibly other types specific to certain languages. Arrays, structs, lists, maps, sets, and other data structures retain this property as well.

So what falls into non-data? File handles, closures, sockets, mutexes, process IDs... these are all values which can exist in a program at runtime, and some of them might even be able to be represented as something resembling data (e.g. process IDs), but their actual meaning does not survive being moved from one process to another, from one machine to another, or from one day to the next.

The fact that data can be moved freely is not exactly what makes it useful to us, but rather that free movement necessarily removes two key sources of unreliability: time and place. Non-data is only usable to the extent that it exists in the right place at the right time, and by being tied to these two moving targets an element of uncertainty seeps in. You will notice that usage of non-data is where the vast majority of error handling in programs takes place. Of course we can't avoid the usage of these tools, and we shouldn't be afraid of them, but it's important to recognize that we should prefer data when we can use it.
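To make the distinction concrete, here is a minimal sketch in Python (an arbitrary choice of language; the `move` helper and the sample values are made up for illustration). It simulates moving a value by encoding it and decoding it again, then tries the same idea on a mutex.

```python
import json
import os
import pickle
import threading


def move(value):
    """Simulate moving a value elsewhere by encoding and then decoding it."""
    return json.loads(json.dumps(value))


# Data: the value means exactly the same thing after the trip.
contact = {"name": "Ada", "tags": ["friend", "colleague"], "age": 36}
moved_contact = move(contact)

# Non-data which merely resembles data: the integer survives the trip, but it
# only identifies this process, on this machine, for as long as it is running.
moved_pid = move(os.getpid())

# Non-data: a mutex can't even be encoded, let alone moved.
try:
    pickle.dumps(threading.Lock())
    lock_is_movable = True
except TypeError:
    lock_is_movable = False
```

The process ID is the interesting case: the encoder is perfectly happy with it, so nothing in the program will warn you that the number has stopped meaning anything once it arrives somewhere else.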

Interfaces - Describing the Moving Parts

We want our programs to do things, but data is inert and can't do anything on its own. It's the rest of our programs, the functions and classes and methods, that define the behavior of our programs. But as a program grows we quickly find the need to organize this behavior in order to make sense of it, and this process of organization takes the form of drawing imaginary borders between different sections, each with its own responsibilities. One part of a program does one thing, another part does something else, and neither part knows or cares about the details of what the other is doing.

This concept is called encapsulation. The idea is to construct a black box of code, in this series called a "component", which accepts some inputs and returns some outputs accordingly, but whose innards are not known or even important. Each component can be thought of as a small program-within-the-program, and so is judged against the same criteria as our overall program: reliability. Given the same inputs, does the component output the same outputs every time? This correspondence of inputs to outputs brings us to our second building block: the interface.

An interface is a mechanism which both describes the behavior of a component and disconnects its implementation from that description.

The correspondence of inputs to outputs described by the interface is its "contract", and for a component to implement the interface it must follow that contract. The description of behavior nearly always takes the form of a set of functions or methods implemented together, along with documentation of each function/method and the interface as a whole. Generally a language will consider some type to implement an interface if that type implements every method in the interface. In dynamic languages there might not need to be an explicit definition of an interface in the code, so it will be up to the programmer to document and assert for themselves that a component implements an interface.

An example definition of an interface and its associated types, written in a made-up language, might look like this:

# Contact describes the pieces of information about a person which can be stored
# in an addressBook. Contacts are identified by their name field.
Contact struct {
    name        String
    address     String
    phoneNumber String
    email       String
}

# An addressBook can store Contacts and retrieve them later.
addressBook interface {

    # set stores the given Contact for later retrieval using the list and get
    # methods. If set has been previously called with a Contact sharing the same
    # name then that old Contact is replaced by this new one.
    set func(contact Contact) -> ()

    # list returns all Contacts which have been added to the addressBook using
    # set, sorted by their name.
    list func() -> List<Contact>

    # get returns the Contact which was most recently passed to set sharing the
    # given name, or returns None if no such Contact has ever been passed to
    # set.
    get func(name String) -> Maybe<Contact>
}

Note the documentation of the interface. While the name and signature of each method provides a pretty good idea of the overall behavior of the interface, the documentation specifies behavior which types can't capture, and is as much a binding part of the contract as they are. What does `set` do if the Contact has been passed to it before? In what order are the Contacts returned from `list`? If some caller of an interface method depends on that method behaving a certain way, that behavior MUST be incorporated into the method's documentation, therefore becoming part of the contract.

A basic component which implements this interface might look like this:

# inMemoryAddressBook implements the addressBook interface using an in-memory
# Map type. All Contacts stored in the inMemoryAddressBook will be lost once
# it goes out of scope.
inMemoryAddressBook Map<String, Contact>

(self *inMemoryAddressBook) implements addressBook {
    set func(contact Contact) -> () {
        self.store(contact.name, contact)
    }

    list func() -> List<Contact> {
        contacts = self.getAllValues()
        return contacts.sortedBy(func(c Contact) -> String { return c.name })
    }

    get func(name String) -> Maybe<Contact> {
        return self.getValue(name)
    }
}
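For readers who would like something they can actually run, here is one way the same interface and in-memory component might be sketched in Python. This is an approximation under a few assumptions: `typing.Protocol` stands in for the made-up language's `interface`, `Optional[Contact]` stands in for `Maybe<Contact>`, and the class names are adapted to Python conventions.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Contact:
    """Information about a person. Contacts are identified by their name."""
    name: str
    address: str
    phone_number: str
    email: str


class AddressBook(Protocol):
    """An AddressBook can store Contacts and retrieve them later."""

    def set(self, contact: Contact) -> None:
        """Stores contact, replacing any stored Contact sharing its name."""
        ...

    def list(self) -> List[Contact]:
        """Returns all Contacts added using set, sorted by their name."""
        ...

    def get(self, name: str) -> Optional[Contact]:
        """Returns the Contact most recently passed to set sharing the given
        name, or None if no such Contact has ever been passed to set."""
        ...


class InMemoryAddressBook:
    """Implements AddressBook using a dict. All Contacts are lost once the
    InMemoryAddressBook goes out of scope."""

    def __init__(self) -> None:
        self._contacts = {}  # maps Contact.name to Contact

    def set(self, contact: Contact) -> None:
        self._contacts[contact.name] = contact

    def list(self) -> List[Contact]:
        return sorted(self._contacts.values(), key=lambda c: c.name)

    def get(self, name: str) -> Optional[Contact]:
        return self._contacts.get(name)


# The annotation is the only place where the caller names the interface.
book: AddressBook = InMemoryAddressBook()
book.set(Contact("Grace", "2 Elm St", "555-0101", "grace@example.com"))
book.set(Contact("Ada", "1 Main St", "555-0100", "ada@example.com"))
book.set(Contact("Ada", "9 Oak Ave", "555-0100", "ada@example.com"))
```

Note that `InMemoryAddressBook` never mentions `AddressBook`. As described above for dynamic languages, it implements the interface simply by having the right methods and following the documented contract; a static checker such as mypy can verify the former, but only the programmer can verify the latter.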

An interface may have multiple implementations which might get used interchangeably depending on configuration or user input. The parts of the code which are relying on an interface only see that there is something which implements it, and that's all they care about. This is the most important property of interfaces, and is what makes them so useful. It allows for different sections of a codebase to expand, contract, and be replaced without affecting any other parts. As long as the same inputs continue to produce the same outputs the rest of the codebase can remain ignorant of what is happening inside.

Here we're going to add another implementation of the `addressBook` interface we defined previously, and then show how some code might use one implementation or the other depending on runtime circumstances:

# kvAddressBook implements the addressBook interface by using a kvDatabase to
# store and retrieve Contacts.
#
# In this example kvDatabase is itself some interface describing a persistent
# key-value database which has methods set, get, and getAll. For simplicity this
# example ignores error handling both for interactions with the database and
# JSON encoding/decoding.
kvAddressBook kvDatabase

(self kvAddressBook) implements addressBook {
    set func(contact Contact) -> () {
        self.set(contact.name, json.encodeToString(contact))
    }

    list func() -> List<Contact> {
        contacts = self.getAll().map(func(key String, val String) -> Contact {
            return json.decodeFromString<Contact>(val)
        })
        return contacts.sortedBy(func(c Contact) -> String { return c.name })
    }

    get func(name String) -> Maybe<Contact>
        if val = self.get(name); val.some() {
            return Some(json.decodeFromString<Contact>(val))
        }
        return None
    }
}

main func() -> () {

    # We will initialize thisAddressBook using one of the two possible
    # implementations, depending on an environment variable.
    thisAddressBook addressBook

    if os.getenv("IN_MEMORY") != "" {
        thisAddressBook = inMemoryAddressBook{}
    } else {
        kvDB = # instantiate kvDatabase
        thisAddressBook = kvAddressBook{kvDB}
    }

    # thisAddressBook can now be used as an addressBook, regardless of its
    # implementation.
}
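The same arrangement can be sketched in runnable Python. Everything here is illustrative: `FakeKVDatabase` is a hypothetical stand-in for a real persistent database, `new_address_book` takes the environment as a plain mapping rather than reading it itself, and, like the example above, error handling is ignored.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    address: str
    phone_number: str
    email: str


class InMemoryAddressBook:
    """Keeps Contacts in a dict; everything is lost along with the object."""

    def __init__(self):
        self._contacts = {}

    def set(self, contact):
        self._contacts[contact.name] = contact

    def list(self):
        return sorted(self._contacts.values(), key=lambda c: c.name)

    def get(self, name):
        return self._contacts.get(name)


class FakeKVDatabase:
    """Hypothetical stand-in for a persistent string-to-string database."""

    def __init__(self):
        self._rows = {}

    def set(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def get_all(self):
        return dict(self._rows)


class KVAddressBook:
    """Stores each Contact in the database as JSON, keyed by name."""

    def __init__(self, db):
        self._db = db

    def set(self, contact):
        self._db.set(contact.name, json.dumps(asdict(contact)))

    def list(self):
        rows = self._db.get_all().values()
        contacts = [Contact(**json.loads(raw)) for raw in rows]
        return sorted(contacts, key=lambda c: c.name)

    def get(self, name):
        raw = self._db.get(name)
        return None if raw is None else Contact(**json.loads(raw))


def new_address_book(environ):
    """Picks an implementation based on the IN_MEMORY variable in environ."""
    if environ.get("IN_MEMORY", "") != "":
        return InMemoryAddressBook()
    return KVAddressBook(FakeKVDatabase())


def names_after_update(book):
    """A caller which relies only on the addressBook contract."""
    book.set(Contact("Grace", "2 Elm St", "555-0101", "grace@example.com"))
    book.set(Contact("Ada", "1 Main St", "555-0100", "ada@example.com"))
    book.set(Contact("Ada", "9 Oak Ave", "555-0100", "ada@example.com"))
    return [(c.name, c.address) for c in book.list()]


in_memory_result = names_after_update(new_address_book({"IN_MEMORY": "1"}))
kv_result = names_after_update(new_address_book({}))
```

In a real program the call would be `new_address_book(os.environ)`. The point to notice is that `names_after_update` produces the same result with either implementation, and has no way of telling which one it was handed.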

Changing an Interface

Changing the contract of an interface is not something which is done lightly, but will certainly happen, especially in the draft phase of a program where the groundwork is still being laid.

When an interface is changed, every component which implements it must be carefully checked against the new contract. On the other side, every place which calls methods on the interface must also be checked to ensure it is still using the interface properly.

This checking is not optional, and if it seems tedious then it merely underlines how important simplicity is in interface design, especially as an interface becomes used more and more. On the other hand it's quite easy to identify all places which need to be checked: simply search for all places where the interface is used.

A static type checker can be helpful here, but keep in mind that behavioral changes to a contract might not result in changes to the method type signatures. And while data types are not interfaces, changing the definition of a type does constitute a change to the contract of every interface which uses that type.
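As a small illustration of a contract change which a type checker can't see, consider this Python sketch. The two functions are hypothetical versions of the same method; only the documented sort order differs.

```python
from typing import List, get_type_hints


def list_names_v1(names: List[str]) -> List[str]:
    """Returns names sorted alphabetically, with uppercase letters first."""
    return sorted(names)


def list_names_v2(names: List[str]) -> List[str]:
    """Returns names sorted alphabetically, ignoring case."""
    return sorted(names, key=str.lower)


# The type signatures are identical, so a type checker sees no change...
same_signature = get_type_hints(list_names_v1) == get_type_hints(list_names_v2)

# ...but a caller relying on the old ordering now gets different output.
names = ["bob", "Alice", "Carol"]
old_order = list_names_v1(names)
new_order = list_names_v2(names)
```

Here `old_order` puts "bob" last while `new_order` puts it in the middle, and the only record of that difference is the documentation.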

Sometimes changes to an interface can be made in a way which doesn't require going back and checking all places where the interface is used. These changes are called "backwards-compatible" changes. Perhaps previously a method accepted an integer argument which couldn't be negative, but it now accepts negative values too. As long as the behavior for non-negative integers remains the same then this is a backwards-compatible change; the previous correspondence of inputs to outputs has not been modified.
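A sketch of that integer example in Python might look like the following. The function names are made up, with `nth_v1` and `nth_v2` standing for the same method before and after the change.

```python
def nth_v1(items, index):
    """Returns the item at index. The index must not be negative."""
    if index < 0:
        raise ValueError("index must not be negative")
    return items[index]


def nth_v2(items, index):
    """Returns the item at index. A negative index counts from the end."""
    return items[index]


items = ["a", "b", "c"]

# Every input which was valid under the old contract maps to the same output.
unchanged = all(nth_v1(items, i) == nth_v2(items, i) for i in range(len(items)))

# Only inputs which the old contract forbade have gained new behavior.
last = nth_v2(items, -1)
```

No existing caller could have been correctly passing a negative index before, so no existing caller needs to be checked.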

Backwards compatibility is always preferred, but often requires undocumented assumptions about the interface. For example, adding a field to a struct is usually backwards-compatible, as long as the field's default value is consistent with the struct's meaning without the field. But what if somewhere in the codebase somebody used a data serializer which assumes the struct's fields don't change? Adding a field to the struct would invalidate all previously serialized forms of it! So do we consider struct field additions to be backwards-incompatible? Usually not, but if you're aware of the usage of such a serializer in your codebase you need to go and double check.
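Here is a hedged sketch of that situation in Python. The `nickname` field, the sample data, and both decoders are invented for illustration; `decode_row` plays the part of the serializer which assumes the struct's fields never change.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Contact:
    name: str
    address: str
    phone_number: str
    email: str
    nickname: str = ""  # newly added; the default means "no nickname"


# Both of these were serialized before the nickname field existed.
OLD_JSON = (
    '{"name": "Ada", "address": "1 Main St", '
    '"phone_number": "555-0100", "email": "ada@example.com"}'
)
OLD_ROW = "Ada|1 Main St|555-0100|ada@example.com"


def decode_json(raw):
    """Decodes by field name, so missing fields fall back to their defaults."""
    return Contact(**json.loads(raw))


def decode_row(raw):
    """Decodes by position, assuming the list of fields never changes."""
    values = raw.split("|")
    if len(values) != len(fields(Contact)):
        raise ValueError("row does not match the Contact definition")
    return Contact(*values)


from_json = decode_json(OLD_JSON)

try:
    decode_row(OLD_ROW)
    old_rows_still_readable = True
except ValueError:
    old_rows_still_readable = False
```

The field addition is harmless to the name-based decoder and fatal to the positional one, and nothing about the `Contact` definition itself tells you which kind is lurking in your codebase.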

Just Use Data

Designing interfaces which are generic enough to hide their implementation, specific enough to be useful, and simple enough to rarely need changing, is more of an art than a science. Later chapters will provide different ways to think about the overall structure of your program to help see how components work together to form a larger whole, and these different angles on the problem may make it easier to find solutions, but ultimately good interface design is just a matter of practice. This section describes a rule which should be followed when defining an interface, and by reducing the field of possible solutions this rule will also help with tricky design problems.

In almost all cases the inputs and outputs of an interface should be just data. When data crosses the interface boundary (either being given as an input, or being returned as an output) its meaning is not tied to the state of the caller or component, and so can be treated just as it is. When non-data is passed across the interface boundary then so is the implied state of the caller/component associated with that non-data. This implied state then necessarily must become part of the documented interface contract, leaving it more complex to understand and more difficult to correctly implement.
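The difference can be seen in a small Python sketch. Both functions are hypothetical; one accepts a file handle (non-data) and the other accepts the text itself (data).

```python
import io


def count_lines_from_handle(handle):
    """Counts the lines remaining in handle. The result depends on where the
    handle's read position currently is, and reading moves that position."""
    return sum(1 for _ in handle)


def count_lines(text):
    """Counts the lines in text."""
    return len(text.splitlines())


text = "Ada\nGrace\nLinus\n"
handle = io.StringIO(text)

# Same argument, different outputs: the handle's state crossed the boundary.
first_from_handle = count_lines_from_handle(handle)
second_from_handle = count_lines_from_handle(handle)

# Same argument, same output, every time.
first_from_text = count_lines(text)
second_from_text = count_lines(text)
```

To be complete, the contract of `count_lines_from_handle` would need to say where the handle must be positioned beforehand, where it is left afterwards, and who is responsible for closing it. The contract of `count_lines` needs none of that.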

At the point an interface becomes too complex to document or understand it becomes tempting to toss it away and use the component type directly. This is a mistake. We'll see in later chapters that the interface is a crucial tool in making components thoroughly testable, and by neglecting thorough testing we instead must rely solely on our own understanding of the component and manual testing to ensure it works correctly. But if we threw away the interface because the behavior was too complex to understand or document, can we rely on our own understanding of it when manually testing? No, we can't.

There are rare cases where non-data being passed across the boundary is worth the trade-off. A notable example for me is Go's `Context` type, which allows a caller to cancel a blocking call from outside. This tool simplifies a lot of situations in asynchronous programming which would otherwise require the use of message passing or callbacks. Other such examples may exist, and there may be specific cases where more complex contracts are required to account for performance considerations. But these cases should be met with skepticism, require significant justification, and of course be fully documented in the contract.
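To give a feel for that trade-off, here is a loose Python analogue. It is only a sketch: `threading.Event` stands in for the cancellation part of a `Context` and is not equivalent to it (a Go `Context` also carries deadlines and values), and `wait_for_item` is a made-up function.

```python
import threading


def wait_for_item(inbox, cancel, poll_interval=0.01):
    """Blocks until inbox contains an item, then removes and returns it.

    cancel is non-data, so the contract has to spell out its behavior: it
    may be set from any thread at any time, and once it has been observed
    as set this function returns None without touching inbox.
    """
    while not cancel.is_set():
        if inbox:
            return inbox.pop(0)
        cancel.wait(poll_interval)
    return None


# The caller cancels the blocking call from outside after 50 milliseconds.
cancel = threading.Event()
threading.Timer(0.05, cancel.set).start()
cancelled_result = wait_for_item([], cancel)

# With something already waiting, the call returns it immediately.
delivered_result = wait_for_item(["hello"], threading.Event())
```

Notice how much of the docstring is spent describing `cancel` rather than the actual job of the function. That is the cost side of the trade-off.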

Everything is Interfaces

So far we've considered the interface as being a mechanism provided by a programming language which allows us to encapsulate a set of behavior in a useful way. But there are other kinds of interfaces. If you consider every programming-related acronym ending in an "I", the "I" probably stands for "interface": API, CLI, GUI, ABI, TUI, CNI. In all cases the interface is the surface layer of a larger entity, like the remote control of a TV, or the shell of an egg. All the same rules we've been talking about for language-level interfaces apply to these as well: prefer data, prefer backwards-compatible changes, clearly document all contractual behavior.

Even interfaces which face a human, rather than a machine, like CLIs and GUIs, deserve this treatment. Remember our purpose, which is to create programs which behave so reliably that they become an extension of the user themselves. If the user's interface into the program is constantly changing, like the toy hammer which dents and warps over time, this extension can never happen. Sure, new functionality can be added, but if existing functionality must be changed then the user must explicitly opt in, and we can't blame them if they never do.

Next Chapter

Now that we have our basic tools in hand it's time to start discussing how to use them in more detail. The next chapter will discuss the composition of components into larger components and ultimately into large, complex programs which nevertheless behave reliably at all stages of their execution (coming soon!)
