My goal is to have the compiler be able to generate custom-made data structures for any particular program. Basically, the programmer tells the compiler what kind of behavior is required, and the compiler designs a program that satisfies that specification. But of course that description is so vague that it might as well describe any programming language higher-level than assembler. So here it is in more detail.
The example that I keep returning to is that of a list. There are several different possible implementations of the list concept -- arrays, linked lists, block lists, text files, database tables, etc. All of these implementations have different performance characteristics -- some allow random-access, some allow constant-time insertion, some are persistent, some can be resized more quickly, etc. However, all the implementations also conform to the same specification -- they store a bunch of data in order.
In object-oriented languages, all of the List implementations would implement a common List interface. You can think of an OO interface as being a kind of weak specification. My goal is to have the programmer only write the specification, and leave all the implementation work up to the compiler. Now obviously you can't just tell the compiler the names, arguments and return types of all the methods that the class should implement. It also needs to know the important stuff; like if you add an element to a list, then you should be able to get it back later by passing a particular integer index to the get() method. However, that kind of constraint isn't even mentioned in most object-oriented languages (although Eiffel gets close). So clearly I need to have more detailed specifications than just an interface definition.
Therefore I present to you my specification of a list. I'll explain the notation I use after the example.
class List
__init__():
size() = 0
size() -> result:
illegal if result < 0
add(elem) -> index:
index = size()
size() = size() + 1
get(index) = elem
get(index) -> elem:
illegal if index < 0
illegal if index >= size()
remove(index):
illegal if index < 0
illegal if index >= size()
for i in index .. size():
get(i) = get(i+1)
size() = size() - 1
Let's start with the __init__() method. This is the constructor. All it does is initialize size() to zero. But it looks weird, doesn't it? It looks like it's calling size(), and then assigning to the return value. But actually I'm just using the = sign in an unusual way. You should think of the = operator as meaning "will return", rather than assignment. So the create() method is actually saying that after it returns, any future call to size() will return zero.
Next let's look at the size() method. It returns a value which we are calling "result". You'll notice that no type is given -- we don't say whether it returns an integer or a string or what. I'll justify that later, but for now just go with your intuition that it probably returns an integer. The next thing you'll notice about the size() method is that even though it says that it returns a value, it doesn't say what that value actually is. That's because we're relying on other calls to say what the return value should be. For example, we've already seen that when the object is first created, the create() method sets the return value of size() to zero. The add() and remove() methods also modify the return value of size().
Now for the add() method. First, add() states that its return value should be whatever the value of size() was at the time add() was called. So if you call add() right after create(), then it will return zero. It also says that after calling add(), the return value of size() should be incremented by one. And lastly, it says that the element that was added should be retrievable later, by calling get() with the correct index.
The get() method is similar to size() in that it relies on other methods to set up its return values, so there's nothing for it to do. In general, all read-only functions will have nothing to do.
Now check out remove(). It slides all the elements after the given index down by one. In essence, it's covering up the element at the given index, thereby making it unretrievable (it's the compiler's responsibility to make sure that unretrievable data doesn't take up memory, therefore the element at the given index must be actually removed from the list).
That's it! That's the entire specification for lists. A more practical specification would also have definitions for contains(), reverse(), indexOf(), etc. But this is an example, so I've kept it short. The point is that this single specification suffices to define the behavior of all possible list implementations.
What I haven't shown is how client code might use a list. Clients can specify additional constraints on their particular list implementation. For example, one client might state that its lists should use the least possible amount of memory. Another might specify that the add() method should run in constant time. And yet another client might request that its list be persistent across invocations of the program, and that each user should have her own copy. It is based on these additional requirements that the compiler decides which implementation to use for a particular client -- whether it is based on arrays, linked lists, databases, etc. However I want to be clear here: the compiler is not choosing between a fixed number of implementations that it has stored in a library somewhere. Instead, it actually generates a custom implementation for each client, to meet its particular requirements.
You can email me: kimbly - at - kimbly - dot - com.
Last updated May 21, 2003
Back to Kimberley's Code.
Back to Kimberley's Home Page.