Cyclone

Yesterday I went to a talk at Harvard, titled "Implementation and Evaluation of a Safe Runtime in Cyclone", by Matthew Fluet.

In this talk we outline the implementation of a simple Scheme interpreter and a copying garbage collector that manages the memory allocated by the interpreter. The entire system including the garbage collector is implemented in Cyclone, a safe dialect of C, which supports safe and explicit memory management...

Greg Morrisett, one of the designers of Cyclone, was also present and answered some of the more advanced questions about Cyclone itself.

Cyclone is a "safe" version of C, and it uses static analysis to catch memory leaks. The talk focused on how to write a garbage collector such that the implementation would be accepted by Cyclone's (strict) memory leak detection requirements. First I'll explain how memory leak detection is done in Cyclone, and then I'll explain why it's not trivial to implement a garbage collector using the scheme.

Cyclone's memory leak detection is based on regions. There are several different types of region, but the relevant kind for this talk was the "dynamic" kind. The main characteristics of dynamic regions are that you can allocate new objects into them incrementally, but they have to be freed all at once in one big batch. Perfect for a stop-and-copy garbage collector.

Cyclone restricts the operations you can perform on regions so it can guarantee that you don't leak regions, and that you don't dereference data within a region after it has been freed. This static analysis is based on the idea of capabilities (unforgeable, uncopyable unique handles). Briefly, all the data in a Cyclone program is statically tagged with the region it belongs to (the tag is part of the type, so "an integer in region 1" is different than "an integer in region 2"). You can't actually *use* the data in a region unless you present the requisite capability to the compiler (each region has its own capability). This is done syntactically, kind of like this:

present (region_capability) {
    ... data in the region can be accessed here ...
}
... but not here...

Note that I used my own syntax here. The actual syntax used by Cyclone seems less intuitive to me.

The Cyclone compiler makes it impossible to leak regions by making it illegal to let a capability go out of scope. It also prevents you from referencing previously-freed memory by revoking the capability for the region as soon as you free it. It's actually a pretty simple and straightforward scheme, so if it seems confusing it's probably because I've explained it poorly.

The garbage collector that Matthew Fluet implemented uses a simple stop-and-copy scheme. In other words it takes an old heap, traces through it to find memory still in use, copies that the memory to the new heap, and leaves a "forwarding pointer" in the old heap so that the garbage collector can handle cycles correctly.

It's the forwarding pointers that make things difficult. Because all datatypes have an embedded region name which specifies what region they belong to, it makes it difficult to come up with a type name for the forwarding pointers. Matthew had a very amusing slide that illustrated this. You end up with a type like this:

A pointer to a block in the region after this one, which has a pointer to a block in the region after the one after this one, which has a pointer to a block in the region after the one after the one after this one, which has a pointer to a block in the region after the one after the one after the one after this one, which has ...

To get around this, they introduced a new primitive into the Cyclone runtime, which lets you express this kind of recursion. There was some explanation about how they "used an existential to prime the pump", and some other feature which made some of the verbosity in the type signature become implicit... but I didn't really follow that part.

However what I did glean from the discussion was that Cyclone's straightforward memory leak detection algorithm doesn't lead to a small, well-defined interface. Case in point: in order to implement a garbage collector they had to extend the runtime. It was a small function (only 5 lines of code) but nevertheless it was not possible at the user level. In effect, they had to introduce a new proof rule into the type system. That's disturbing to me -- it means the type system is incomplete. And these are smart people, so if it wasn't complete to start with, that probably means it'll never be complete.

What's even more disturbing is that at the end of the talk, Norman Ramsey asked how they would implement a generational garbage collector. And the answer was, "We haven't figured out how to do that yet... it may require dependent types." Eeek! Suddenly I realized at a deep level that advanced type systems are equivalent to mathematical proofs, and that if you start trying to prove too much about such an unpredictable beast as a program, it will resist you with a vengence.

Just imagine trying to explain to a systems programmer that the reason they can't implement a "simple" generational GC scheme is because that would require that the type checker be extended to support dependent types. They'll look at you like you have three heads, and then they'll toss you and your silly little language into /dev/null.

Posted on February 19, 2004 06:42 PM
More languages articles

Comments

Suddenly I realized at a deep level that advanced type systems are equivalent to mathematical proofs...

This is a deep observation: it's the Curry-Howard isomorphism, which shows the relationship between types and programs on the one hand and logical propositions and their proofs on the other.

Just imagine trying to explain to a systems programmer that the reason they can't implement a "simple" generational GC scheme is because that would require that the type checker be extended to support dependent types.

That's a little unfair: just because something requires dependent types doesn't mean it's infeasible. There are a number of restricted forms of dependent types that are still decidable, e.g., Dependent ML. On the other end of the spectrum there are languages such as Cayenne that abandon the requirement of decidability in the type checker and allow full dependent types. And then there's things like Haskell's type classes, which allow you to simulate some aspects of dependent types.

However, I don't disagree with you: statically predicting the behavior of a program is just plain hard. (Rice's theorem is one such slap in the face.) Striking a balance between a type system generous enough to allow most "good" programs and powerful enough to disallow most "bad" programs seems to be really, really difficult.

...if you start trying to prove too much about such an unpredictable beast as a program it will resist you with a vengeance.

Well said!

Posted by: Dave Herman at February 19, 2004 10:23 PM

Thanks for the summary Kim. As Dave notes in his comment, the Curry-Howard isomorphism shows a relationship between programs/type systems and logical propositions/proofs. The upshot of that isomorphism is that Godel's incompleteness theorem applies to type systems. Godel's theorem says that a proof system cannot be both complete and correct. Because of the isomorphism we then know that a programming language's type system cannot be both complete and correct. That is, every type system will either prevent you from doing some of the things you ought to be able to do; or it will allow you to do things you shouldn't.

Mostly, designers of statically type-checked languages aim for the first approach, guaranteeing that all programs that pass the type checker are typesafe. This can usually be done in such a way that users don't feel too enslaved. Despite good efforts, however, many languages (and probably even more compilers) have type checkers that give their seal of approval to unsafe code. I know, for example, that it is possible to do type-unsafe things in Java in limited cirucmstances involving subtypes and arrays. This is by design of the Java Virtual Machine. But while these kinds of holes sound dangerous, they are mostly of theoretical significance since the circumstances where they arise are, well, super mega weird.

As for you last paragraph, "Just imagine trying to explain to a systems programmer...", this is why weak type systems are useful and popular. Cyclone sounds like it's trying to add a strong type system feature (safe memory allocation) into a weak type system (C), and there's bound to be some impedance mismatch there.

By the way, I learned of your blog from John Sequeira, who I met at a going-away party for Gregor. I recently started my first software engineering job out of college and am enjoying your writing.

Posted by: Shimon Rura at February 20, 2004 10:40 AM
Post a comment









Remember info?




Prove you're human. Type "human":