Declarative Semantics

This post came out of the Blue Sky discussion. The disclaimer for that discussion applies here as well. I expect you to roll your eyes at how impractical it is.

Suppose I write "get http://www.kimbly.com/" in a program, exactly as shown. It won't compile in nearly any programming language. I have to write something more like this:

    url = new URL("http://www.kimbly.com/")
    url.get()

Having a reusable URL class is a lot better than having to work with sockets, but there's still a lot of noise there. From the user's perspective, I have to "quote" the url to turn it into a "string", and then I have to "pass" that string to the "constructor" of a "class". Then I "call" a "method" on the "object" that the constructor "returned". And so on. This is the price of modeling.

Application programmers shouldn't have to do modeling. Once a concept has been added to a library, the programmer should be able to talk about things directly, without having to use programming-language abstractions. This is Blue Sky.

Unfortunately I don't know of any good ways to express the semantics of "get http://www.kimbly.com/". I could make a macro that would let the literal code compile into something, but what should it evaluate as? An anonymous function? A string containing the html source? A structured object that lets me request the title and so on? Any decision would be premature, because URLs in themselves are not about executable code. They're not an algorithm, or a data structure. They're just a concept.

We don't have any well-known techniques for working with abstract concepts like this -- instead, the macro writer is forced to decide on a single executable "meaning". They decide whether "get <url>" should mean eagerly fetching the url or looking it up in a cache, whether it should mean parsing the result or just leaving it as a string, and whether the whole thing should be stored in memory or written to disk.

There is no way for me to write a declarative description of what a URL really is in a way that a programming language could understand. I want to be able to explain that it's not only a string of letters, but that it can be meaningfully broken into pieces (protocol, host, port, path) that may or may not be relevant. I want to be able to explain its relationship with the HTTP protocol. And I want to explain these things in a declarative way -- more like a knowledge base than a code library.
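For contrast, here is roughly what today's libraries offer instead. This is a minimal sketch using Python's standard urllib.parse module: the breakdown into pieces exists, but it lives inside library code rather than in a description that a language could reason about or that someone else could extend.

```python
from urllib.parse import urlsplit

# Split the example URL into the pieces named above.
parts = urlsplit("http://www.kimbly.com/")

print(parts.scheme)    # the protocol
print(parts.hostname)  # the host
print(parts.port)      # None when the URL does not name a port
print(parts.path)      # the path
```

Whether the port is "relevant" is something the caller has to work out from a None value; nothing in the library states that relationship declaratively.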

In fact, I want even more than that. I want other people to be able to add to, refine, or even override the statements that I made about URLs. You have to stop modeling at some point or else you'll never get anything done -- but maybe my model of URLs stopped too soon for some purposes. Perhaps I didn't care to mention 404s, or character set encoding. Someone else should be able to add declarative statements to my description, without requiring my cooperation, and without having to completely recreate my model. RDF is kind of like this, in that it lets people add new predicates and new statements, without having to coordinate with each other.
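To make the "additive" part concrete, here is a toy sketch of an RDF-like pile of statements in Python. It is not RDF itself, and every predicate name in it (has_part, may_fail_with, and so on) is made up for illustration. It only shows adding statements; overriding someone else's statement would need more machinery than this.

```python
from typing import Iterable, List, Set, Tuple

# A statement is a (subject, predicate, object) triple of plain strings.
Statement = Tuple[str, str, str]


def merge(*sources: Iterable[Statement]) -> Set[Statement]:
    """Combine statements written by independent authors."""
    combined: Set[Statement] = set()
    for source in sources:
        combined |= set(source)
    return combined


def objects(kb: Set[Statement], subject: str, predicate: str) -> List[str]:
    """Return, in sorted order, everything the knowledge base says here."""
    return sorted(o for s, p, o in kb if s == subject and p == predicate)


# My description of URLs, which stops where I stopped caring.
my_statements = {
    ("URL", "has_part", "protocol"),
    ("URL", "has_part", "host"),
    ("URL", "has_part", "port"),
    ("URL", "has_part", "path"),
}

# Someone else's additions, written without my cooperation.
someone_elses = {
    ("get", "may_fail_with", "404"),
    ("URL", "has_property", "character set encoding"),
}

knowledge = merge(my_statements, someone_elses)
print(objects(knowledge, "URL", "has_part"))
```

Neither author had to edit the other's set, and neither had to recreate the other's model.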

Currently, programming languages map concrete syntax into abstract programming-language concepts, which then get mapped into machine code. Application programmers never work with their problems directly -- instead, they work indirectly, by referring to and manipulating programming-language concepts like functions, objects, and types. At no point do they ever actually refer to the problem domain itself. I want to change this so that syntax instead maps to problem domain concepts, which then map to programming-domain concepts and so on to machine code. And I want to keep all the mappings separate.

I don't know exactly how to implement this idea (it is Blue Sky, after all), but I do have an intuitive feeling that it doesn't have to be as hard as we might think -- especially if you allow models to be incomplete, over-simplified, and even wrong sometimes.


Posted on August 29, 2003 03:00 PM

Comments

Good article. Interestingly enough, there is a language that will run something strikingly similar to your first example ("read http://www.kimbly.com", specifically): REBOL, which is focused on the internet "problem domain." Unfortunately, it's a closed language.

Posted by: Anode at August 29, 2003 06:56 PM

The map is not the territory, although it sounds like you want the map and the territory to be interchangeable. That is definitely Blue Sky stuff here (and to think I first came across TUNES *six* years ago and wrote them off as a bunch of hand waving lunatics---interesting to see them still waving those hands around). Personally, I would much rather have F=(G*m1*m2)/(r*r) than drop the computer out the window.

Now, to go on to your example of the URL and trying to work with it directly. You are right, a URL is a concept, but there is a declarative description (written in such a way that a programming language could understand) of a URL in RFC-2396 (BNF format if you're curious). But that's only half the battle. You go on to say that you want to declare the relationships of the parts of a URL to, say, HTTP (HTTP protocol is redundant in much the same way as PIN number) but that it is hard to do. Yes, I agree to that. What, exactly, does

get http://www.kimbly.com/

mean? What does it return? And where does it return it to? (as written, I would expect $_ to contain a reference to something, but I'm not real keen on Perl myself) And more importantly, the action "get" doesn't map cleanly to every form of URL:

get mailto:sean@conman.org

What does that mean? I certainly hope it doesn't mean you get my email (although, given the level of spam, perhaps that isn't such a bad idea after all). The actions allowed via HTTP are: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE and CONNECT, each with their own semantics (POST is different from PUT), but of those actions, only POST can be said to apply to mailto: and even then it's more of a pun than anything.

And how to map these upon ftp:? Or even file://localhost/etc/passwd?
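The mismatch can be written down as a small lookup table. This is only a sketch in Python: the table name and the function are hypothetical, and ftp: and file: are deliberately left out because the question of how to map them is still open.

```python
from urllib.parse import urlsplit

# Hypothetical table of which actions are declared meaningful per scheme.
ACTIONS_BY_SCHEME = {
    "http": {"OPTIONS", "GET", "HEAD", "POST", "PUT",
             "DELETE", "TRACE", "CONNECT"},
    "mailto": {"POST"},  # and even that is more of a pun than anything
}


def is_meaningful(action: str, url: str) -> bool:
    """Say whether an action has a declared meaning for the URL's scheme."""
    scheme = urlsplit(url).scheme
    return action.upper() in ACTIONS_BY_SCHEME.get(scheme, set())


print(is_meaningful("get", "http://www.kimbly.com/"))
print(is_meaningful("get", "mailto:sean@conman.org"))
```

A scheme with no entry in the table simply has no meaningful actions yet, which is a fair summary of where things stand for ftp: and file:.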

Perhaps a different example then.

I personally would love to make Flash animations. I've tried the Macromedia Flash animation program but found I couldn't figure out how to work it. It's not like I don't understand animation---as a kid I wanted to be a comic artist and a film maker so I'm not unfamiliar with the concepts (8mm film runs at 16 fps, 16mm and 35mm run at 24 fps. Professional animations run at 12 fps and typically use some form of cell-based animation where the characters are painted onto transparencies, although an animator like Ralph Bakshi (personally I can't stand his work) has also used rotoscoped techniques; I could go on about this ... ). I would expect a program to work with cells, frames and clips. And perhaps tracks for sound. A frame is made up of multiple cells. Clips are sequences of frames with an associated track for sound (so I guess at a minimum, a clip requires two frames, and a frame requires at least one cell).
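Those statements about cells, frames, clips and tracks can be written as data types. What follows is a minimal sketch in Python, not a design for a real animation program: the class and field names are assumptions, and making the sound track optional is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One painted transparency."""
    name: str


@dataclass(frozen=True)
class Track:
    """A sound track that can accompany a clip."""
    name: str


@dataclass
class Frame:
    """A frame is made up of cells, and needs at least one."""
    cells: List[Cell]

    def __post_init__(self) -> None:
        if len(self.cells) < 1:
            raise ValueError("a frame requires at least one cell")


@dataclass
class Clip:
    """A clip is a sequence of frames with an associated track."""
    frames: List[Frame]
    track: Optional[Track] = None  # assumption: a clip may be silent

    def __post_init__(self) -> None:
        if len(self.frames) < 2:
            raise ValueError("a clip requires at least two frames")


background = Cell("background")
hero = Cell("hero")
clip = Clip(
    frames=[Frame([background]), Frame([background, hero])],
    track=Track("music"),
)
print(len(clip.frames))
```

The two minimums stated above are the only rules enforced here; nothing in it says how to draw, play or export anything.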

Hmm ... lots of declarative stuff there, but I'm still no closer to an animation program than I was before.

So how is this Blue Sky stuff supposed to work again?

Posted by: Sean Conner at August 29, 2003 11:26 PM

There's no reason you can't write (less line noise):

    url = getUrl("http://www.kimbly.com")

If your language allows it then you could use this syntax:

    url = getUrl "http://www.kimbly.com"

... but that's just syntactic sugar.

(BTW, where do I find the text formatting rules for Moveable Type? I assume it's a bit wiki-ish.)

"Application programmers shouldn't have to do modeling."

Yes, they should. That's exactly what they should be doing. What I think you mean, though, is that application programmers shouldn't have to do modeling outside of their problem domain.

I was also a bit disturbed by this comment in the original discussion: "I think we should strive to make modeling unnecessary."

You can't make modeling unnecessary. Modeling is what writing software is all about; you can't avoid it. Modeling is the act of identifying what parts of your problem domain are relevant (because not all of them are) and capturing those in software. All we can do to improve this is to minimise the amount of unnecessary, incidental, modelling we often have to do.

"Application programmers never work with their problems directly... At no point do they ever actually refer to the problem domain itself."

In programming languages, types capture semantic information from the problem domain. Types also define what operations/behaviours are allowed. Compilers can use type information to tell you when you are doing something illegal/nonsensical. When you are using types, you are referring to the problem domain (or rather, the parts of it you want to model).

It's all very well to say "I want to express everything declaratively" but eventually your code has to be executed, on some machine, and so somehow, somewhere, actions must be performed. You must have some kind of underlying execution model that the programmer knows about. In the URL example, at some point software must perform the actions of the HTTP protocol to fetch the resource.

"... syntax instead maps to problem domain concepts, which then map to programming-domain concepts ..."

From the sounds of this, all you need are domain-specific languages, and easy ways to create them. Which we already have.

Posted by: Alistair Bayley at September 1, 2003 07:05 AM

Sean, you just described a semantic model for your hypothetical animation program. That's great. Now you need two more things: a mapping from syntax to semantics, and a mapping from semantics to implementation. (The mapping from implementation to machine code is currently handled pretty well by compilers.) Programmers and Computer Scientists currently don't really have a concept of mapping syntax to semantics to implementation. Instead we go straight from syntax to implementation.

I think that by explaining the semantic model of your animation program, you are closer to having a runnable program than you were before. Now the compiler has some idea of "what you're talking about". You're doing the same kind of thing when you sketch out classes on CRC cards, or when you assign types to various pieces of your program (as Alistair points out). Just because you haven't started coding doesn't mean that this design phase wasn't necessary.

Your closing question simply emphasizes that semantic models don't automatically imply an implementation. And I think that this is exactly why it's important to focus on semantic models as distinct from programming language implementation -- otherwise we're skipping a step. You may think that step is worthless, but I disagree. I think that coding without a semantic model is what "real men" do -- like writing programs directly in machine code.

Posted by: kim at September 2, 2003 11:05 PM

Alistair, thanks for clarifying what I said about application programmers not doing modeling outside of their application domain. That's what I meant. I was using "modeling" to mean "fitting the application to the structures provided by the implementation language". Also, I agree with your point about how types capture semantic information. Currently they're just about the only technique we have for doing so.

However, I don't think that DSLs are the answer to my prayers. A given DSL will have a particular syntax, and a particular semantic model, and a fixed set of implementation techniques. I want to separate all of these things, so that they can be chosen separately. But most importantly, I want the semantic model to be additive (i.e. composable with other semantic models). Every DSL I know of is based on a parser that creates a particular data structure representing the semantics of that DSL. This data structure is usually not extensible, and is frequently not even accessible. It is also usually highly tuned to implementation concerns (e.g. bytecode). What I want is to make this data structure be the actual output of the DSL, rather than merely an intermediate representation that is used as input for subsequent execution.
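Here is a rough sketch of what I mean, in Python. Everything in it is hypothetical (the Get class, the parse function, the two implementations), and the fetch function is passed in so that the sketch does not touch the network. The point is only that parsing produces a semantic object and stops; what to do with that object is chosen separately.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Get:
    """Semantic statement: the resource at this URL is wanted."""
    url: str


def parse(source: str) -> Get:
    """Map the syntax 'get <url>' to a semantic object, without running it."""
    verb, _, rest = source.strip().partition(" ")
    if verb != "get" or not rest.strip():
        raise ValueError(f"cannot parse: {source!r}")
    return Get(url=rest.strip())


def describe(node: Get) -> str:
    """One implementation: explain what would happen, fetch nothing."""
    return f"would fetch {node.url}"


def make_cached(fetch: Callable[[str], str]) -> Callable[[Get], str]:
    """Another implementation: fetch on first use, then answer from memory."""
    cache: Dict[str, str] = {}

    def run(node: Get) -> str:
        if node.url not in cache:
            cache[node.url] = fetch(node.url)
        return cache[node.url]

    return run


node = parse("get http://www.kimbly.com/")
print(node)
print(describe(node))
```

The Get object is the output of the DSL. Eager versus cached, string versus parsed page, memory versus disk: those are decisions for whoever picks an implementation, not for whoever wrote the parser.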

In a way, you can think of this as merely a different way of looking at macros and DSLs. And as with everything in programming, it's been done before. The goal is to make a language that encourages this style, and makes it useful.

Tangent: if we assume that types are the closest thing we currently have to a semantic model, then this leads to the question: are there DSLs for types?

And by the way, the formatting rules for Movable Type are straight HTML.

Posted by: kim at September 2, 2003 11:32 PM

"I want the semantic model to be additive (i.e. composable with other semantic models)."

What does this mean? And what do you mean that a given DSL will have a particular semantic model? I would have thought that the semantic model for a DSL would be determined by the domain being modeled i.e. it would be the semantic model from the domain (unless my idea - admittedly vague - of what "semantic model" means is wrong). Which would mean that you couldn't choose it, which would mean that there's no value in separating it (if that's possible).

"are there DSLs for types?"

Yes. They are the syntax/language you use to specify type information to compilers. In Haskell it's separated from the code syntax, whereas in Java it's integrated with the code syntax (look at the type signature for a function - it's specified in the function itself). Although... you could say that Java interface syntax is a DSL for types, because that's what Java's interfaces are - types.

Posted by: Alistair Bayley at September 3, 2003 05:03 AM

When I saw your "get http://www.kimbly.com/" I immediately thought of some perl stuff (in case you haven't guessed, I'm mostly a perl guy). For example, look at something like:

get http://www.kimbly.com/

Now, the program has to figure out what you mean, just like a person who read the program would. In perl, if you say 9 + "2.3" perl figures out from the context that you want to treat the string as a number, and it does its best to interpret that string as a number. So, your program might look at 'get' and decide it is a prefix method or operator, then it could look at http://www.kimbly.com/ and decide that it is a url. Now, "decide this string is a url" would probably be a pretty expensive process relative to what most languages do today, but this is basically what you are asking for. Now, since the program has decided that this string is a url 'object' and there is the context of the unsatisfied operation 'get', the program executes the get method of the url. Finally, there is no LHS type context, so the program may throw the result away, but a better response might be to store the result in the default context, which is called $_ in perl. Later, when you write

get http://www.kimbly.com/
print

The program will understand that you want to call the print method of the object that resulted from the get method of the url (could be an html page, or something else).
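The steps above can be sketched as a very small interpreter. This is written in Python rather than perl, and all of it is hypothetical: the Interpreter class, the looks_like_url check, and the fetch function that the caller supplies in place of a real network request.

```python
from typing import Callable, List, Optional
from urllib.parse import urlsplit


def looks_like_url(token: str) -> bool:
    """Crude guess: a token with both a scheme and a host is a url."""
    parts = urlsplit(token)
    return bool(parts.scheme and parts.netloc)


class Interpreter:
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch
        self.default: Optional[str] = None  # plays the role of perl's $_
        self.output: List[str] = []

    def run_line(self, line: str) -> None:
        words = line.split()
        if not words:
            return
        verb, args = words[0], words[1:]
        if verb == "get" and len(args) == 1 and looks_like_url(args[0]):
            # No LHS context, so keep the result in the default context.
            self.default = self.fetch(args[0])
        elif verb == "print" and not args:
            # No argument given, so print whatever the default context holds.
            self.output.append(str(self.default))
        else:
            raise ValueError(f"don't know what you mean by: {line!r}")


interp = Interpreter(fetch=lambda url: f"<html>page at {url}</html>")
interp.run_line("get http://www.kimbly.com/")
interp.run_line("print")
print(interp.output)
```

Even this toy has to guess what counts as a url and has to fail somehow when the guess is wrong.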

The more you generalize these sorts of operations, the more expensive it is for the machine, and the harder it is to figure out when it doesn't do what you want, but this is the cost of this DoWhatIMean approach.

-Kris

Posted by: Kris Bosland at September 9, 2003 11:01 AM