Here's a very interesting paper: Extracting Queries by Static Analysis of Transparent Persistence, by Ben Wiedermann and William Cook. The idea is that instead of constructing a SQL query and then iterating over the results, you write your program as if every record in the database were already available, and the language automatically works out an efficient SQL query to run. If-statements become WHERE clauses, and traversing a reference from one object to another becomes a relational join. Unfortunately, the paper doesn't cover the more interesting cases of aggregation or grouping.
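To make the idea concrete, here is a toy in Python. It is not the paper's analysis, which handles far more; it only illustrates the two translations described above. It uses the standard `ast` module to read a loop written against a hypothetical `db` object and emits one SQL statement: an `if` on the loop variable becomes a WHERE clause, and a one-level reference traversal such as `e.dept.name` becomes a JOIN. The `report` function, the `employees` and `dept` tables, the `dept_id`/`id` foreign-key convention, and the restriction to a single loop with at most one `if` are all assumptions made for this sketch.

```python
import ast

# Comparison operators this toy knows how to translate into SQL.
OPS = {ast.Gt: ">", ast.Lt: "<", ast.GtE: ">=", ast.LtE: "<=",
       ast.Eq: "=", ast.NotEq: "<>"}


def attr_chain(node, var):
    """Return ['dept', 'name'] for `e.dept.name` rooted at `var`, else None."""
    chain = []
    while isinstance(node, ast.Attribute):
        chain.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name) and node.id == var:
        return list(reversed(chain))
    return None


def extract_query(source):
    """Turn `for v in db.<table>: [if <cond>:] <body>` into one SQL string."""
    loop = ast.parse(source).body[0].body[0]  # first statement of the function
    if not isinstance(loop, ast.For):
        raise ValueError("expected a for-loop over db.<table>")
    var, table = loop.target.id, loop.iter.attr
    joins, where, selected = {}, [], []

    def column(chain):
        # e.salary -> e.salary; e.dept.name -> dept.name plus a JOIN.
        # Hypothetical convention: reference `dept` is stored as `dept_id`.
        if len(chain) == 1:
            return f"{var}.{chain[0]}"
        if len(chain) != 2:
            raise ValueError("only one level of reference traversal supported")
        ref, field = chain
        joins[ref] = f"JOIN {ref} ON {var}.{ref}_id = {ref}.id"
        return f"{ref}.{field}"

    body = loop.body
    if len(body) == 1 and isinstance(body[0], ast.If):
        test = body[0].test
        conjuncts = (test.values if isinstance(test, ast.BoolOp)
                     and isinstance(test.op, ast.And) else [test])
        for c in conjuncts:
            # Each conjunct must look like `var.path <op> constant`.
            if not (isinstance(c, ast.Compare) and len(c.ops) == 1
                    and isinstance(c.comparators[0], ast.Constant)
                    and attr_chain(c.left, var)):
                raise ValueError("unsupported condition")
            col = column(attr_chain(c.left, var))
            where.append(f"{col} {OPS[type(c.ops[0])]} {c.comparators[0].value!r}")
        body = body[0].body  # the if-statement is now a WHERE clause

    class Collect(ast.NodeVisitor):
        # Each maximal attribute chain rooted at the loop variable becomes a
        # selected column; we stop descending once a chain is found.
        def visit_Attribute(self, node):
            chain = attr_chain(node, var)
            if chain is None:
                self.generic_visit(node)
            else:
                col = column(chain)
                if col not in selected:
                    selected.append(col)

    for stmt in body:
        Collect().visit(stmt)

    sql = f"SELECT {', '.join(selected)} FROM {table} {var}"
    sql += "".join(f" {j}" for j in joins.values())
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


PROGRAM = '''
def report(db):
    for e in db.employees:
        if e.salary > 50000 and e.dept.name == "Sales":
            print(e.name, e.dept.name)
'''

print(extract_query(PROGRAM))
```

Running it prints `SELECT e.name, dept.name FROM employees e JOIN dept ON e.dept_id = dept.id WHERE e.salary > 50000 AND dept.name = 'Sales'`. A real analysis would also have to prove that the rewrite preserves the program's behavior (for example, that the loop has no side effects on the records it skips); this toy makes no such check.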
The paper focuses on relational databases, but I think it would be really cool to apply the same approach to the custom data structures used in scalable distributed systems. A common pattern in distributed systems is fan-out/fan-in: a coordinating server receives the original query, sends subqueries to many secondary servers, and then aggregates their results. Google's MapReduce is a variation on this pattern, and Endeca used a similar technique as well. It would be great if you could program the coordinating server as if the secondary servers' entire data store were directly available; that would eliminate much of the tedious work of writing query languages and communication protocols.
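For reference, here is roughly what the hand-written fan-out/fan-in code looks like today, as a minimal Python sketch. The shard contents, the `subquery` and `average_salary` names, and the use of threads in place of network calls to secondary servers are all assumptions for illustration. The coordinator sends the same subquery to every shard in parallel and then merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each secondary server holds a slice of the records.
SHARDS = [
    [{"dept": "Sales", "salary": 60000}, {"dept": "Eng", "salary": 90000}],
    [{"dept": "Sales", "salary": 40000}],
    [{"dept": "Eng", "salary": 70000}, {"dept": "Sales", "salary": 80000}],
]


def subquery(shard, dept):
    """Runs on a secondary server: return a partial aggregate (count, total)."""
    salaries = [r["salary"] for r in shard if r["dept"] == dept]
    return len(salaries), sum(salaries)


def average_salary(dept):
    """Runs on the coordinator: fan out to all shards, then fan in."""
    # Threads stand in for RPCs to the secondary servers.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: subquery(s, dept), SHARDS))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else None


print(average_salary("Sales"))
```

This prints `60000.0`. Note that each shard returns a count and a sum rather than its own average, because averaging per-shard averages gives the wrong answer when shards hold different numbers of matching records. Getting decompositions like this right by hand is exactly the kind of work an automatic query extractor would ideally take over.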
Similar considerations apply to client/server programming, although there you would also have to consider the security implications (often you want to retain tight control over the server API to protect against abuse by rogue clients).
Posted on December 19, 2006 12:14 PM