<?xml version="1.0" encoding="iso-8859-1"?>
<!-- generator="Movable Type/2.64" -->
<rss version="0.91">
  <channel>
    <title>Daily Abstraction</title>
    <link>http://www.kimbly.com/blog/</link>
    <description>Thoughts on programming languages, dissident politics, and the rest of my life.</description>
    <language>en-us</language>
    <webMaster>kimbly@kimbly.com</webMaster>
    <pubDate>Tue, 18 Mar 2008 20:09:49 -0500</pubDate>
    <item>
      <title>Lockhart&apos;s Lament</title>
      <link>http://www.kimbly.com/blog/000513.html</link>
      <description><p><a href="http://www.maa.org/devlin/LockhartsLament.pdf" >Lockhart's Lament</a>.  An essay on why the pre-college math curriculum sucks. Favorite quote: "Be honest: did you actually even read it?  Of course not.  Who would want to?"  [Said of the formal proof style introduced in Geometry].</p>

<p>I remember once arguing with a teacher over whether I should be required to prove that 2 is an even number.  It seemed so blatantly obvious to me that I couldn't even imagine how to prove it.  Turns out my teacher wanted me to write something like "2/2 == 1, and 1 is a whole number, so 2 is even".  My jaw went slack.</p></description>
    </item>
    <item>
      <title>The amazing color changing card trick</title>
      <link>http://www.kimbly.com/blog/000512.html</link>
      <description><p>This really is amazing.  Be sure to watch all the way through.</p>

<p>
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/voAntzB7EwE"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/voAntzB7EwE" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>
</p>

<p>(via <a href="http://www.thevalve.org/go/valve/article/a_lesson_in_perception_and_attention/">The Valve</a>)</p></description>
    </item>
    <item>
      <title>StreamBase hiring PLT experts</title>
      <link>http://www.kimbly.com/blog/000510.html</link>
      <description><p>StreamBase is looking to <a href="http://www.streambase.com/careers/PrincipalSoftwareEngineer.php" >hire a programming language expert</a>, partly to fill the gap I'll be leaving behind.  The opportunity to work on a programming language full-time doesn't come along all that often, so if you're into languages you may want to check it out.</p>

<p>There are several interesting parts to the job:</p><p>First, StreamBase actually has two "languages" -- one is based on SQL, and one is based on graphical boxes-and-arrows.  They're equally expressive, but the differences extend to more than just surface syntax.  Stream processing is a relatively new paradigm, so we're looking for people with experience in lots of paradigms, such as functional, constraint, and logic programming.</p>

<p>Second, the company is interested in standardization, so you just might end up with your name on the industry standard for streaming event processing.  Yes that's right, you could be the next Guy Steele ;)</p>

<p>Third, we're adding significant new features every few months, so there's a lot left to do.  I'd love to provide a list just to get you excited, but I suspect the company would prefer I not tip off competitors.  There's a short list in the job description if you're interested, but unfortunately it doesn't include any of what I think are the really interesting ideas.</p>

<p>Fourth, you'd be working in Kendall Square, just a few minutes from CSAIL.  That means the opportunity to mosey on over to MIT for interesting lectures.  It also means not having to drive a car to work.</p>

<p>Anyway, if you're reading my blog because you like the PL thoughts that I post (all too rarely these days), then I bet you'd find the job interesting.  If you're a true PLT geek, then you'd be a fool if you didn't at least find out more about it.  If you think you'd like to apply, I encourage you to contact me directly to make sure your resume gets noticed.  Send email to &lt;kim.burchett at gmail.com&gt;.</p></description>
    </item>
    <item>
      <title>Zero variance</title>
      <link>http://www.kimbly.com/blog/000509.html</link>
      <description>I noticed today that by using complex numbers you can create a set of values whose variance is zero, even though not all the numbers are equal.

<p>Let X be a set of real numbers.  Let Y be a set of complex numbers evenly spaced along the circle whose center is at mean(X) and whose radius is stdev(X).  Y doesn't have to have the same number of points as X.  In fact, you can even take the limit and let Y be the set of all points on the circle.  Here's a picture of what I'm talking about, where X is the set {0..10}:

<center><img border=0 width=566 height=406 src="/blog/pics/zero_variance.PNG"></center>

<p>The mean of Y will be equal to the mean of X, but the variance and standard deviation of Y will be 0.  Furthermore, the variance of the real components of Y (ignoring the imaginary components) is equal to half the variance of X.
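
<p>A short derivation of these three claims, as a sketch.  It assumes Y has n points with n at least 3, writes m for mean(X) and s for stdev(X) (symbols introduced here only for brevity), and uses the plain square of each deviation, the same way the Scheme <code>variance</code> in this post does, rather than the squared modulus.  With the squared modulus, the variance of Y would come out as s squared instead of 0.</p>

<pre>
y_k = m + s\,\omega^k, \quad \omega = e^{2\pi i/n}, \quad k = 0, \dots, n-1

\mathrm{mean}(Y) = m + \frac{s}{n} \sum_{k=0}^{n-1} \omega^k
                 = m + \frac{s}{n} \cdot \frac{\omega^n - 1}{\omega - 1}
                 = m

\mathrm{var}(Y) = \frac{1}{n} \sum_{k=0}^{n-1} (y_k - m)^2
                = \frac{s^2}{n} \sum_{k=0}^{n-1} \omega^{2k}
                = \frac{s^2}{n} \cdot \frac{\omega^{2n} - 1}{\omega^2 - 1}
                = 0

\mathrm{var}(\mathrm{Re}\,Y) = \frac{s^2}{n} \sum_{k=0}^{n-1} \cos^2 \frac{2\pi k}{n}
                             = \frac{s^2}{n} \sum_{k=0}^{n-1} \frac{1 + \cos(4\pi k/n)}{2}
                             = \frac{s^2}{2}
</pre>

<p>The middle step needs the square of omega to differ from 1, which is why n must be at least 3.  For n = 2 the two points are m + s and m - s, and the variance is s squared rather than 0.</p>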

<p>It would be even more interesting if you could define Y so that its real components were exactly the same as the real components of X.  However, in the case where X consists of two points, this requires the imaginary components of Y to be +/- infinity, so I'm suspicious.

<p>I'm not sure whether this kind of thing is actually useful or not, but it does at least provide an interesting interpretation of the meaning of variance.  The fact that the variance of the real part of Y is proportional to the variance of X also makes me wonder whether there's a connection here to the idea of marginal distributions.

<p>Here's a bit of scheme code that shows what I'm talking about.

<pre>
(define pi 3.14159265358979323846)

(define (curry f x) 
  (lambda y (apply f x y)))

(define (n-downto-0 x)
  (if (= 0 x) '(0) (cons x (n-downto-0 (sub1 x)))))

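;; returns n points evenly spaced around the circle of radius x centered at
;; the origin, i.e. x times each of the n-th roots of unity
;; ((expt 1 (/ 1 n)) is always 1, so the magnitude is just x)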
(define (roots x n)
  (map (lambda (i) (make-polar (* x (expt 1 (/ 1 n)))
                               (* 2 pi (/ i n))))
       (n-downto-0 (sub1 n))))

(define (avg vals)
  (/ (apply + vals) (length vals)))

(define (variance vals)
  (let ((mean (avg vals)))
    (avg (map (lambda (x) (sqr (- x mean))) vals))))

(define X (n-downto-0 100))
(define Y (map (curry + (avg X)) 
               (roots (sqrt (variance X)) (length X))))

(variance X)
(variance Y)
(* 2 (variance (map real-part Y)))
</pre></description>
    </item>
    <item>
      <title>Nerd ABCs</title>
      <link>http://www.kimbly.com/blog/000508.html</link>
      <description><p>Here's a wonderful set of <a href="http://tiffanyard.com/nerd.htm" >alphabet flash cards based on science</a>.</p>

<p><center><a href="http://tiffanyard.com/nerd.htm" ><img border=0 src="/blog/pics/A.jpg" width=100 height=140></a> <a href="http://tiffanyard.com/nerd.htm" ><img border=0 src="/blog/pics/B.jpg" width=100 height=140></a> <a href="http://tiffanyard.com/nerd.htm" ><img border=0 src="/blog/pics/C.jpg" width=100 height=140></a></center></p></description>
    </item>
    <item>
      <title>The root to solving the problem</title>
      <link>http://www.kimbly.com/blog/000507.html</link>
      <description><p>Here's an article saying that <a href="http://www.chicagotribune.com/news/nationworld/chi-0702230087feb23,1,1003665.story">high schoolers are taking tougher courses and getting higher grades, but apparently learning <i>less</i></a>.  There's a very ironic quote in the article:</p>

<blockquote>
"We know the root to solving the problem is having more rigor in classes, starting in 9th grade," said Norma Rodriguez, chief of high school curriculum and instruction for Chicago Public Schools.
</blockquote>

<p>I bet this quote came from the written transcript of a spoken interview.  And I bet that Norma Rodriguez was actually talking about the <i>route</i> to solving the problem.  Apparently journalists (and their editors!) aren't immune to the national trend towards lower reading scores.</p></description>
    </item>
    <item>
      <title>Brookline Home for Sale</title>
      <link>http://www.kimbly.com/blog/000506.html</link>
      <description><p>Since I'm going to be <a href="/blog/000505.html" >moving to New York</a>, we're selling our <a href="http://www.20parkway.com/" >Brookline condo</a>.  If you know anyone who might be interested please spread the word -- especially if you know someone who works near the Longwood Medical Area.  The first open house will be this Sunday, Feb 25.</p>

<p><center><a href="http://www.20parkway.com/" ><img src="http://www.20parkway.com/pics/front1.jpg" width=320 height=240 border=0></a></center></p>

<blockquote>
Light-filled Brookline 2+ bedroom condo with updated bath and study. Working fireplace, hardwood floors, high ceilings, and lovely detail throughout. Abundant closets plus 100 square ft basement storage space. Shared roof deck and laundry. Located on a one-way road across from Emerald Necklace park and river. Five minute walk to many parks, Longwood medical area, Coolidge Corner, Brookline Village, and green D line T. Transferable rental parking ($100/mo) paid through 2007.
</blockquote></description>
    </item>
    <item>
      <title>I&apos;m moving to New York</title>
      <link>http://www.kimbly.com/blog/000505.html</link>
      <description><p>I'm going to be moving to New York City!</p>

<p>I'll be joining Goldman Sachs, working as a developer on a core part of their technology infrastructure.  It's a big change for me -- the biggest company I've ever worked for so far only had about 200 people, and GS is approximately two orders of magnitude larger than that.  I'll be working in a skyscraper right in downtown Manhattan, just a couple blocks from Wall Street and that bronze bull that you've probably seen pictures of before.</p>

<p><center><img src="http://www.d.umn.edu/~gbabiuk/images/WallStBull.jpg" width=300 height=225 border=0></center></p>

<p>I've been asked to be discreet about the details of the job, so all I'll say is that it involves language design and optimization, highly scalable systems, and financial derivatives.  Here's a paper that'll give you a flavor for what I'm talking about: <a href="http://research.microsoft.com/~simonpj/Papers/financial-contracts/contracts-icfp.htm" >Composing Financial Contracts</a>, by Simon Peyton Jones, Jean-Marc Eber, and Julian Seward.  Unfortunately I won't actually be writing Haskell code, but at least I'll be working with a group of people who deeply appreciate functional programming.</p>

<p>I won't actually be moving for a few months yet (date still TBD).  In the meantime I'm still working at <a href="http://www.streambase.com/" >StreamBase</a>, implementing cool new features for our upcoming release.</p></description>
    </item>
    <item>
      <title>Thoughts on Robust Systems</title>
      <link>http://www.kimbly.com/blog/000504.html</link>
      <description><p>Reading <a href="http://swiss.csail.mit.edu/classes/symbolic/spring07/readings/robust-systems.pdf" >Building Robust Systems</a>, by Gerald Sussman.  I'm not done yet, but so far (page 7) the paper is pretty vague and metaphorical.  So I thought I'd scribble down some of the slightly-more-concrete thoughts that it triggers in my head:</p><p>Sussman talks about "degeneracy" in biological systems, and how it can emerge by duplicating a section of DNA and allowing the copies to diverge.  In programming languages, this might be done by taking a program, copying a section of code, and then changing each caller so it either continues to call the old version or calls a new one.  In order to allow broken pieces of code to continue to evolve without destroying the program, you could make callers "prefer" one version over the other, but silently fall back to their non-preferred implementation if the first version didn't work.  For example, maybe their preferred version threw an exception, or maybe it started failing some kind of unit test that the caller cares about.</p>
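
<p>The "prefer one version, silently fall back to the other" idea can be sketched in a few lines.  This is only an illustration, not something from Sussman's paper: the helper <code>prefer</code>, its arguments, and the two averaging functions are hypothetical names, and it assumes PLT Scheme's <code>with-handlers</code> and <code>exn:fail?</code>.</p>

<pre>
;; Hypothetical helper.  Builds a caller-side wrapper that prefers one
;; implementation, but falls back to the other if the preferred one raises
;; an exception or returns a result that the caller's own check rejects.
(define (prefer preferred fallback acceptable?)
  (let ((failed (list 'failed)))     ; unique marker meaning "raised an exception"
    (lambda args
      (let ((result (with-handlers ((exn:fail? (lambda (e) failed)))
                      (apply preferred args))))
        (if (and (not (eq? result failed)) (acceptable? result))
            result
            (apply fallback args))))))

;; The original code, and a diverged copy that happens to be broken.
(define (old-avg vals) (/ (apply + vals) (length vals)))
(define (new-avg vals) (/ (apply + vals) 0))   ; deliberately divides by zero

(define avg* (prefer new-avg old-avg number?))
(avg* '(1 2 3))   ; new-avg raises, so this quietly uses old-avg
</pre>

<p>The <code>acceptable?</code> argument stands in for "some kind of unit test that the caller cares about".  A real system would presumably also record which version ended up being used, so that the broken copy can keep evolving.</p>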

<p>Here's another idea: generate random segments of code by "connecting the dots", where by "dot" I mean "type", or perhaps "function call".  Suppose you have a URL and you want to have a file on disk.  If you're lucky, you can search the call graphs of a whole bunch of programs and find some code path that starts with a URL and ends with a file.  If you're really lucky, that code path will do something appropriate, like downloading the content behind the URL.  If you took this idea and applied it to all the open source projects in the world, you'd probably have a fair chance of implementing something reasonable, purely by accident.  Well, not really by accident -- it would actually be by virtue of the fact that you're drawing a random sample from a set of programs that is distributed extremely non-uniformly over the space of all possible programs.  <a href="http://lambda-the-ultimate.org/node/1178" >Djinn</a> does something like this, but without the benefit of a meaningful dataset of samples to draw from.  Haskell probably has an advantage at this kind of thing because it doesn't depend on side effects to determine the meaning of a segment of code.</p>

<p>Combine these two ideas.  Generate random code, evolve it by making (fail-safe) copies, and mutate it by replacing randomly-selected code paths with randomly-generated code paths that connect the same dots.</p>

<p>This starts to resemble the way bacteria <a href="http://curriculum.calstatela.edu/courses/builders/lessons/less/les4/conjugation.html" >share plasmids</a>.  If you manage to come up with a useful generated program, add it back to the pool of programs from which we draw our samples.  Now you start getting positive feedback for code snippets that continue to be useful when copied/mutated.  You can start imagining computer scientists doing research into things like what characteristics are necessary for a piece of code to be randomly reusable, or what extraction techniques end up producing more reusable snippets.</p>

<p>To take better advantage of the meaning inherent in an existing program, you also want to pay attention to variable names, not just types.  E.g. a variable named "i" is probably an index, while one named "n" is probably a length of some kind.  Build up a probabilistic model describing how variables with certain names are likely to be handled (e.g. variables named "i" are unlikely to be divided, but very likely to be incremented).  You might even be able to <i>only</i> pay attention to names (or at least parts of names), and just hope that the types work out properly -- that would be simpler than trying to handle both names and types.  Try to notice naming conventions within a single class / subsystem / project, and pull them out as patterns.  Do all of this probabilistically so that it scales to huge code repositories like SourceForge.  I'm thinking you'd treat names as n-grams, and just accumulate counts for how many times an identifier with a particular n-gram appeared in a particular relationship with other variables.  Normalize these histograms to produce probability distributions.  Now you can use the distributions to produce a "code babbler", or to identify (and extract) common patterns that appear to be far from random.</p>
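
<p>A minimal sketch of the counting and normalizing steps.  It assumes Racket's mutable hash tables (<code>make-hash</code>, <code>hash-update!</code>, <code>hash-ref</code>, <code>hash-map</code>).  The observation format, a list of (identifier . operation) pairs, is made up for illustration, and extracting such pairs from real source code is the hard part that this leaves out.</p>

<pre>
;; all substrings of length n of an identifier, left to right
(define (ngrams name n)
  (let loop ((start (- (string-length name) n)) (acc '()))
    (if (negative? start)
        acc
        (loop (sub1 start)
              (cons (substring name start (+ start n)) acc)))))

;; observations: list of (identifier . operation) pairs
;; returns a table mapping (n-gram operation) to a count
(define (count-usages observations n)
  (let ((counts (make-hash)))
    (for-each
     (lambda (obs)
       (for-each
        (lambda (gram)
          (hash-update! counts (list gram (cdr obs)) add1 0))
        (ngrams (car obs) n)))
     observations)
    counts))

;; normalize: estimated probability of operation op, given n-gram gram
(define (probability counts gram op)
  (let ((total (apply + (hash-map counts
                                  (lambda (key count)
                                    (if (equal? (car key) gram) count 0))))))
    (if (zero? total)
        0
        (/ (hash-ref counts (list gram op) 0) total))))

(define counts
  (count-usages '(("i" . increment) ("idx" . increment)
                  ("i" . index) ("n" . divide))
                1))
(probability counts "i" 'increment)   ; 2/3: two of the three "i" usages
</pre>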

<p>I'll probably have more thoughts later, but I wanted to scribble this down, since it's been so long since I've written anything here.</p></description>
    </item>
    <item>
      <title>Dubious decisions</title>
      <link>http://www.kimbly.com/blog/000503.html</link>
      <description><p>If you ever find yourself deliberately using a vague subject for an email because you feel the real subject is sensitive information that you may not want to be seen by casual overlookers, then it may be that you're doing something ethically questionable.  Especially if you're the boss.  Perhaps you should rethink the entire plan, rather than just trying to keep it a secret.</p>

<p>Readers sensitive to irony will realize that this posting is itself deliberately vague.  I'm protecting the guilty.<br />
</p></description>
    </item>
    <item>
      <title>Final version of paper</title>
      <link>http://www.kimbly.com/blog/000501.html</link>
      <description><p>Here's the final version of the paper describing my Master's research at Brown: <a href="http://kimbly.com/papers/bck-pepm-2007.pdf">Lowering: A Static Optimization Technique for Transparent Functional Reactivity</a>.  It will be <a href="http://www.program-transformation.org/PEPM07/PEPMProgram" >presented at PEPM'07</a> in January.</p>

<p>The work was very successful: I managed to get a speedup of 7810% for a program that had about 6000 lines of code.  The abysmal performance of that program was what motivated the project in the first place.</p><p>I'll explain what the title of the paper means.</p>

<p>First, "Functional Reactivity" means a programming language that automatically updates its outputs whenever the value of a variable changes -- kind of like a spreadsheet.  Check out <a href="http://conal.net/fran/" >Fran</a> for a simple introduction to the idea.</p>

<p>Second, the "Static Optimization" part means that I'm manipulating program source code, without running it.  In particular, I'm manipulating <a href="http://citeseer.ist.psu.edu/cooper04frtime.html" >FrTime</a> code.  The hope is that the result of the manipulation will be a program that behaves identically to the original program, but runs faster and uses less memory.</p>

<p>Third, the "Transparent" part means that the functional reactivity is implicit.  All the language constructs have been replaced so that they just Do The Right Thing whenever you start working with time-varying values.  This is different from the Haskell-based versions of functional reactivity, which use the type system to distinguish between time-varying values and constants.</p>

<p>Finally, "Lowering" is the word I made up for this particular kind of optimization.  The name comes from the fact that it's the opposite of lifting.  "Lifting" means adding support for functional reactivity to an arbitrary function.  "Lift" is also the name of the Haskell function that gives monadic features to an arbitrary function.  Indeed, the lowering optimization should apply to any monad where lift distributes across function composition.</p>
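
<p>The phrase "lift distributes across function composition" can be made concrete with a toy model.  This is a sketch, not how FrTime works: it assumes a time-varying value is nothing more than a function from time to a value, and <code>lift</code>, <code>compose2</code>, and <code>seconds</code> are names invented for the example.</p>

<pre>
;; Toy model: a time-varying value is just a function of time.
;; lift turns an ordinary function into one that works on such values.
(define (lift f)
  (lambda (behavior)
    (lambda (t) (f (behavior t)))))

(define (compose2 f g)
  (lambda (x) (f (g x))))

(define seconds (lambda (t) t))      ; stand-in time-varying value

;; two lifted steps, with an intermediate time-varying value between them
(define stepwise ((lift add1) ((lift sqr) seconds)))

;; the same computation with the plain functions composed first, lifted once
(define lowered ((lift (compose2 add1 sqr)) seconds))

(list (stepwise 3) (lowered 3))      ; both sample to the same value
</pre>

<p>In this model the two definitions always agree, and the second one builds one time-varying value instead of two.</p>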

<p>That's fine, you say, but what does lowering actually <i>mean</i>?  The idea is that instead of doing a whole lot of simple operations on time-varying values, you can temporarily stop the values from changing, do a whole lot of operations on <i>constant</i> values, and then let the values start changing again.  Operating on constant values is faster than operating on time-varying values, so this can yield big performance improvements.  Read the paper if you want to know the details.</p></description>
    </item>
    <item>
      <title>Extracting Queries by Static Analysis</title>
      <link>http://www.kimbly.com/blog/000500.html</link>
      <description><p>Here's a very interesting paper, <a href="http://www.cs.utexas.edu/~wcook/Drafts/2006/WiedermannCook06.pdf" >Extracting Queries by Static Analysis of Transparent Persistence</a>, by Ben Wiedermann and William Cook.  The idea is that instead of constructing a SQL query and then iterating over the results, you just write a program as if all the records in the database are available, and the language automatically figures out what (efficient) SQL query to construct.  If-statements are automatically turned into where clauses, and reference traversal is automatically turned into a relational join.  Unfortunately the paper doesn't cover the more interesting cases of aggregation or grouping.</p><p>The paper focuses on relational databases, but I think it would be really cool to apply the approach to the custom data structures used in scalable distributed systems.  A common pattern in distributed systems is fan-out/fan-in.  That is, you have some kind of coordinating server which receives an original query, sends subqueries to lots of secondary servers, and then aggregates the results.  Google's MapReduce is a variation on this pattern, and Endeca used a similar technique as well.  It would be great if you could program the coordinating server as if the entire data store of the secondary servers were directly available -- that would simplify the tedious coding of query languages and communication protocols.</p>
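
<p>For reference, here is a hand-written version of the fan-out/fan-in pattern, as a minimal sketch.  The "secondary servers" are stand-in lists of numbers rather than real machines, the function names are hypothetical, and it assumes Racket's <code>filter</code>.</p>

<pre>
;; Stand-ins: each secondary server is just a list of records.
(define servers
  (list '(3 18 42) '(7 25) '(64 1 30)))

;; the subquery run on one secondary: count the records matching pred
(define (count-matching pred records)
  (length (filter pred records)))

;; the coordinator: fan the subquery out, then fan the partial counts back in
(define (fan-out-fan-in pred)
  (apply + (map (lambda (records) (count-matching pred records))
                servers)))

(fan-out-fan-in even?)   ; counts the even records across all three lists
</pre>

<p>With the transparent approach you would instead write something like a single <code>count-matching</code> over all the records, and the split into subqueries and aggregation would be derived for you.</p>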

<p>Similar considerations apply to client/server programming, although there you would also have to consider the security implications (often you want to retain tight control over the server api, to protect against abuse by rogue clients).</p></description>
    </item>
    <item>
      <title>Ugly, helpful error messages</title>
      <link>http://www.kimbly.com/blog/000499.html</link>
      <description><p>I've been trying to sign up for an ACM SIGPLAN membership for over a week.  Whenever I hit the button to submit my application, I got an error message with a bunch of raw stack trace information.  I figured eventually someone at the ACM would fix this, but apparently not.</p>

<p>Around about the tenth time that my application failed to be processed, I finally decided to read the error message carefully.  Here's what I saw:</p><blockquote>
Error Executing Database Query.<br>
ORA-00001: unique constraint (MSF.CP_PK) violated<br>
[... snip...] <br>
SQL 	   INSERT INTO client_phones(CLIENT_NO,PHONE_TYPE,PHONE,CREATED_BY,CREATED_DATE) VALUES('7781127','PRIMARY','617 ***-****','SQJOnline',sysdate)<br>
[...snip...]
</blockquote>

<p>Hmmm... it looked like there might be a problem with a duplicate phone number somewhere in the database.  What if I used a different number?  So I tried my home number, and all of a sudden everything went through smoothly!  Let's hear it for ugly ColdFusion error messages.</p></description>
    </item>
    <item>
      <title>A Force More Powerful</title>
      <link>http://www.kimbly.com/blog/000498.html</link>
      <description><p><a href="http://www.afmpgame.com/" >A Force More Powerful</a> is the only video game I'm aware of where the goal is to effect political change through nonviolent action.</p>

<blockquote>
Game play is governed by detailed interactive models of strategic and political factors, ethnicity, religion, literacy, material well-being, media and communications, resource availability, economic factors, the role of external assistance, and many other variables. Tactics include such basics as training, fund-raising and organizing, as well as leafletting, protests, strikes, mass action, civil disobedience and noncooperation.
</blockquote>

<p>I want this for Christmas :)</p></description>
    </item>
    <item>
      <title>StreamBase docs online</title>
      <link>http://www.kimbly.com/blog/000497.html</link>
      <description><p>StreamBase finally decided to release the complete set of product documentation online, for free.  So now, if you're interested in seeing what the <a href="http://www.streambase.com/developers/docs/latest/streamsql/index.html" >StreamSQL language</a> looks like, you can find out without having to download and install a hundred megabytes of binaries.</p></description>
    </item>

  </channel>
</rss>