Final version of paper

Here's the final version of the paper describing my Master's research at Brown: Lowering: A Static Optimization Technique for Transparent Functional Reactivity. It will be presented at PEPM'07 in January.

The work was very successful: I managed to get a speedup of 7810% for a program that had about 6000 lines of code. The abysmal performance of that program was what motivated the project in the first place.

More...

Paper got accepted

I just got notified that the paper describing the work I did for my Masters was accepted to PEPM '07 (Partial Evaluation and Program Manipulation). PEPM is affiliated with POPL, and will be held in Nice, France in mid-January 2007. I'm feeling somewhat mixed about the news, since it's not clear whether I'll get funding to actually attend the conference.

I'm not sure what the rules are with regard to sharing preliminary papers, so I'm not going to link to it until it actually gets published. You'll just have to wait in suspense :)

Hari Balakrishnan: Kim, you can share whatever version of papers you want and I...

Learning Natural Language Semantics

I went to an NLP symposium at MIT a couple weeks ago. The most interesting thing I learned there was from a 2005 a paper titled Learning to Map Sentences to Logical Form, by Luke Zettlemoyer and Michael Collins.

I'm pretty familiar with the use of lambda calculus for programming languages, but I'd never before seen it used as a formalism for the semantics of natural language. So the first Amazing Revelation was this:

"What states border Texas?"
λx.state(x) ∧ borders(x,texas)

The second Amazing Revelation was that if you assign "types" to English words, then you can "parse" a sentence directly into a semantic model simply by finding a valid tree of logical deductions. The type system used consists of parts of speech (NP, S, etc) plus a way to represent missing phrases on the right or left. So, for example, you can think of S/NP as being like the function type NP->S, except that it requires the NP to be on the right. S\NP would require the NP to be on the left. Here's the deduction tree for the question "What states border texas?"

But these two ideas are merely the background upon which the paper builds. The real meat of the paper is that if you're given a bunch of sentences and their corresponding LC semantics, then you can automatically learn the initial mappings from words to types. And once you've got that initial step covered, then the rest of the problem reduces to finding the most likely (valid) type judgement for the sentence as a whole.

Pretty damn cool, if you ask me. The paper won the best paper award at UAI 2005, so apparently I'm not the only one who was impressed.

After the symposium everybody went out to dinner at CBC in Kendall Square. I ended up sitting across the table from Luke (the primary author on the paper). He's a really cool guy; smart, calm, and interested in what other people are doing. He's currently studying graphical models (e.g. markov random fields).

All in all, it was a great day.


Graph Theory is Discrete Topology

I just realized that graph theory is basically just a discrete version of topology. Topology is about the connectivity of various continuous spaces. Graphs are about connections between discrete nodes. I know very little about topology, but this suggests some reasons why I might want to learn more about it.

neuman: i am very interested in topoloy on discrete spaces, please w...
pralahad: i am working a reasearch work on tolpogical networks with gr...

Spectral clustering rocks!

In machine learning the other day, we learned about spectral clustering. This has got to be one of the most beautiful algorithms I've seen in months. It doesn't require you to come up with any kind of model for how your data was generated; all you need is a similarity measure between any two data points (e.g. exponential fall off). The measure doesn't have to satisfy the triangle inequality (i.e. it doesn't have to be transitive), but it should be symmetric. Here's how the algorithm works:

More...
Rich: Kim, Hmm, there's one eigenvalue with value zero. The second...
Kim: Rich, it seems you're right, but that doesn't make sense to ...
joe: If you liked spectral clustering, you might like "A New Appr...
st r: The eigenvalues to take into account are different from case...
Rahul: hey the confusion over second largest and second smallest ma...
rachel: But what do the 3rd, 4th and 5th etc. eigen vectors mean exa...
Nara: Now, it seems that we all here are investigating the usefuln...
Justin Donaldson: Second eigenvectors (and above) are all useful for clusterin...
Thomas Mathew: Kim, Thanks for your post - helped me on the right track fo...
Ted Dunning: Look up Ng and Jordan's article on the subject. That make...

Pictures of dataflow optimization

I've been working on a project to optimize FrTime for the last few months. The idea is to statically analyze a program in order to discover its dataflow dependencies, and then combine lots of little dataflow nodes that don't do much work into a smaller number of big dataflow nodes that do a significant amount of computation. Performance improves pretty much in direct proportion to the fraction of nodes removed, so this can lead to really big improvements.

On Friday I finally generated some graphs to visualize the difference between an optimized program and an unoptimized program. The result is pretty impressive, especially if you like pictures as much as I do. Here's the unoptimized version of a simple program that uses TexPict, an image and text compositing library:

And here's the optimized version:

Red arcs indicate nodes corresponding to conditionals (if statements), and blue nodes indicate data constructors (structs and lists). Black nodes are simple computation, such as addition or subtraction.

leo: Can you post the programs these graphs represent? They see...

Why least-squares makes sense

An offhand comment in my machine learning textbook finally gave me a good intuitive understanding for why you want to square the error terms when doing a best fit of a line to a set of data. Previously, the best explanation I'd ever heard was something like "well. by squaring you guarantee that the total error is non-negative", which wasn't sufficient to explain why x2 is better than, say, x4 or abs(x).

More...
Hans: Great explanation, thanks. Oddly enough, I had never heard a...
Per Vognsen: Every convex functional has a global optimum, so I wouldn't ...
ulrik: Also in engineeing problems least squares method minimize th...
diego: Here's my 2 cents... -- From a theoretical standpoint, l...
Will Dwinnell: Statisticians seem almost embarrassed to admit that, among a...

Courses for Fall 2006

I'll be taking Statistical Models in Natural-Language Processing, and Intro to Machine Learning. I'm also planning to audit Domain-Specific Databases and CS Algorithms and Economics.

George: I have Bishop's Neural Networks for pattern recognition book...

iRobot talk at Brown

Today was the first day of Brown's Building Intelligent Robots class, and they devoted the class to an invited talk by someone in the military research department at iRobot. Unfortunately I don't remember the name of the speaker.

More...

Google Techtalks online

I just found out that Google's TechTalks are available online from Google Video. For those missing out on the opportunity to hear smart people talking about intelligent topics (i.e. most people outside academia), you might find it refreshing to watch a few of them.

: It wouldn't hurt smart people talking about smart things to ...

Talking with Ken Dill

Yesterday I had the chance to speak with Ken Dill for an hour about protein folding, dynamical simulation of small molecules (e.g. proteins), etc. He was at Brown to give a talk, but I had a class during his talk so I had to miss it. Here's a brief transcript of the ideas I got out of the informal discussion before his talk.

More...
George Dahl: What do you mean by a "real wave function" exactly? Many qu...
Kim: All I know about quantum mechanics is what I read in QED. I...
Gerard Goossen: "replica exchange" sounds to me like some version of what I ...
Shae Erisson: Replica exchange may be connected with bee, ant colony, and ...

Picture of knapsack problem

In my combinatorial optimization class, we had to write a program to solve the {0,1} knapsack problem, and then characterize its performance. I decided to create a picture of the "hardness landscape" of a particular class of knapsack problem (described below) when tackled via branch-and-bound. I want to share the pictures because I found them fascinating.

Here's a birds-eye view of the hardness landscape, zoomed out 100x, where capacity ranges from 0 to 40,000 and number of items ranges from 0 to 30,000.

Here's a close up view of the upper-right corner of the graph where there seems to be the most "static". Capacity ranges from 39,600 to 40,000 and num items ranges from 2000 to 2300.

Each pixel in these graphs represents the time required to find the optimal solution to the problem at that pixel. Black means the solution was found in zero seconds, while white means the solution took more than 0.5 seconds.

Here's the problem structure I used:

randomItems numItems = map mkItem [1..numItems]
    where mkItem n = Item {weight = (n `div` base) + 1,
                           cost = n `mod` base}
          base = round $ sqrt $ fromIntegral numItems
Robin Green: The cutoff point is kind of arbitrary though. This might sou...
Kim: If you look at the first picture, the shades of grey in the ...

My Courses at Brown

Classes started about a week ago. For those who are curious, here are the courses I'm taking:

So far, the first three are really exciting, while the last is a bit slow.

In CS275, we get to pick our own topics, and my first topic is on Software Transactional Memory. I've been interested in STM for a while, and by happy coincidence, the professor for that course, Maurice Herlihy, is one of the biggest contributors to the area. (I'm so bad with names that I didn't even realize this was the case until another student pointed it out to me in the second class.)

CS296-1 is basically formal verification. This week's class included a quick overview of model checking, and last week's homework was to go through the Alloy tutorial. After the model checking class, some of the class hung around afterwards and learned about things like the difference between CTL and LTL, and the mu operator (which was fun because I'd already come across mu in TAPL, for describing fixpoints of infinite types).

In CS258, we're studying approaches for tackling NP hard optimization problems. Today we discussed local search, constraint propagation over a graph, and the simplex algorithm. The current homework is to implement dynamic programming and branch-and-bound implementations of the knapsack problem.

In CS196-1, we're currently doing sequence alignment. But the instructor claims eventually we'll get to things I find more interesting, like protein folding and systems biology.

Also, every thursday there are seminars. Today's seminar was on adding reflection to NuPRL, which was quite interesting.

In short, there's a whole ton of stuff going on, and almost all of it is really interesting.

Richard Cook: I was wondering what languages they let you use in the cours...
Kim: In CS258, we get to pick our own languages. I chose haskell...
Shae Erisson: Martin Erwig's Functional Graph Library is in Data.Graph the...