Here's a picture of what I've been doing this weekend:

This is a picture of one grid of a microarray. Each microarray has 4x12 grids, and each grid has 16x17 spots, for a total of 13,056 spots per microarray. My task is to look at the circle around each and every spot, and make sure it's the right size, that it's centered, and that it doesn't have any defects. It currently takes me about six hours per microarray, but eventually I hope to get that down to 2-3 hours. The goal is to finish nine microarrays (117,504 spots) by the end of the week.
Why am I doing this? Because my partner is in her last year of a PhD in molecular biology, and she has a lot of work to do before she's done. So I'm helping. There's a lot of stuff I can't help with, but circling spots is something that even a trained pigeon could probably manage.
Because I'm a programmer, and because this activity really only occupies the visual part of your brain, leaving the rest of your brain idle, I've been spending my time thinking about how I could do this task more efficiently. I've considered trying to train a neural net. I've also considered reorganizing the user interface so that you can deal with "spots that look similar" en masse, instead of having to go through them in the same order as the physical layout of the microarray.
Also, since most of the grids in the microarray look similar (e.g. there's always a row of bright orange spots on the lower left), it might be useful to group all the spots that are in the same position, so that you can work with them all at the same time.
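A rough sketch of that grouping idea, in Python. The spot records and their `grid`, `row`, and `col` fields are made up for illustration, and treating 16x17 as 16 rows by 17 columns is an assumption; real data would come from whatever the spot-finding software exports.

```python
from collections import defaultdict

def group_by_position(spots):
    """Collect spots that share the same (row, col) position within their grid."""
    groups = defaultdict(list)
    for spot in spots:
        groups[(spot["row"], spot["col"])].append(spot)
    return groups

# Hypothetical microarray: 4x12 = 48 grids, each assumed to be 16 rows by 17 columns.
spots = [
    {"grid": g, "row": r, "col": c}
    for g in range(4 * 12)
    for r in range(16)
    for c in range(17)
]
groups = group_by_position(spots)
```

Each group then holds the 48 spots that sit at the same place in every grid, so they could be reviewed side by side.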
Unfortunately, as slow as pixing is, programming is even slower. So I'll probably never get enough incentive to try to implement these ideas.
Posted on January 11, 2004 11:33 PM
In addition to neural networks you might want to consider using genetic algorithms or genetic programming. They are super easy to program and I am willing to bet they would get excellent results on a problem of this type.
I spent a year working on genetic algorithms, so I have enough familiarity with them to know that it would be very difficult to use them for this kind of problem. Genetic algorithms work by taking random solutions to a problem, and then breeding them to come up with solutions that are "better" according to a fitness function.
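For readers who haven't seen one, here is a minimal sketch of that loop in Python. The bit-string encoding, the parameter values, and the toy fitness function (count the 1 bits) are all assumptions chosen for brevity; none of it is specific to microarrays.

```python
import random

def evolve(fitness, genome_len, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm over bit strings; returns the fittest genome found."""
    rng = random.Random(seed)
    # Start from random solutions.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half as parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [parents[0][:]]  # carry the current best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness: the number of 1 bits in the genome.
best = evolve(fitness=sum, genome_len=20)
```

Everything problem-specific lives in the encoding and the fitness function, which is why those are the two hard parts.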
First, you'd have the problem of defining a fitness function. Presumably that would be the signal-to-noise ratio of the chosen circles. So that's not too hard. You'd probably also want to allow the human to influence the fitness function somehow, but let's assume that's do-able.
Now you have to come up with a way to encode the "solution" to the problem. I'll assume the problem is the placement of a single circle (placement of all circles can be broken down to several instances of the problem of placing a single circle). To place a circle, you might as well use a simple exhaustive search algorithm instead -- it's not that hard to evaluate the fitness function at each and every point.
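A sketch of what that exhaustive search could look like, in Python. It assumes the image is a plain list of rows of brightness values, and it uses "mean brightness inside the circle minus mean brightness in a thin ring around it" as a crude stand-in for signal-to-noise ratio. The function names and the 2-pixel ring width are made up for this example.

```python
def circle_score(img, cx, cy, r):
    """Mean brightness inside the circle minus mean brightness in a ring around it."""
    outer = r + 2  # assumed ring width of 2 pixels
    inside, ring = [], []
    for y in range(max(0, cy - outer), min(len(img), cy + outer + 1)):
        for x in range(max(0, cx - outer), min(len(img[0]), cx + outer + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= r * r:
                inside.append(img[y][x])
            elif d2 <= outer * outer:
                ring.append(img[y][x])
    if not inside or not ring:
        return float("-inf")
    return sum(inside) / len(inside) - sum(ring) / len(ring)

def best_circle(img, radii):
    """Try every center and every candidate radius; keep the highest score."""
    candidates = (
        (circle_score(img, cx, cy, r), cx, cy, r)
        for r in radii
        for cy in range(len(img))
        for cx in range(len(img[0]))
    )
    _, cx, cy, r = max(candidates)
    return cx, cy, r

# Synthetic 20x20 patch: one bright disc of radius 3 centered at x=12, y=8.
img = [[200 if (x - 12) ** 2 + (y - 8) ** 2 <= 9 else 10 for x in range(20)]
       for y in range(20)]
found = best_circle(img, radii=[2, 3, 4])
```

On this clean synthetic patch the search recovers the disc exactly; real spots are noisier, so the score would need more care than this.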
The only way to use genetic algorithms to make this any easier would be if your "solution" was actually an algorithm for general circle placement, instead of the placement of one particular circle. GA's are not good at algorithms. GP's are intended to be better at it, but I remain extremely skeptical of genetic programming. It has not been shown to work well in any significant class of problems.
It seems like you saw a problem for which no solution was immediately obvious, and so you decided to use a magic bullet technique to solve it. GA's and GP's can't solve all problems. In fact, there are very simple problems that they are atrocious at solving. For example, try comparing a linear program against a GA, for a problem where a linear program is suited. The GA will take hundreds of times longer, and its solution will be much worse.
Well, there are some immediately obvious solutions, such as considering all of the pixels over a certain brightness threshold as a single object, and then finding a heuristic by which you can determine the optimal circle placement and size so as to minimize the number of dark pixels in the object.
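One possible version of that, sketched in Python. It assumes the patch contains a single spot and is given as a list of rows of brightness values. The particular heuristic (center = centroid of the bright pixels, radius = radius of a disc with the same area) is just one simple choice for illustration, not necessarily the best one, and the function name is made up.

```python
import math

def circle_from_threshold(img, threshold):
    """Treat all pixels brighter than the threshold as one object and fit a circle to it."""
    bright = [(x, y)
              for y, row in enumerate(img)
              for x, value in enumerate(row)
              if value > threshold]
    if not bright:
        return None  # nothing above the threshold in this patch
    cx = sum(x for x, _ in bright) / len(bright)
    cy = sum(y for _, y in bright) / len(bright)
    # Radius of a disc whose area equals the number of bright pixels.
    radius = math.sqrt(len(bright) / math.pi)
    return cx, cy, radius

# Synthetic 20x20 patch: one bright disc of radius 3 centered at x=12, y=8.
img = [[200 if (x - 12) ** 2 + (y - 8) ** 2 <= 9 else 10 for x in range(20)]
       for y in range(20)]
fit = circle_from_threshold(img, threshold=100)
```

The threshold is the free parameter here, and it is the kind of value that would have to be tuned against sample data.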
Genetic algorithms and genetic programming are very well suited to solving optimization problems, such as what the various thresholds here should be, based on some sample training data.
Also, I've spent a year working with genetic programming and my assessment is that the problem of finding the optimal circle placement and sizing heuristic should not be particularly difficult for it. I am less familiar with genetic algorithms, but finding optimal threshold values in this problem should be easy enough even for genetic algorithms.
As far as comparisons with other techniques go, it really depends on your problem. Perhaps this problem would be well suited to a conventional approach. If you know of such an approach, go for it. If not, and you were considering going with an "AI" technique such as neural nets, you might also want to give these techniques a try.
There has definitely been much less research in the area of genetic programming compared to neural nets or even genetic algorithms, so in a sense it is "unproven". Still, if I had this problem I would spend the few hours to a day that it would take to code up the primitives and fitness function for it, throw it in a GP, and see how well it does. Couldn't hurt. Might even be fun.
Interesting idea. Unfortunately I don't think simply finding a brightness threshold would be sufficient. There are some spots that are really bright, and some that are very dim. In fact, the dimmest ones are sometimes the hardest, because you can barely see them well enough to know where the spot is.
As far as spending a day on it and trying something out, maybe you're a better programmer than I. For me, it would take a day just to read the picture format, add a circle, and display it to the user. On the other hand, I don't do graphics and UI programming as much as I used to, and I suppose that these days there are ready-to-use libraries for this kind of thing. But it would probably take me a day to find them and figure them out.
By the time I had a program written, I'd be done if I had done it by hand. And trying to do it via programming might end up not working, while doing it by hand is guaranteed.
Posted by: kim at January 16, 2004 10:29 AM
Kim wrote:
>
> Interesting idea. Unfortunately I don't think simply finding a brightness threshold would be
> sufficient. There are some spots that are really bright, and some that are very dim. In fact,
> the dimmest ones are sometimes the hardest, because you can barely see them well enough to
> know where the spot is.
But you **can** see them. That means that they stand out from the dark background, which means that there's a brightness threshold at work.
> As far as spending a day on it and trying something out, maybe you're a better programmer
> than I. For me, it would take a day just to read the picture format, add a circle, and
> display it to the user. On the other hand, I don't do graphics and UI programming as much as
> I used to, and I suppose that these days there are ready-to-use libraries for this kind of
> thing. But it would probably take me a day to find them and figure them out.
Yes, my estimate assumes that you know how to read a graphics format file. You don't actually need to display the results to the user until you've found a good algorithm, however. You would judge how good the algorithm is simply by how close the resulting circles are to the ones humans have already drawn on the sample microarrays. This judgment could be done by your program rather than by humans, and in fact it has to be for the GA/GP/NN to work. So, while your GA/GP/NN is slaving away, you can use the time to learn how to display circles to the user.
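A sketch of that kind of automatic scoring, in Python. Circles are assumed to be `(x, y, radius)` tuples, and the error measure (distance between centers plus difference in radius) is one arbitrary choice among many. The names `circle_error` and `mean_error` are made up for this example.

```python
import math

def circle_error(predicted, reference):
    """Distance between the two centers plus the difference in radius (lower is better)."""
    (px, py, pr), (rx, ry, rr) = predicted, reference
    return math.hypot(px - rx, py - ry) + abs(pr - rr)

def mean_error(algorithm, samples):
    """Average error of a circle-placing function over (image, human_circle) pairs."""
    return sum(circle_error(algorithm(image), human)
               for image, human in samples) / len(samples)

# Stand-in algorithm that ignores the image and always returns the same circle.
always_same = lambda image: (0, 0, 5)
samples = [(None, (3, 4, 4)), (None, (0, 0, 5))]
score = mean_error(always_same, samples)
```

A GA, GP, or neural net would then be driven to make this number as small as possible on the hand-circled samples.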
> By the time I had a program written, I'd be done if I had done it by hand. And trying to do
> it via programming might end up not working, while doing it by hand is guaranteed.
Depends on how many you have to do. Anyway, it sounded like your original question regarded finding an algorithm for doing this automatically, not doing it by hand.
Have a look at "An Introduction to Exploring Genomes and Mining Microarrays", at
http://conferences.oreillynet.com/pub/w/17/presentations.html
You might find some other fun stuff prowling around at http://bio.oreilly.com/