Sloppiness versus Efficiency

This weekend I was talking with someone in biotech, and the conversation turned to the balance between understanding every aspect of an experiment and getting it to the point where it produces results, even though some aspects remain unexplained. For example, maybe an experiment works at 72 degrees F but not at 68 degrees F. Depending on the experiment, that difference may or may not be important. But the fact that you don't understand why it fails at 68 degrees F decreases general confidence in your results, even if only to some small degree.

The same thing happens in programming. For example, a while ago I had a program that worked fine when run by hand on Linux, Solaris, or Windows, but when I tried to run it from within a Python script, it failed, and only on Fedora Core 2. Furthermore, the failure mode was really strange: the program would work fine until after the first TCP/IP connection had been serviced, and then it would disappear. It wouldn't dump core or print an error message; it would just disappear. Now one could say that it didn't really matter, because this program was not meant to be run from within a Python script. But even though the problem only happened in specific circumstances, it reduced our confidence in the robustness of the program in all circumstances.
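When a child process vanishes without a core dump or an error message, one cheap first step is to have the parent script record how the child actually ended. The sketch below is only an illustration under stated assumptions: it uses the modern Python standard library (`subprocess.run`, which did not exist when this happened), the helper names `describe_exit` and `run_and_report` are made up for this example, and it assumes a POSIX system, where `subprocess` reports death by signal as a negative return code. It does not claim to explain what went wrong on Fedora Core 2.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run argv as a child process; return (how it ended, captured stderr).

    argv is whatever command line you want to investigate; nothing here
    is specific to the program described in the post.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit(result.returncode), result.stderr
```

Calling `run_and_report` with the command line of the misbehaving program would at least distinguish a quiet `exit()` from a signal such as SIGPIPE or SIGHUP, which narrows down where to look next.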

This is similar to the "broken windows theory" -- that if you tolerate trivial problems, big problems become harder to control.

The really interesting thing here is that different people have different tolerances for small problems like this, and it can be very difficult not to look down on people whose tolerance level is different from your own. For example, if someone spends two weeks hunting down some small problem that seems mostly irrelevant to you, then it can seem like that person is being inefficient and wasting their time. But on the other hand, if they blithely ignore what seem to you to be worrying indications of instability, then it can seem like they're being sloppy and unprofessional.

I think the thing to realize is that when someone has a different tolerance level for problems like these, it really is just a matter of taste. If you're their manager, then you should encourage them to change their tolerance level in order to be more in line with the rest of the team, but you have to be careful not to just dismiss them as a slob or an idler.

Posted on March 21, 2005 11:33 AM

Comments

The difference in tolerances could be a matter of taste, but it could also be an indicator of the maturity of a developer. A wiser, more experienced developer may spot a serious structural weakness in a nondescript crack in the wall.

I have noticed this particularly when debugging concurrency-related bugs that show up only in rare circumstances.

-K

Posted by: Kaushik at March 22, 2005 02:36 AM

My opinion, perhaps not completely rational, is that almost all malfunctions in software development are worth understanding enough so that you can determine whether or not they need to be fixed. The only exceptions I can think of are problems that are purely cosmetic. It doesn't take too many unfixed problems before a system appears so flaky that you can't get anything done.

Posted by: Alec at March 22, 2005 10:18 AM

> If you're their manager, then you should encourage them to change their tolerance level in order to be more in line with the rest of the team...

Maybe I don't understand (I don't have management experience), but I think it would be wiser to let each person have his tolerance level. In the recently started pugs (Perl 6 bootstrap) project, Autrijus is our volume guy. He does all the features, and he has a very high tolerance for big problems. Yuval, on the other hand, has low tolerance, so he spends his time writing test after test for the little things (which Autrijus or I fix). And that works really well. If everybody were an Autrijus, we'd have a feature complete piece of shit (no offense to Autrijus, his modules are quite stable). If everyone were Yuval, we'd have the most stable Hello World program you've ever seen.

Posted by: Luke Palmer at March 24, 2005 11:15 AM

Sometimes it may be better to encourage the rest of the team to be more like the unusual person --- this is not strictly a matter of personal preference. Different levels of tolerance of the unknown are best for different situations.

I wonder if your program was allocating a pty to handle the TCP/IP connection.

Posted by: Kragen Sitaker at April 13, 2005 07:31 PM

The optimizing of tolerance levels is probably the #1 criterion responsible for Microsoft's success. All too often, academics like seeing this tolerance level as some sort of an absolute, and not something that needs to be balanced carefully with other goals such as time-to-market.

Microsoft's culture set a very different tolerance level than the rest of the industry; and history has proven it to be a very successful one.

Posted by: RM at June 11, 2005 03:05 PM