Sunday, 17 May 2009

Back of the Envelope

"Programming Pearls" is a fantastic book by John Bentley. It covers a wide range of topics from low-level techniques on space-saving to high-level advice on problem solving.

My favourite chapter is called "The Back of the Envelope" which gives great advice in providing quick estimates to real problems. The importance with these rules of thumb is that it's about providing (quickly) an accurate guess.

Enrico Fermi was a master of these back of the envelope calculations. Legend has it that at the testing of a nuclear weapon as part of the Manhattan project, Fermi estimated the yield of the explosion by tossing small pieces of paper in the air and seeing how far they travelled.

Estimation is also a popular interview question. The routine is that the interviewer gives you an odd-ball question (how many gas stations are there in the UK?) and then wants to trace your back of the envelope calculation through (e.g. there are 70 million people in the UK, about 1 in 5 has a car [and so on!]).

Here's some useful bits of knowledge to have for estimation problems. Obviously knowing some maths helps too!

Rule of 72

If you invest a sum of money for y years at an interest rate of r percent, then if r * y = 72 your money will roughly double. For example, 9 years at 8 percent interest will double your money. For example, $2000 for 8 years at 9% yields $3985.

Pi seconds is a nanocentury

There are 3.155 * 10e7 seconds in a year or expressed (by Tom Duff) pi seconds is a nanocentury. For example, how long will it take you to execute 500 million runs of a simulation, where each run takes 1/8 seconds? (50M * 0.125 is about 62.5 million seconds). 62.5 x 10^6 divided by 3.14 X 10^7 is going to be about 2 years.

Of course you could just plug this into Wolfram Alpha and see that it comes out as 1 year, 11 months, 23 days and 20 hours. Near enough! Couple this with the timings for various operations (see Teach Yourself Programming in Ten Years) and you have a technique for estimating expected timing of an operation.

Little's Law

The average number of things in the system is the product of the average rate at things leave the system and the average time each one spending in the system. We can use this to model queue sizes (and not just in computers as the Wikipedia article shows). For example, let's say we're writing a video-processing system. If we process 10 videos per hour, and each one takes 30 minutes to be processed, then the average number of videos in the system at any one time is about 5.

Pareto Principle

80% of the effects come from 20% of the causes. Applies to a wide variety of problems, such as optimization (profile the code to find the 20% slowest functions and you'll get most of the gain), software design (design for the first 20% and 80% of your users will be happy) and more.

Amdahl's law

Amdahl's law states that the speed-up of a parallel program is limited by the sequential part of the program. For example, you have a program that takes 30 months to run in its current form. You've estimated 50% serial and 50% parallel, can you finish it in 6 months with 8 cores? (No, it's bounded by the serial). How many cores to finish it in 18 months? (answer 5)

Does anyone have any more to suggest?