April 20, 2014

Conveying Forecast Error

Whenever you forecast, sometimes you'll want to see where you're off a little.  And sometimes, you'll want to see where you're off a lot.  You'll need a functional way to express these errors, one that lets you analyze them in a variety of ways.

I can pretty much step you through what most first-time forecasters do, and then explain where you'll sooner or later end up: using a logarithmic scale.

Bad Method 1:  Using a Linear Scale
Crappier than it seems
At first glance, you'll be tempted to just do it like the picture to the right.  ...So, in this chart, the 130%-140% bucket would contain a forecast of $6.65 for an actual of $5 (6.65/5 = 133%), and the 60%-70% bucket would contain a forecast of $3.25 for an actual of $5 (3.25/5 = 65%).  

This works, right?   Sort of, but it has two big drawbacks:


Static Level of Precision
You probably don't want all of your error buckets to be the same size;  this gets ridiculous pretty quickly.

I mean, reporting on a 100-110% bucket makes sense -- but what about the 170-180% bucket (is it sufficiently different from 180-190% to warrant its own bucket?).   And it makes zero sense to report a 780%-790% bucket or (even more ridiculous) an 8,920%-8,930% bucket.  And that's what you're gonna get with this approach.


Ideally, your buckets will grow along with your error:  You need fine precision when your forecasts are close, and coarser precision for your big misses -- which a linear scale doesn't give you.

Asymmetrically Bounded

On a percentage chart like the one above, the smallest error bucket is fixed:  0-10%.  Your forecast can't get any lower than 0% of actual.  However, your forecast could be a bazillion times higher than actual.  

As a result, when you plot out your error, you'll get a very short "low-tail", and a very long "high-tail".  Among other problems, this makes it impossible to visually scan for forecast bias, because your too-high and too-low forecasts aren't reported in a consistent fashion.  

 
Bad Method 2:  Using a Custom Scale
From there, most people move on to creating customized buckets.  So you'll write statements like this:

If err = 0 then "Match"
else if err > 0 and err <= 1% then "1%"
else if err > 1% and err <= 5% then "1-5%"
else if err > 5% and err <= 10% then "5-10%"
else if err > 10% and err <= 25% then "10-25%"

...But you'll probably discover that your buckets are always somewhat arbitrary and never quite pleasing.  And, of course, any time you want more- or less-granular buckets, you gotta mess with your equation.
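To see just how hand-rolled this gets, here's a minimal sketch of the custom-bucket approach in Python.  The bucket edges and labels are my own arbitrary picks (which is exactly the problem): changing granularity means editing the lists by hand.

```python
import bisect

def bucket_error(err):
    """Map a forecast error (err = forecast/actual - 1) to a hand-picked label.

    The edges below are arbitrary -- any change in granularity means
    editing these two lists by hand.
    """
    edges = [0.01, 0.05, 0.10, 0.25]
    labels = ["<=1%", "1-5%", "5-10%", "10-25%", ">25%"]
    if err == 0:
        return "Match"
    # bisect_left finds the first edge >= abs(err), giving its bucket index
    return labels[bisect.bisect_left(edges, abs(err))]

print(bucket_error(0.07))  # 5-10%
print(bucket_error(0.50))  # >25%
```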

Proper Method:  Use a Log Scale
Reporting forecast error on a logarithmic scale solves all of these problems.
 
But before I jump into using  log-based error groups, let me remind you about logarithms.  (For the rare few of you who are already savvy with logs, feel free to jump ahead.)


Logs 101
Easy, once you get the hang of it.
Logs are the opposite of exponentials, just like subtraction is the opposite of addition, or division is the opposite of multiplication.  So, ya know how since 5 * 2 = 10, then by definition 10 / 2 = 5?  Well, if 5^3 = 125, then log_5(125) = 3.  ...Btw, that five in subscript is called the log's "base."

Logs are much easier to understand if you just see them in action.  The chart to the right shows the log values for certain bases. As you can see, for log base 5, each value represents five times the previous value.   For log base 2, each value represents two times the previous value.
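The chart itself isn't reproduced here, but assuming it lists powers of each base, this little sketch rebuilds the same idea: each step up in log value multiplies the previous value by the base.

```python
import math

def log_table(base, n=4):
    """Return {value: log_base(value)} for the first n powers of base."""
    values = [base ** k for k in range(n)]  # e.g. 1, 5, 25, 125 for base 5
    return {v: round(math.log(v, base)) for v in values}

print(log_table(5))  # {1: 0, 5: 1, 25: 2, 125: 3}
print(log_table(2))  # {1: 0, 2: 1, 4: 2, 8: 3}
```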

To convert your forecasting error to a log scale, just take the log of the forecast/actuals.  Any base is fine.  With base 2, if your forecast were double your actual, log(200%,2)= 1.   If your forecast were half of your actual, log(50%,2)= -1.  If your forecast matched your actual, log(100%,2)=0.  
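In Python, the same conversion is one call to `math.log`, which (like Excel's LOG) takes the value and then the base.  A quick sketch:

```python
import math

def log_error(forecast, actual, base=2):
    """Log-scale forecast error: 0 = perfect, +1 = double, -1 = half (base 2)."""
    return math.log(forecast / actual, base)

print(log_error(10, 5))   # forecast is double the actual  ->  1.0
print(log_error(2.5, 5))  # forecast is half of the actual -> -1.0
print(log_error(5, 5))    # exact match                    ->  0.0
```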

What does this buy us?  A lot!

Dynamic Level of Precision
First and foremost, each log value represents a bigger group, which can go from very constrained to enormous.  

Let's say that your forecast is often off by 30%, but you're occasionally off by upwards of 100,000%.  (This especially happens when your actual value unexpectedly drops near zero, where the forecast/actual ratio can be sky-high even for modest forecasts.)

On a linear scale, you can't meaningfully show 30% and 100,000% in the same chart, without the chart being enormous.  Yet on a log base 2 chart, the 30% miss gets a value of about 0.4, and the 100,000% gets a value of about 10. 

Amazingly, a log scale allows for fine-tuned precision for accurate forecasts (i.e, you can see the difference between a 5% and a 10% miss), even when reported right next to a 100,000% miss!

Symmetrically Bounded
As I described earlier, a linear scale gives too little precision for forecasts that are very low (they're all lumped into the 0-10% bucket), and too much precision for forecasts that are very high (each one gets its own useless bucket, like 8050-8060%).

Logarithms don't have that problem: They represent high and low values on equal terms.  If you were ever 100x too high, using a log base 2, you'd get a log value of 6.6.   If you were 100x too low, you'd get a log value of -6.6. 
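That symmetry is easy to check directly -- a miss of 100x in either direction lands exactly the same distance from zero:

```python
import math

# 100x too high vs. 100x too low, on a log base 2 scale
high = math.log(100.0, 2)      # forecast = 100 * actual
low = math.log(1 / 100.0, 2)   # forecast = actual / 100

print(round(high, 2), round(low, 2))  # 6.64 -6.64
```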

Working Example:
Here, I used Excel to generate 1000 random numbers (each from 1 to 100), and then "forecast" each with another set of semi-random numbers -- based partly upon the original number, and partly upon another random number.  

For kicks, I gave my forecast a subtle positive bias -- it's higher than the actual number a bit too often.

Now, let's plot the error both linearly and logarithmically, and see what we find.
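I don't have the original workbook, but here's a rough Python re-creation of the same experiment.  The blend weight (0.7), noise share (0.3), bias (+2.0), and random seed are all my own guesses at "partly the original number, partly another random number, with a subtle positive bias" -- the point is just to compare how many buckets each scale produces.

```python
import math
import random
from collections import Counter

random.seed(42)

# 1000 actuals in [1, 100]; each forecast blends the actual with fresh
# noise, plus a small constant to create a subtle positive bias.
rows = []
for _ in range(1000):
    actual = random.uniform(1, 100)
    noise = random.uniform(1, 100)
    forecast = 0.7 * actual + 0.3 * noise + 2.0  # +2.0 = subtle positive bias
    rows.append((forecast, actual))

# Bucket the error two ways: linear 10%-wide buckets vs. log2 buckets of 0.2.
linear = Counter(int(f / a * 10) for f, a in rows)  # 0 -> 0-10%, 1 -> 10-20%, ...
logged = Counter(round(math.log(f / a, 2) * 5) / 5 for f, a in rows)

print("linear buckets:", len(linear), " log buckets:", len(logged))
```

The exact counts depend on the seed, but the log scale reliably needs far fewer buckets while keeping fine resolution near a ratio of 100%.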

Linear Scale

Ugh, what a disaster. 

First, for one thousand data points, we have 629 buckets.  

Second, the ten "low" buckets (0-10%, 10-20%, etc.) averaged 28 entries each, while the 619 "high" buckets averaged just one entry each.  

Third, because the chart is naturally lopsided, we can't visually see the overt bias in the forecast.

Logarithmic Scale

This is more like it!

First, our chart has only 100 buckets, instead of 629.  Yet our precision was fine where it mattered (i.e., forecasts close to 100% of actuals), and coarse where it didn't.

Second, you can see that both our high and low values slope away, roughly in symmetry.  Instead of our low forecasts being crammed into ten groups and our high forecasts spread out over 619 groups, they're much closer to equal.

Third, you can see instantly that there is a positive forecast bias.  (Focus on the center bar, which represents forecasts near 100% of actuals.  Now compare the bars immediately to its right and left, and then bars two to the right and left.  See how the right bars are always higher?  That's a tell-tale sign of positive forecast bias.)

Conclusion
I probably could have written this entire article in a single sentence:  When reporting forecast error, use a logarithmic scale.

The trouble is, other people had told me that in the past, and I never quite understood why I was doing it, or even knew if I was doing it right.  Hopefully, by describing the drawbacks of the more conventional linear approach, you'll have a bit better understanding of why you should embrace logarithmic error reporting sooner, rather than later!
