I can pretty much step you through what most first-time forecasters
do, and then explain where you'll sooner or later end up:

*Using a logarithmic scale.*

__Bad Method 1__: Using a Linear Scale

[Chart caption: Crappier than it seems]

This works, right? Sort of, but it has two big drawbacks:

__Static Level of Precision__

You probably don't want all of your error buckets to be
the same size; this gets ridiculous pretty quickly.

I mean, reporting upon a 100-110% bucket makes sense -- but what about the 170-180% bucket (is it sufficiently different from 180-190% to warrant its own bucket?). And it makes zero sense to report a 780%-790% bucket or (even more ridiculous) an 8,920%-8,930% bucket. And that's what you're gonna get with this approach.

Ideally, your buckets will grow along with your error: You need fine precision when your forecasts are close, and coarser precision for your big misses -- which a linear scale can't give you.

__Asymmetrically Bounded__

On a percentage chart like the one above, the smallest error bucket is fixed: 0-10%. Your forecast can't get any lower than 0% of actual. However, your forecast could be a bazillion times *higher* than actual.

As a result, when you plot out your error, you'll get a very short "low-tail", and a very long "high-tail". Among other problems, this makes it impossible to visually scan for *forecast bias*, because your too-high and too-low forecasts aren't reported in a consistent fashion.

__Bad Method 2__: Using a Custom Scale
From there, most people move on to creating customized
buckets. So you'll write statements like this:

    if err = 0 then "Match"
    else if err > 0 and err <= 1% then "1%"
    else if err > 1% and err <= 5% then "1-5%"
    else if err > 5% and err <= 10% then "5-10%"
    else if err > 10% and err <= 25% then "10-25%"

...But you'll probably discover that your buckets are always
somewhat arbitrary and never quite pleasing. And, of course, any time
you want more- or less-granular buckets, you gotta mess with your equation.
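To make that maintenance cost concrete, here is a minimal Python sketch of the same chain written as a lookup table. It assumes `err` is the absolute error expressed as a fraction of actual (0.05 for 5%). The names `EDGES`, `LABELS` and `custom_bucket` are hypothetical, and the catch-all `">25%"` label is my own addition, since the chain above trails off.

```python
from bisect import bisect_left

# Hand-picked upper edges (inclusive) of each bucket, as fractions of actual.
# These mirror the chain above; the final ">25%" catch-all is an assumption.
EDGES = [0.01, 0.05, 0.10, 0.25]
LABELS = ["1%", "1-5%", "5-10%", "10-25%", ">25%"]

def custom_bucket(err):
    """Map an absolute error (fraction of actual) to its hand-picked bucket label."""
    if err == 0:
        return "Match"
    # bisect_left finds the first edge that is >= err, so edges are inclusive.
    return LABELS[bisect_left(EDGES, abs(err))]

for err in (0, 0.005, 0.03, 0.10, 0.40):
    print(err, custom_bucket(err))
```

Even in this tidier form, every change in granularity still means editing `EDGES` and `LABELS` by hand, which is the drawback described above.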

__Proper Method__: Use a Log Scale
Reporting upon forecasting error on a logarithmic scale
solves all of these problems.

But before I jump into using log-based error
groups, let me remind you about logarithms. (For the rare few of you who
are already savvy with logs, feel free to jump ahead.)

__Logs 101__

[Chart caption: Easy, once you get the hang of it.]

If $5^{3} = 125$, then $\log_{5}(125) = 3$. ...Btw, that five in subscript is called the log's "base."

Logs are much easier to understand if you just see them in action. The chart to the right shows the log values for certain bases. As you can see, for log base 5, each value represents five times the previous value. For log base 2, each value represents two times the previous value.
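If the chart isn't in front of you, a quick Python sketch can print the same kind of table. The helper name `log_table` and the choice of bases 2, 5 and 10 are mine, purely for illustration.

```python
import math

def log_table(base, max_log=5):
    """Return (log value, number) pairs: the numbers whose log in `base` is 0, 1, 2, ..."""
    return [(n, base ** n) for n in range(max_log + 1)]

for base in (2, 5, 10):  # bases chosen for illustration
    print(f"log base {base}:")
    for n, value in log_table(base):
        print(f"  log = {n}  ->  {value:,}")

# Check of the example from the text: 5 cubed is 125, so log base 5 of 125 is 3.
# Rounded because floating-point logs are only approximately exact.
print(round(math.log(125, 5), 6))
```

Each row is the previous row multiplied by the base, which is all a log scale really is.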

To convert your forecasting error to a log scale, just take the log of the forecast/actuals. Any base is fine. With base 2, if your forecast were *double* your actual, log(200%,2) = 1. If your forecast were half of your actual, log(50%,2) = -1. If your forecast matched your actual, log(100%,2) = 0.

What does this buy us? A lot!
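First, though, here is the conversion itself as a small Python sketch. The function name `log_error` is hypothetical, and the guard against non-positive values is my assumption: a ratio has no log unless both numbers are positive.

```python
import math

def log_error(forecast, actual, base=2.0):
    """Log of forecast/actual: 0 is a perfect forecast, positive is too high, negative is too low."""
    if forecast <= 0 or actual <= 0:
        # Assumption: reject values for which the log of the ratio is undefined.
        raise ValueError("forecast and actual must both be positive")
    return math.log(forecast / actual, base)

# Double, half, and exact match, as in the text.
for forecast, actual in [(200, 100), (50, 100), (100, 100)]:
    print(forecast, actual, round(log_error(forecast, actual), 2))
```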

__Dynamic Level of Precision__

First and foremost, each successive log value covers a bigger
range, so your buckets can go from very narrow to enormous.

Let's say that your forecast is often off by 30%, but
you're occasionally off by upwards of 100,000%. (This especially happens
when your actual value unexpectedly drops near zero, where the forecast/actual
ratio can be sky-high even for modest forecasts.)

On a linear scale, you can't meaningfully *show* 30% and 100,000% in the same chart, without the chart being enormous. Yet on a log base 2 chart, the 30% miss (a forecast at 130% of actual) gets a value of about 0.38, and the 100,000% gets a value of about 10.

Amazingly, a log scale allows for fine-tuned precision
for accurate forecasts (i.e., you can see the difference between a 5% and a 10%
miss), even when reported right next to a 100,000% miss!
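A short Python sketch shows how the ranges widen as the log value grows. The helper `bucket_range` is hypothetical. I'm also assuming that "off by 30%" means a forecast at 130% of actual, and that the 100,000% case means a forecast at 1,000 times actual.

```python
import math

def bucket_range(k, base=2.0):
    """Percent-of-actual range covered by log values from k up to (not including) k + 1."""
    return 100 * base ** k, 100 * base ** (k + 1)

# Each whole-number step on the log scale covers twice the range of the one before.
for k in (-1, 0, 1, 9):
    low, high = bucket_range(k)
    print(f"log {k:>2} to {k + 1:>2}: {low:,.0f}% to {high:,.0f}% of actual")

# Assumed readings: 5%, 10% and 30% misses on the high side, plus a 1,000x miss.
for ratio in (1.05, 1.10, 1.30, 1000.0):
    print(f"{ratio:>8.2f} x actual -> log2 = {math.log2(ratio):.2f}")
```

The small misses stay distinguishable from each other, while the huge one lands only about ten units away instead of thousands of buckets away.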

__Symmetrically Bounded__

As I described earlier, on a linear scale you get too little
precision for forecasts that are too low (they're all lumped into the 0-10%
bucket), and too much precision for forecasts that are too high (each one gets
its own useless bucket, like 8050-8060%).

Logarithms don't have that problem: They represent high
and low values on equal terms. If you were ever 100x too high, using a
log base 2, you'd get a log value of 6.6. If you were 100x too low,
you'd get a log value of -6.6.
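The symmetry carries over directly to bucketing. Below is a minimal sketch that assigns each forecast a signed bucket number. The name `log_bucket` and the bucket width of 0.25 log units are assumptions for illustration, not something the chart in this post is known to use.

```python
import math

def log_bucket(forecast, actual, width=0.25, base=2.0):
    """Signed bucket index on the log scale; bucket 0 is centered on a perfect forecast."""
    # Dividing by the width and rounding gives equally sized buckets on both sides of zero.
    return round(math.log(forecast / actual, base) / width)

# 100x too high and 100x too low land the same distance from the center.
print(log_bucket(100, 1), log_bucket(1, 100))

# A perfect forecast sits in the center bucket.
print(log_bucket(100, 100))
```

Changing the granularity is now a matter of changing `width`, rather than rewriting a chain of conditions.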

**Working Example:**

Here, I used Excel to generate 1000 random numbers (each
from 1 to 100), and then "forecast" each with another set of
semi-random numbers -- based partly upon the original number, and partly upon
another random number.

For kicks, I gave my forecast a subtle positive bias --
it's higher than the actual number a bit too often.
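If you'd like to try something similar outside Excel, here is a rough Python approximation of that setup. The spreadsheet formulas aren't given in this post, so the 50/50 blend, the 5% upward bias, the seed, and the bucket widths below are all assumptions, and the counts it prints won't match the ones reported here.

```python
import math
import random
from collections import Counter

def simulate(n=1000, bias=0.05, seed=42):
    """Return (forecast, actual) pairs. The blend weights and bias are assumed, not the post's."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        actual = rng.randint(1, 100)
        noise = rng.uniform(1, 100)
        # Forecast is based partly on the actual, partly on noise, nudged upward.
        forecast = (0.5 * actual + 0.5 * noise) * (1 + bias)
        pairs.append((forecast, actual))
    return pairs

def linear_bucket(forecast, actual, width=0.10):
    """Index of the 10%-wide linear bucket (0 means 0-10% of actual)."""
    return math.floor(forecast / actual / width)

def log_bucket(forecast, actual, width=0.25):
    """Signed log-base-2 bucket index; 0 is centered on a perfect forecast."""
    return round(math.log2(forecast / actual) / width)

pairs = simulate()
linear = Counter(linear_bucket(f, a) for f, a in pairs)
logged = Counter(log_bucket(f, a) for f, a in pairs)

print(len(linear), "linear buckets vs", len(logged), "log buckets")
over = sum(count for bucket, count in logged.items() if bucket > 0)
under = sum(count for bucket, count in logged.items() if bucket < 0)
print("log buckets above center:", over, "below center:", under)
```

Comparing the counts above and below the center bucket is a numeric version of the visual bias check described later in this post.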

Now, let's plot the error both linearly and
logarithmically, and see what we find.

__Linear Scale__
Ugh, what a disaster.

First, for one thousand data points, we have 629
buckets.

Second, ten "low" buckets (0-10%, 10-20%, etc)
averaged 28 entries each, while six hundred and nineteen "high"
buckets averaged just *one* entry each.
Third, because the chart is naturally lopsided, we can't
visually see the overt bias in the forecast.

__Logarithmic Scale__
This is more like it!

First, our chart has only 100 points, instead of six
hundred and twenty-nine. Yet it's precise where it matters
(i.e., forecasts close to 100% of actuals), and rough where it doesn't.

Second, you can see that both our high and low values
slope away, roughly in symmetry. Instead of our low forecasts crammed into
ten groups, and our high forecasts spread out over 619 groups, they're much closer
to equal.

Third, you can see instantly that there is a positive
forecast bias. (Focus on the center bar, which represents forecasts near
100% of actuals. Now compare the bars immediately to its right and left,
and then bars *two* to the right and left. See how the right bars are always higher? That's a tell-tale sign of positive forecast bias.)

**Conclusion**

I probably could have written this entire article in a
single sentence:

*When reporting forecast error, use a logarithmic scale.*
The trouble is, other people had told *me* that in the past, and I never quite understood *why* I was doing it, or even knew if I was doing it right. Hopefully, now that I've described the drawbacks of the more conventional linear approach, you'll have a bit better understanding of why you should embrace logarithmic error reporting sooner, rather than later!