April 1, 2014

Building Modular Forecasts

The Pain of Smarter Forecasting
Pundits love to expound about how we should use our data "smarter" -- and often they're talking about making smarter forecasts.  Seriously, is there a person alive who would publicly dispute the importance of making forecasts smarter?  It's quite fashionable to nod our heads vigorously.

...But I wonder how many people have pondered what exactly that means.   What does it mean to make our forecasts smarter?  What does a "smarter" forecast look like?

A lot of people assume it means fancier-shmancier math.  Ya know, eigenvalues and matrix math and whatnot.  That's rarely the case.  ...Indeed (and this topic begs for a separate post), I've witnessed lots of Directors and VPs place undue faith in a statistician's ability to solve behavioral or process problems -- which invariably ends in tears.  Math only gets you so far.

Using the data smarter usually means two things:  1. Performing more steps.  2. Using more disparate data sets.  Basically, a "smarter" forecast does more comparing, inferring, measuring, assessing, extrapolating, interpolating, checking.  A smarter forecast is a more complicated forecast.

Net-net, a "smarter" forecast isn't free; its cost is reduced system explainability.

These days, the goal of staying simple and explainable isn't merely ignored -- it's practically scorned.  Companies rush towards unfathomably complicated forecasts, as if they're the path to salvation.  On some cosmic level, executives believe that as soon as it's so friggin' complicated that nobody understands how the damn thing works, the riches will surely start rolling in.

And here's the kicker:  Forecasts naturally gravitate towards incomprehensibility anyway, which leads me to my first assertion:

Left Unchecked, Forecasts Become Integrated, Inscrutable and Deteriorated.

There isn't one reason for this.  There are probably ten; here are a few that spring to mind:
  1. People are more prone to add logic to, rather than remove logic from, a forecast.
  2. Forecasting algorithms can work in unexpected ways, especially on volatile input data, which means that a forecast is never fully understood, even right after it is written.
  3. The forecast's logic is often split over several systems and/or sets of code. As a result, changes in one area can inadvertently affect the results of the others.
  4. The forecast's data inputs naturally change over time, so the data the forecast relies upon becomes stale, inconsistent or even corrupted (relative to the forecast's original assumptions).
  5. It's hard to tell when there even is a bug, because it's predicting the freakin' future and past performance is never a predictor of future results.  As a result, problems can fester.
  6. The people who have mastered the heady math might not be good at explaining what they're doing.  ...Worse yet, they might not fully understand how the dang forecast is even used.  As a result, problems might not be detected.
Worst case scenario (which I have seen many, many times):  The forecast integrates a myriad of different data sources, each of which slowly deviates from the forecast's working assumptions about it.  Fewer and fewer people claim to grasp how the forecast works, and nobody wants to take responsibility for (or rely upon) a forecast that they don't understand.  Finally, its performance begins to deteriorate, and nobody is sure how to fix it. 

The Proposal:  Stay Simple and Modular
At the heart of the problem is that many companies build monolithic forecasts:  too much functionality crammed into one big black box, with multiple data feeds coming out the other side.

So what do you do about this?

Well, you can ensure your forecasts do not become too complicated by breaking them into simple, well-understood parts that can be mixed and matched.  In other words, resist the natural temptation to build a big monolithic forecast, and focus upon building a series of smaller, simpler, better-understood components that can be used independently.

I'm not 100% sure how easy/possible/feasible this will be in your circumstances.  But I've gotten it to work on many forecasts.  Let me describe the general approach I took here, using Akamai as an example.

Case Study: Revenue Forecasting

About this client...
Before I describe the revenue forecasting I did for a client (back in my consulting days), allow me to provide some detail about the company's business model.

This company provides a content delivery service, and its customers' invoices are similar to cell phone bills -- only instead of delivering minutes of phone service, it delivers internet traffic.  Just like with cell phone plans, customers can "Commit" to a certain level of usage; generally, the higher a customer's Commit, the lower the per-unit price.  If they push more than their Commit, they pay a per-GB Overage rate -- just like the per-minute Overage when you exceed your cell phone minutes.
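To make the basic mechanics concrete, here's a minimal sketch of the Commit/Overage math.  (The function name and numbers are hypothetical, and -- as noted below -- real contracts are far messier than this.)

    def simple_invoice(traffic_gb, commit_gb, commit_fee, overage_rate_per_gb):
        """A toy Commit/Overage invoice: the customer always pays the Commit fee,
        and any traffic above the Commit is billed at the per-GB Overage rate."""
        overage_gb = max(0, traffic_gb - commit_gb)
        return commit_fee + overage_gb * overage_rate_per_gb

    # A hypothetical customer: 100 GB Commit for $500/month, with $6 per GB of Overage.
    # Pushing 130 GB yields 500 + (30 * 6) = $680.
    print(simple_invoice(traffic_gb=130, commit_gb=100, commit_fee=500, overage_rate_per_gb=6))  # 680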

Customers with steady traffic typically prefer high Commits (to get the lowest per-GB rate), while customers with volatile traffic typically have low Commits (so that they don't pay for traffic they don't use).  But any customer's traffic can gyrate unpredictably in any given month.

However, one way in which this company's contracts are unlike a typical cell phone plan is that they can get rather complicated.  (This is a common phenomenon in B2B companies, where premier customers receive tailored contracts.)  Point being:  Even if you know a given customer's traffic, calculating their invoice amount can be very tricky.

Revenue Forecasting Data
So, when attempting to forecast quarterly revenue, what factors must be considered?  As it turns out, A LOT OF FACTORS.  However, they can generally be broken down into four categories, each with three sub-categories (a rough sketch of one way to organize these inputs appears after the lists below):
  • Traffic Factors:  the levels of content that customers push
  • Contract Factors: the customer contracts Akamai has, and the terms of each contract
  • Invoice Factors: the amounts that appear on the actual invoices
  • Revenue Factors: the amount of invoicing that Akamai recognizes (versus deferrals, rev reserve, credit memos, etc)
 Within each of these categories:
  • Current Data:  The most recent month's data.
  • Historical Patterns:  Patterns in how this data has changed over time.
  • Known Current Adjustments: One-off events scheduled to happen within the forecasted period.
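One way to picture how these inputs fit together -- purely illustrative, with field names I'm making up rather than anything the client actually used:

    from dataclasses import dataclass, field

    @dataclass
    class FactorCategory:
        """One of the four input categories: Traffic, Contract, Invoice or Revenue."""
        current_data: dict = field(default_factory=dict)          # the most recent month's data
        historical_patterns: dict = field(default_factory=dict)   # how this data has changed over time
        known_adjustments: list = field(default_factory=list)     # one-off events within the forecasted period

    forecast_inputs = {
        "traffic":  FactorCategory(),
        "contract": FactorCategory(),
        "invoice":  FactorCategory(),
        "revenue":  FactorCategory(),
    }
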
Old vs. New Forecasting Methodologies
Until 2009, the forecast took all of these quite-different inputs and incorporated them into a single mega-algorithm.

Since then, Akamai has transitioned to a new modular forecast.  It works like this (a rough code sketch of the pipeline follows the list):
1. Use only traffic information to make a traffic forecast.
2. Next, use only contract information to forecast what our contracts will look like.
3. Then input our forecasted traffic and forecasted contracts into our invoicing system, to calculate forecasted invoicing.
4. Finally, use historical revenue data to forecast how much of this invoicing we will recognize as revenue.
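In rough pseudo-Python, the pipeline looks something like this.  (Every function name below is a placeholder I'm inventing for illustration -- the real modules were production systems, not four tidy functions.)

    # Placeholder stubs for the four modules; each consumes only the inputs it needs.
    def forecast_traffic(traffic_data): ...                            # step 1: traffic only
    def forecast_contracts(contract_data): ...                         # step 2: contracts only
    def run_invoicing_system(traffic_fc, contract_fc): ...             # step 3: invoicing logic
    def forecast_recognized_revenue(invoice_fc, revenue_history): ...  # step 4: invoicing -> revenue

    def forecast_revenue(traffic_data, contract_data, revenue_history):
        """Chain the four independent modules into one revenue forecast."""
        traffic_fc = forecast_traffic(traffic_data)
        contract_fc = forecast_contracts(contract_data)
        invoice_fc = run_invoicing_system(traffic_fc, contract_fc)
        return forecast_recognized_revenue(invoice_fc, revenue_history)

The point isn't the code; it's that each step has a small, well-defined input and output, so each can be understood (and fixed) on its own.
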
Benefits of a Modular Approach
You might be wondering:  "Okay, so you went from one super-complicated forecast, to a series of four less-complicated ones.  Did you really come out ahead?"

The answer: Way ahead.  Here are some of the benefits:
  1. Forecast accuracy improved significantly.  As you might imagine, Revenue is really just a bunch of adjustments to Invoicing -- and in this case, we're calculating forecasted Invoicing, based upon forecasted Traffic and Contracts:  We literally use our production Invoicing system to say "Okay, if the contracts look like this, and the traffic looks like that, then how much money will we invoice?"  For a variety of confidentiality reasons, I can't cite specific accuracy levels, but believe me: nobody is complaining about accuracy.
  2. Forecast intuition is very high.  Each part of the forecast (e.g., our traffic forecast, our contract forecast, etc.) has sufficiently few inputs that people can wrap their heads around how it works, and what events will affect it.  As a result, the forecast hasn't turned into a black box, like many other forecasts I've observed. 
  3. Forecasting modules can be swapped in and out.  Sooner or later, we'll need to revamp one of the forecasting modules described above; but because the forecasts are modularized, it won't introduce risk to the other forecasting modules (see the sketch after this list).
  4. Our intermediary forecasts now sing for their supper.  Originally, we modularized our forecast to yield a simpler and more-accurate model -- but we attained another benefit:  We discovered that groups within the company could use these intermediary forecasts!   Engineering, Finance and Sales use our Traffic, Contracts and Invoicing forecasts -- and even spot opportunities to improve them, which yields a more accurate revenue forecast, to boot!
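To make point 3 concrete:  because each module honors a small, fixed interface, a replacement can be dropped in without disturbing the rest of the pipeline.  Again, a purely hypothetical sketch, reusing the placeholder names from the pipeline sketch above:

    # Reuses the placeholder stubs from the earlier pipeline sketch.
    def forecast_traffic_v2(traffic_data): ...   # hypothetical replacement traffic model,
                                                 # same inputs and outputs as forecast_traffic

    def forecast_revenue(traffic_data, contract_data, revenue_history,
                         traffic_model=forecast_traffic):
        """Same pipeline, but the traffic module is now a swappable parameter."""
        traffic_fc = traffic_model(traffic_data)
        contract_fc = forecast_contracts(contract_data)
        invoice_fc = run_invoicing_system(traffic_fc, contract_fc)
        return forecast_recognized_revenue(invoice_fc, revenue_history)

    # Swapping in the new traffic model touches exactly one argument:
    # forecast_revenue(traffic, contracts, history, traffic_model=forecast_traffic_v2)
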
Conclusion
In most organizations, forecasts naturally gravitate towards becoming monolithic beasts that nobody fully understands, and which then deteriorate over time.  However, by moving towards more modular forecasts, a company can fight this trend, and maintain understandable, accurate forecasts for much longer.
