If you are an analyst or an analytical leader, how do you practice data-driven decision making? One low-cost way is to use prediction markets. They give you a concrete target and a natural feedback mechanism when the market resolves. This tutorial explores that process.
There is an open prediction market on polymarket.com based on July to August US inflation growth numbers. Even though US-based individuals can’t place bets on this platform, it’s an awesome way to practice data-driven decision-making.
What’s the decision exactly? Will July-August inflation exceed 0% growth? Below is one way to systematically think through this decision. This isn’t only about doing analytics… this practice is about thinking & deciding better with data. It’s about how you think just as much as what the numbers say.
Apples-to-Apples
A good way to start off is double checking that you are looking at the right metrics and you understand what that metric is. Polymarket does a good job linking to the BLS website that reports on this metric. And the BLS does a great job documenting! We can even find good descriptions on the relative importance of categories (food, shelter, recreation, etc.) relative to consumer spending (which is perhaps important because the CPI is calculated based on these weights). Imagine how much easier data science and analytics could be if organizations had their data ecosystems this well documented 🤩.
Ok. So, specifically we are looking at the Percent changes in CPI for All Urban Consumers (CPI-U): U.S. city average. And if you aren’t familiar with the acronym CPI, it’s a measure of price inflation in the economy. Another helpful tool is FRED from the St Louis Federal Reserve – it’s a beautiful archive of historical data on numerous economic indicators. Note that if you open that graph and want month-over-month percent change you’ll need to click ‘Edit Graph’ and change the Units from Index to Percent Change. As a starting point, you might notice that inflation from June-July, the most recent data point, was 0.0%.
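To make the Index versus Percent Change distinction concrete, here is a minimal sketch of the month-over-month calculation. The function name and the index levels are made up for illustration; they are not real CPI-U values.

```python
def pct_change(index_levels):
    """Month-over-month percent change for a list of index levels."""
    return [
        round((curr / prev - 1.0) * 100.0, 2)
        for prev, curr in zip(index_levels, index_levels[1:])
    ]

# Hypothetical index levels for three consecutive months (NOT real CPI data).
example_levels = [200.0, 202.0, 202.0]
example_changes = pct_change(example_levels)
print(example_changes)  # [1.0, 0.0]
```

A flat index from one month to the next shows up as 0.0%, which is the kind of reading described above for June-July.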
Base Rates
Even before the data, though, I started super basic – by Googling ‘inflation.’ Here’s a screenshot:
Here’s the starting perception: inflation must be in a bad place. And it’s likely that, if you live in the US, you’ve heard about rising inflation in the past few months from a family member, co-worker, or friend. What does ‘inflation in a bad place’ mean? Well, since inflation is a measure of price level in an economy… one way to think of detrimental inflation is high growth (i.e. my salary doesn’t go up every month to meet increased prices). This is my mental anchor now… if I had to pick above or below 0%, surely inflation growth will be well over 0%.
Maybe that’s too myopic. The average month-over-month inflation rate since 1947 is 0.29%. However, maybe that’s too farsighted 🤔.
With time-series data (like this), we can use a change point algorithm to identify points in time when the level and variance have… changed. I created this Shiny App a while back so I can quickly assess potential change points in my time series data (note that change point analysis has a bit of an art to it). With this month-over-month inflation data (from FRED), I see that ~2002 is the last change point (with these parameters). And since the end of 2002, average month-over-month inflation growth has been 0.21%. I’m going with this as my base rate.
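For intuition, here is a rough sketch of the idea behind a change point search. It is not the algorithm inside the Shiny App. It is a deliberately simple stand-in that looks for a single shift in the mean only (not the variance) by trying every split and keeping the one with the smallest squared error. The series is invented for illustration.

```python
from statistics import mean

def best_single_change_point(series, min_segment=2):
    """Return the split index that minimizes squared error around two segment means."""
    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best_index, best_cost = None, float("inf")
    for k in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

# Made-up monthly percent changes with an obvious level shift (NOT real CPI data).
toy_series = [0.5, 0.6, 0.4, 0.5, 0.2, 0.1, 0.3, 0.2]
split = best_single_change_point(toy_series)

# The base rate is the average of everything after the last change point.
base_rate = mean(toy_series[split:])
print(split, round(base_rate, 2))
```

The base rate then comes from averaging only the observations after the most recent change point, which is the same move as using the post-2002 average instead of the post-1947 one.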
Pareto Principle (80/20 Rule)
Next, I look at the components that comprise the aggregate CPI index metric. You can do this several different ways, but using relatively high level category groupings of Food, Energy, and Everything Else… it’s easy to see that average monthly inflation rates this year are being driven by Energy. And to a lesser degree Food.
I often see charts like this when trying to problem solve. In fact, they are so common they have their own name(s): a Pareto distribution, often called the 80/20 rule. It means, roughly speaking, that 20% of your categories contribute 80% of your issues. In this particular example, the number of categories we have is too small to calculate a good ratio, but the shape of the graph is spot on. Energy is the “20%” that is driving “80%” of the “issues.” The more abstract idea in this is that things are often asymmetric, or unbalanced. It’s a good heuristic to keep in mind.
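If you want to check how lopsided a breakdown is, one simple option is to rank the categories and look at cumulative shares. The sketch below assumes all contributions are positive, and the numbers are placeholders chosen to show the shape, not the figures behind the chart.

```python
def cumulative_shares(contributions):
    """Rank categories by size; return (name, share, cumulative share) rows."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for name, value in ranked:
        running += value
        rows.append((name, value / total, running / total))
    return rows

# Hypothetical contributions, for illustration only (NOT the real CPI components).
toy = {"Food": 1.5, "Energy": 8.0, "Everything Else": 0.5}
rows = cumulative_shares(toy)
for name, share, cumulative in rows:
    print(f"{name}: {share:.0%} of total, {cumulative:.0%} cumulative")
```

When the first row already carries most of the total, that is the asymmetric shape worth chasing first.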
Ok, so, Energy prices. If we predict that, it seems like it could give us a good read on Overall CPI.
Oh, rats. Energy price changes are volatile. This makes them hard to predict. I once heard an economist say that high rates of inflation aren’t so much a problem in themselves; it’s that as these rates get higher they get more volatile, and it’s hard to plan (both for individuals and businesses) when you don’t know your future costs. In July 2022, Energy prices fell 4.6%. In August they might fall further, or they might jump back up by more than 5%. Well, how much do Energy prices really matter to Overall CPI?
Regression (Correlation)
A simple regression helps (sort of). If Energy prices drop 5%, we might expect the Overall index to fall by 0.29%. If Energy prices increase by 5%, we might expect the Overall index to rise by 0.63%. Ugh, that’s uncertainty right around our decision threshold!
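The two numbers above are enough to back out the line they imply, assuming they are point predictions from a straight-line fit of Overall CPI change on Energy price change. That linearity is my assumption, and the helper name below is hypothetical. A minimal sketch:

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# (Energy % change, predicted Overall CPI % change), as quoted in the text.
slope, intercept = line_through((-5.0, -0.29), (5.0, 0.63))

# Energy change at which the predicted Overall change is exactly 0%.
break_even = -intercept / slope

print(round(slope, 3), round(intercept, 2), round(break_even, 2))
```

Under that assumption the slope is roughly 0.092 and the intercept roughly 0.17, so Energy prices would need to fall by about 1.85% or more for the predicted Overall change to reach 0% or below. This is a point estimate only and ignores the uncertainty around the fit.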
This is important because, in my experience, many decision processes find this gray area. You do some work, and then you get to a point where you feel unsure. That’s reasonable though. If all decisions were completely black and white, it would be easy. Getting comfortable with what to do in these situations is a good skill to have. At this point, you can likely craft a narrative in either direction to favor either side of the decision. What will you do?
There are at least 2 paths forward.
First, you can craft the “safest narrative” and call it a day. The “safest narrative” will be the one that the majority of people in your environment already think is true. If the majority think inflation is out of control, the “safest narrative” is to decide that growth will be above 0%. Nobody will look down on you for being wrong.
Second, you can double down on your investigation and look for more information. This path is painful, especially when you think you’ve discovered all the discoverable information! Time will also dissuade you from this path – decisions often have deadlines. But, if you have the grit, take this path – talk to new people, read from a different source, find new data…
Unknown Knowns
In Rumsfeld’s famous “unknown-unknowns” matrix, the quadrant that gets the most attention is the “unknown-unknowns” corner. But thinking about the unknown-knowns helps us in this specific case. What information could be out there that I haven’t discovered yet… where haven’t I looked? Energy price changes may be volatile, but they have drivers too. Since it’s so late in the game for this decision (I’m writing this on Sept 5th, and the market resolves on Sept 13th), we know the change in Gasoline prices (a strong indicator of overall energy prices) between July and August. Gasoline prices dropped 12% month-over-month. It may have seemed like Energy price changes are too volatile and impossible to predict. That may have been true in the beginning of August, but not at the end. You can see this reflected in the prediction market. On August 12th, a contract for Yes was relatively more expensive than a contract for No. That has since inverted.
We can do another simple regression, this time of Energy prices on Gasoline prices.
Of course there is a level of uncertainty here. However, it is reasonable to think that if Gasoline prices have fallen month over month by 12%, Overall Energy prices will fall by ~5%. If you recall from above, we said that if Overall Energy prices fall by 5%, it’s reasonable to expect CPI to fall by 0.29%.
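For readers who want to see the mechanics of a simple regression like this, here is a minimal ordinary least squares sketch in plain Python. The data points are toy numbers that I constructed to lie on a line consistent with the -12% to ~-5% reading above. They are not FRED data, and the fitted coefficients are not the ones from the actual analysis.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Toy month-over-month % changes (hypothetical, NOT real Gasoline/Energy data).
gasoline = [-10.0, -5.0, 0.0, 5.0, 10.0]
energy = [-4.2, -2.2, -0.2, 1.8, 3.8]

slope, intercept = fit_line(gasoline, energy)

# Predicted Energy change for a 12% drop in Gasoline prices.
predicted_energy = intercept + slope * -12.0
print(round(predicted_energy, 2))
```

With real data the points would scatter around the line, so the prediction should be read as a central estimate with a band of uncertainty around it.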
Trade-offs
Judgment time. The decision I need to make is ‘will July-August inflation exceed 0% growth?’ I can say yes or no, but I think it’s healthier to ask ‘on a scale from 0-100, how certain am I that inflation will exceed 0% growth?’ With the methodology outlined above, I would be comfortable saying I am 90% sure July-August month-over-month CPI will not exceed 0%. Why not 92%? Why not 99%?
1st –> systems are complex and in complex systems surprises happen. I was 99.9% confident that Hillary Clinton would win the 2016 US Presidential Election. That was a fantastic lesson!
2nd –> if I try to differentiate between 90% and 92%, I am shooting for false precision. Given the uncertainty, to me, there is no difference between these when making this decision.
3rd –> if I say 99%, I am very unlikely to think about the cost of being wrong. Even though I am confident on this, I recognize there is a chance that I may be incorrect and need to recover from that. In this instance, the cost of being wrong is effectively 0 (remember, I can’t bet on this platform since I am located in the US). In my humble opinion, the higher the cost of being wrong, the less certain you should strive to ultimately be with your decisions (at least in the analytical phase of your process).
If the costs were higher – say I had millions of dollars of procurement decisions being made – I would go several layers deeper. More time series modeling (Prophet, xgboost type algorithms) might help. So might looking at other economic indicators, like University of Michigan’s Inflation Expectations Index or the Producer Price Index. I likely would have read a lot more while being mindful to balance the quantity of bullish and bearish positions I let enter my mind.
Practice, Practice, Practice
In order to improve at something quickly, we know deliberate practice is a good approach. Writing out the above analysis is deliberate practice. First, I am bound to get feedback about how it could be improved (🙏 in advance for that). Second, since I now have documented assumptions, when CPI is published on Tuesday I can verify how I was thinking and recalibrate that thinking if necessary.
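One common way to turn a resolved forecast into feedback is the Brier score: the squared gap between the probability you stated and what happened (0 is perfect, 1 is as wrong as possible). This is my suggestion rather than part of the analysis above. Here is a small sketch comparing a 90% and a 99% forecast under both outcomes.

```python
def brier_score(forecast_probability, outcome):
    """Squared error between a stated probability and a 0/1 outcome."""
    return (forecast_probability - outcome) ** 2

# Event being forecast: "CPI growth will NOT exceed 0%".
# outcome = 1 means the event happened, outcome = 0 means it did not.
scores = {
    (p, outcome): brier_score(p, outcome)
    for p in (0.90, 0.99)
    for outcome in (1, 0)
}

for (p, outcome), score in scores.items():
    label = "right" if outcome == 1 else "wrong"
    print(f"stated {p:.0%}, turned out {label}: {score:.4f}")
```

Moving from 90% to 99% barely improves the score when you are right, but it costs noticeably more when you are wrong. That is one way to see why extra stated confidence needs to be earned.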
What other mental models might you use to become more certain about this decision?