Data science projects fail, frequently. Between the end of 2017 and 2019, several published reports from Gartner, NewVantage, and VentureBeat AI put ‘failure’ rates for data science projects north of 75%. But I don’t think this is indicative of how powerful the growth of data, machine learning, and AI has been for business (and likely all sectors of the economy) over the same timeframe. Back-of-the-envelope machine learning is inconspicuously powering business today (2020).
A premortem is a thought exercise to predict or foresee why an analysis or project might fail. It forces the participants to change their perspective by porting their frame of reference to the hypothetical future after the project has failed. Nominating the premortem as ‘the scientific term or concept [that] ought to be more widely known,’ Richard Thaler wrote “there are two reasons why premortems might help avert disasters. First, explicitly going through this exercise can overcome the natural organizational tendencies toward groupthink and overconfidence. Second, laboratory research shows that by asking why did it fail rather than why might it fail, gets the creative juices flowing.” Consider these more well-known premortem examples:
- Leadership coaches (as well as athletic coaches) employ visualization techniques to help people picture a future in which they have already accomplished their goals. Scott Weighart describes here how he uses visualization to improve his presentations. Here is a post that explains how visualization is a soft substitute for experience.
- Shell Scenarios ask “what if?” questions, encouraging leaders to consider events that may be only remote possibilities and to stretch their thinking.
During a premortem for a proposed data science project, the participants could say (examples come from here and here):
- There was inadequate or unavailable data.
- We tried to build the technology for its own sake rather than for driving profit / KPIs.
- Executive support evaporated.
- We over-complicated the deliverables and failed to communicate clearly.
- We didn’t ask the right questions at the start or as we progressed through the project.
- The data didn’t represent what we thought it represented.
- We didn’t have the right team or the required skill sets (which slowed us down significantly).
Naturally, the group would welcome some data-driven insight during this premortem. It’s possible that the data science team has kept a log in Jira or Github detailing each project with some metadata about the project, and whether or not the project ‘succeeded.’ A proxy for this type of data could be this download of Kickstarter projects from Kaggle. It catalogs 8,000+ projects with information on each project such as category, number of contributors, whether the project was funded, etc.
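Getting that data in front of the group is only a few lines of pandas. A minimal sketch, assuming a local copy of the Kaggle download; the file name and the success column are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical file and column names for the Kaggle Kickstarter download.
df = pd.read_csv("kickstarter_projects.csv")

print(len(df))               # 8,000+ projects in the catalog
print(df["success"].mean())  # baseline success rate across all projects
```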
Often, there isn’t a lot of time to go deep into the data – we need a cursory view. So, to use this data as an idea generator for the group’s premortem, we will look at three things: Cramer’s V, a predictive model using xgboost, and finally a linear regression model.
From Wikipedia, Cramer’s V is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). Values closer to 1 indicate a stronger relationship. This is a perfect starting point: because our variable of interest is a nominal variable (failure or success), Cramer’s V gives us a sense of which features are associated with it.
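Here is a minimal sketch of how that pairwise association matrix might be computed and plotted. The dataframe and column names (backers_count, pledge_backers, and so on) are hypothetical stand-ins for the Kaggle download, and the continuous counts are binned so Cramer’s V treats them as nominal:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramer's V between two nominal variables, derived from the chi-squared statistic."""
    confusion = pd.crosstab(x, y)
    chi2 = chi2_contingency(confusion)[0]
    n = confusion.to_numpy().sum()
    r, k = confusion.shape
    return float(np.sqrt((chi2 / n) / min(r - 1, k - 1)))

# Hypothetical columns; the count variables are binned into quartiles so they behave as nominal.
cols = ["success", "featured", "backers_count", "pledge_backers", "updates_count", "images_count"]
binned = df[cols].copy()
for col in cols[2:]:
    binned[col] = pd.qcut(binned[col], q=4, duplicates="drop")

assoc = pd.DataFrame(
    [[cramers_v(binned[a], binned[b]) for b in cols] for a in cols],
    index=cols, columns=cols,
)
sns.heatmap(assoc, annot=True, cmap="Blues")
plt.show()
```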
Using the Cramer’s V metric and this heatmap, I can see that the number of backers, number of pledge backers, number of updates, and number of images on the project are highly associated with the success variable (top row). Additionally, the number of backers / pledge backers is pretty strongly associated with whether or not the project is featured. Engagement matters.
(Note that City appears to be a major influence on Kickstarter projects. I could cluster and map it, but I’m purposefully leaving it out — for now — because this is back-of-the-envelope… and I want to go fast. Here is the link to the Cramer’s V analysis.)
I know that there is a ~48% chance that any randomly chosen project will be a success. If we want a more specific indication of success, we can build a model to predict whether a project will succeed or fail. While there are many models to choose from, I’ll use something that works quickly and performs well on tabular data: xgboost. Using default parameters in the xgboost package in Python… I can produce a prediction model with a Mean Absolute Error of 0.07. Using stratified cross-validation, I can expect my model to be ~94% accurate. That’s definitely in the ballpark for decision making. But what features are driving the prediction… that’s really what I want to know to help me brainstorm ideas for the premortem.
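A sketch of what that might look like, assuming the same hypothetical dataframe and columns as above and a 0/1-encoded success column:

```python
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical numeric feature columns from the Kickstarter data.
features = ["backers_count", "pledge_backers", "updates_count", "images_count",
            "projects_by_owner", "funding_goal"]
X, y = df[features], df["success"]   # y assumed to be 0/1

model = xgb.XGBClassifier()  # default parameters, as described above

# Stratified 5-fold cross-validation to estimate out-of-sample accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.2%}")
```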
Number of backers / pledge backers, number of updates, and number of images were already on my radar. But we have new information: in our xgboost model, it looks like both the number of projects created by the project owner and the funding goal are indicative of success. Looking a little deeper, we discover that projects whose owners have more than one project tend to have lower success rates… and projects with a lower funding goal tend to have higher success rates. In addition to engagement, we now also sense that focus and realistic goals are important characteristics behind a successful project.
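Continuing the sketch above, pulling out the feature importances and sanity-checking the two new signals with simple slices might look like this (column names remain hypothetical):

```python
import pandas as pd

# Which features drive the prediction?
model.fit(X, y)
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))

# Success rate for owners with more than one project vs. a single project.
print(df.groupby(df["projects_by_owner"] > 1)["success"].mean())

# Success rate by funding-goal quartile.
print(df.groupby(pd.qcut(df["funding_goal"], 4))["success"].mean())
```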
(Note that I certainly could have tried other model types, tuned the model’s hyperparameters, and done some deeper — or any — feature engineering. But I’ve learned something at this point and I’m not going deeper — for now — because this is back-of-the-envelope… and I want to go fast. Here is the link to the xgboost analysis.)
We now have good direction and probably some good ideas for the premortem. But, our methods thus far haven’t told us anything about causality. Creating a linear regression model can help us take one step in that direction. Let’s be irresponsible and throw our identified variables into a linear regression model where the outcome variable is % of funding goal (a continuous variable rather than our nominal success/failure variable).
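A quick-and-dirty version of that regression with statsmodels might look like the sketch below; pct_of_goal and the predictor columns are hypothetical stand-ins for the % of funding goal outcome described above:

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical continuous outcome: percent of the funding goal that was reached.
predictors = ["backers_count", "pledge_backers", "updates_count", "images_count",
              "projects_by_owner", "funding_goal"]
X_ols = sm.add_constant(df[predictors])
ols = sm.OLS(df["pct_of_goal"], X_ols).fit()

print(ols.summary())              # coefficients, p-values, and the R-squared goodness of fit
sm.qqplot(ols.resid, line="45")   # eyeball whether the residuals look normally distributed
plt.show()
```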
This is quick and pretty dirty, but my residuals are normally distributed, so I’ll take it with a grain of salt and see what I can learn. What’s most interesting is that A) the number of backers doesn’t seem to matter here (although the number of pledge backers, which is highly correlated with it, does) and B) the goodness-of-fit measure is only 0.31, telling me there is a lot of variation in success rate that I’m not able to explain with these variables alone. The only new information I’m taking away is a healthy dose of skepticism: these variables alone don’t tell me the whole story of whether or not a project will succeed or fail.
(Note that I could have done some transformations on the dependent variable and even used other packages like do-why (a Python library for causal inference) or tried Targeted Maximum Likelihood Estimation. But I won’t at this point because this is back-of-the-envelope… and I want to go fast. Here is the link to the regression analysis.)
Let’s translate this into some ideas for our premortem. Our data science project has failed (it didn’t make it to production or the insights weren’t adopted by the business). Why?
- Number of Backers / Pledge Backers. We put all our eggs in one basket – a single stakeholder. When that stakeholder lost interest… all support for the project dried up.
- Number of Updates / Project Images. We lost contact. We didn’t provide our stakeholders or partners with updates on progress… we didn’t share all the amazing prototypes we built out and they assumed we had quit or it wasn’t possible. As a result, it became deprioritized.
- Number Created by Owner. We stretched ourselves too thin. We lost focus and our stakeholders and partners lost faith.
- Funding Goal. Our project was too ambitious. We promised too much with too little credibility.
These are generic but serve as a good foundation – at the very least a conversation starter – for the team’s premortem. And it took less than 2 hours to produce. Now the team can start to figure out actions to combat these possible undesirable futures.
Taking a step back, the point of this exercise was to demonstrate how teams might use ML in an ad hoc fashion to steer decision making. The book Guesstimation introduces us to a technique called the Fermi method — where the goal is to use math and estimates not to find the answer, but to come close enough to the answer to help us decide what to do next. Conor Dewey touches on a similar idea he calls the Minimum Viable Analysis: even if you only go a few steps in the right direction toward the answer, that may very well satisfy your stakeholders. He writes, “you would actually be shocked how often stakeholders are satisfied with your minimum viable analysis.”
Here’s a final applicable idea. Let’s say a company’s leaders want to find people already within their company who would make great data scientists and offer them some type of incentive to come to the data science organization. They could look to the DBAs, financial analysts, or IT professionals. But there are also probably excellent candidates in marketing or operations who would excel as data scientists after a Nanodegree on Udacity. How could you find them at relatively low cost? You could scrape Upwork to find highly rated freelance data scientists, cross-reference those people with their LinkedIn profiles, and turn their history (experience, education, interests, etc.) into a feature space. Finally, use an unsupervised learning technique to understand what mixtures of characteristics indicate an excellent data scientist. Then you can find those individuals in marketing or operations who have ¾ of the hypothetical characteristics and train them up quickly on the one they don’t. Voilà, now you are looking at an amazing, diverse pool of talent that was invisible to you before because they didn’t have 8 years of Python programming experience. The exercise will likely be very eye-opening, very quickly (dare I say 1-3 days of work?).
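To make that last step concrete, here is a heavily hedged sketch. The feature space is entirely made up (the real one would come from the Upwork/LinkedIn scrape), and k-means is just one convenient choice of unsupervised technique:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Entirely hypothetical feature space built from scraped Upwork ratings and LinkedIn histories.
profiles = pd.DataFrame({
    "years_experience":    [8, 2, 5, 1, 10, 3],
    "n_analytics_courses": [1, 4, 2, 5, 0, 3],
    "n_tools_listed":      [6, 3, 4, 2, 7, 3],
    "freelance_rating":    [4.9, 4.7, 4.8, 4.6, 5.0, 4.5],
})

# Cluster the highly rated freelancers to surface recurring mixtures of characteristics.
X = StandardScaler().fit_transform(profiles)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# The cluster centers act as "archetypes" (in standardized units); internal employees who sit
# close to a center on most features are the candidates to train up on the ones they lack.
print(pd.DataFrame(kmeans.cluster_centers_, columns=profiles.columns))
```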
It’s important to remember that we are not taking this into production (not yet anyway)… we are using the results to help us be more creative and guide us in the right direction. Machine learning is meant to be a tool to help us solve problems, but it also can show us different perspectives and make us more creative.