Explaining Your Machine Learning Model (or 5 Ways to Assess Feature Importance)

source: xkcd

When productionizing a machine learning model, simply outputting a propensity in a black box isn’t always sufficient. We often want to understand which features in the model are most important. We want actionable insights.

Assessing the why can thus be just as important as the how. Knowing which features, inputs, or variables are driving a model’s predictions makes those predictions far more actionable. Case in point: knowing which users are predicted to churn can help you fill a leaky bucket, but understanding why they are likely to churn can help you close the leak before it occurs.

Assessing feature importance, though, is not straightforward. There are many ways to measure how a variable influences a model, each with its own benefits and tradeoffs: certain methods produce more consistent results, while others apply only to specific models. Below we outline five ways of assessing feature importance, with a focus on logistic regression models for simplicity.

Method #0: Pearson Correlations (or why you shouldn’t use them)

A quick and dirty way to discern the relationship between a feature and a goal is the Pearson correlation.

A correlation measures the strength and direction of the linear relationship between two variables: how closely your input variable and the output you’re trying to predict move together along a straight line. It produces a value between -1 and 1 (often quoted as -100% to 100%).

While simple to grasp (and straightforward to compute, even in Excel), correlations are limited in their utility. For one, they only capture linear relationships: they say nothing about the slope of a relationship, and they can miss nonlinear relationships entirely (see the second and third rows in the image above). More importantly, Pearson correlations only describe patterns in your underlying data, not what your machine learning model actually learns. The standard mantra “correlation does not imply causation” applies.
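Both limitations are easy to see numerically. The minimal, stdlib-only sketch below computes the Pearson coefficient by hand on made-up data: two lines with very different slopes both score exactly 1, while a perfect but nonlinear (and symmetric) dependence, y = x², scores 0.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]
print(pearson(x, [2 * v for v in x]))   # 1.0  (gentle slope)
print(pearson(x, [10 * v for v in x]))  # 1.0  (steep slope, same r)
print(pearson(x, [v ** 2 for v in x]))  # 0.0  (y fully determined by x, yet r = 0)
```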

In machine learning, your trained model learns not just the relationship between each input and the target variable, but also the interactions between the variables themselves. Correlations don’t account for these interactions, and they say nothing about causation. Case in point: the number of firefighters responding to a forest fire is likely highly correlated with the severity of the fire, but that does not mean the firefighters caused the fire.

So how do we assess the relationships a model has actually learned?

Method #1: Model Coefficients

To measure feature importance in terms of what the model has learned, one option is to deconstruct the machine learning model itself into its components. In the case of Logistic Regression, this means evaluating the trained model coefficients.

As previously noted, Logistic Regression models train a probabilistic function to evaluate the propensity of an outcome (P) based on some input variables (X). Model training produces a series of coefficients (θ) or weights for those variables that can best classify an outcome.
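For reference, here is the standard logistic function written with the same symbols: P is the predicted probability, X = (x_1, ..., x_n) the inputs, and θ the coefficients, with θ_0 as the intercept.

```latex
P(y = 1 \mid X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n)}}
```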

These coefficients can in turn be used to rank your features or inputs. Because each coefficient indicates the relative contribution of its variable to the probability function (see figure above), ranking the coefficients by absolute value, from largest to smallest, is essentially a proxy for feature importance. A large negative coefficient matters as much as a large positive one; it just pushes the probability down instead of up.
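A minimal sketch of the ranking step follows. The feature names and coefficient values are hypothetical, standing in for the output of a model trained on standardized features. Sorting by absolute value keeps a strongly negative coefficient near the top.

```python
# Hypothetical coefficients from a logistic regression trained on standardized features.
coefficients = {
    "sessions_per_week": 0.8,
    "support_tickets": -1.2,
    "days_since_signup": 0.1,
}

def rank_by_coefficient(coefs):
    """Return feature names ordered by |theta|, largest first."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)

print(rank_by_coefficient(coefficients))
# ['support_tickets', 'sessions_per_week', 'days_since_signup']
```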

The positive of this approach is that the feature rankings are identical for all users. The coefficients are an inherent property of the trained model, not of any individual input, so every user gets the same ranking. This produces a consistent method of analysis for feature importance.

There are some caveats to consider, though. First, before training your model you have to ensure all your features are on the same scale. Without scaling, variables with different ranges would end up with coefficients on different scales, which are not comparable. Also, if you have a lot of features, correlated (collinear) features can make the coefficients unstable. In this case you should apply a regularization technique such as L2 regularization during model training to reduce the impact of these effects.
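Both precautions can be sketched without any ML library. The stdlib-only example below assumes z-score standardization and a plain batch-gradient-descent fit with an L2 penalty of (l2 / 2) · ‖θ‖², leaving the intercept unpenalized. In practice you would more likely use a library such as scikit-learn. The feature names (`logins_per_week`, `account_age_days`), the data, and the hyperparameters are all hypothetical.

```python
import math
import statistics

def standardize(columns):
    """Z-score each feature column: subtract its mean, divide by its population std dev."""
    scaled = []
    for col in columns:
        mu = statistics.mean(col)
        sigma = statistics.pstdev(col)
        scaled.append([(v - mu) / sigma for v in col])
    return scaled

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_l2(columns, y, l2=0.1, lr=0.1, steps=2000):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||theta||^2.

    `columns` holds one list per feature. Returns (intercept, theta).
    """
    n_rows, n_feats = len(y), len(columns)
    intercept, theta = 0.0, [0.0] * n_feats
    for _ in range(steps):
        # Residual (predicted probability - label) for every row.
        resid = [
            sigmoid(intercept + sum(theta[j] * columns[j][i] for j in range(n_feats))) - y[i]
            for i in range(n_rows)
        ]
        # Gradient step; the L2 term shrinks theta, the intercept is not penalized.
        intercept -= lr * sum(resid) / n_rows
        theta = [
            theta[j] - lr * (sum(r * x for r, x in zip(resid, columns[j])) / n_rows + l2 * theta[j])
            for j in range(n_feats)
        ]
    return intercept, theta

# Hypothetical raw features on very different scales, one list per column.
raw = [
    [1, 2, 3, 4, 5, 6, 7, 8],                 # logins_per_week
    [400, 120, 380, 90, 410, 100, 390, 130],  # account_age_days
]
y = [0, 0, 0, 0, 1, 1, 1, 1]                 # did the user convert?

intercept, theta = fit_logistic_l2(standardize(raw), y)
print(theta)  # logins_per_week should get the much larger |theta|
```

Because both columns are standardized before fitting, their coefficients are on a comparable footing and can be ranked directly.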

Lastly, while the rankings produced via coefficients are stable, they are not inherently interpretable. A coefficient is the change in the log-odds of the outcome per unit change in its feature, which has little semantic or actionable meaning for most audiences. So it can be difficult for non-technical stakeholders to gauge how much a change in a feature will actually move their goal.

Method #2: Odds Ratio

A similar methodology for feature importance, but with a little more semantic meaning, is the Odds Ratio.

The Odds Ratio for a feature combines a couple of concepts. The “odds” are the probability of an outcome occurring divided by the probability of it not occurring. The Odds Ratio for a particular feature in turn is the multiplicative change in the odds due to a 1-unit increase in that feature.

For a Logistic Regression model specifically, the math works out to a simple formula (see figure above): the Odds Ratio for a particular feature is the exponential of its coefficient, e^θ.
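The step from the logistic function to e^θ is short. The odds turn out to be the exponential of the linear predictor, so increasing x_j by one unit (all other features fixed) multiplies them by e^{θ_j}:

```latex
\text{odds}(X) = \frac{P}{1 - P} = e^{\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n}
\qquad
\text{OR}_j = \frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\theta_j}
```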

The benefit of this approach is similar to Method #1, in that it is derived from the components of the model. For one, it is a really simple calculation (e^θ). You also get a stable ranking of values for your features that is agnostic to an individual user (assuming you’ve scaled and regularized them before model training). And the ranking values carry some added interpretability, in that each one indicates how a change in a feature will affect the odds of the outcome. One subtlety: an Odds Ratio below 1 indicates a negative effect, so an Odds Ratio of 0.5 (halving the odds) is as strong as one of 2 (doubling them), and features should be compared on that symmetric scale.
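A minimal sketch of the calculation, using a hypothetical intercept and coefficients, is shown below. It computes e^θ per feature and checks numerically that a 1-unit increase in one feature multiplies the model’s odds by that amount. It also ranks features on the symmetric scale max(OR, 1/OR), which is one reasonable way (an assumption here, not a standard) to put effects below 1 on the same footing as effects above 1.

```python
import math

# Hypothetical fitted model on standardized features.
intercept = -0.5
coefficients = {"sessions_per_week": 0.8, "support_tickets": -1.2, "days_since_signup": 0.1}

def probability(x):
    z = intercept + sum(coefficients[k] * x[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Odds Ratio per feature: e^theta.
odds_ratios = {k: math.exp(theta) for k, theta in coefficients.items()}

# Sanity check: +1 unit in one feature multiplies the odds by e^theta for that feature.
x = {"sessions_per_week": 0.0, "support_tickets": 0.0, "days_since_signup": 0.0}
x_plus = dict(x, sessions_per_week=1.0)
print(odds(probability(x_plus)) / odds(probability(x)))  # ~2.2255 == e^0.8

# An OR of 0.30 is as strong an effect as 1 / 0.30, so rank on a symmetric scale.
strength = {k: max(r, 1.0 / r) for k, r in odds_ratios.items()}
print(sorted(strength, key=strength.get, reverse=True))
```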

Still, the interpretation of the rankings is based on the “odds” of an outcome, which is not the most intuitive concept. The odds of an outcome are still a degree removed from explaining direct changes in behavior, so they can be confusing for nontechnical stakeholders.

Method #3: Change in Probability via Dropping Feature

A more interpretable methodology for feature importance can come in evaluating the change in probability.

There are several ways to evaluate a feature or variable’s contribution to the change in probability. The simplest is to just compute the probability of an outcome with all your features, and compare it to the probability with the feature of interest removed. The percentage change in probability would constitute your sensitivity metric or ranking.
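Below is a minimal sketch of this approach. It rests on two assumptions. First, it reads “removing” a feature as zeroing out its term θ_j·x_j rather than retraining the model without it. For a binary did/didn’t feature, that amounts to comparing a user who does it with one who doesn’t. Second, it measures the percentage change relative to the probability without the feature. The model, feature names, and baseline user are hypothetical.

```python
import math

# Hypothetical fitted model over binary "did the user do X?" features (1 = yes, 0 = no).
INTERCEPT = -1.0
COEFS = {"used_feature_1": 0.5, "contacted_support": -0.7, "invited_teammate": 0.3}

def probability(x, drop=None):
    """Logistic probability for feature dict x; `drop` omits one feature's theta_j * x_j term."""
    z = INTERCEPT + sum(theta * x[name] for name, theta in COEFS.items() if name != drop)
    return 1.0 / (1.0 + math.exp(-z))

def drop_feature_sensitivity(x):
    """Percentage change in P with each feature included vs. dropped."""
    p_full = probability(x)
    sensitivities = {}
    for name in COEFS:
        p_dropped = probability(x, drop=name)
        sensitivities[name] = 100.0 * (p_full - p_dropped) / p_dropped
    return sensitivities

# Hypothetical baseline user who does all three things.
typical_user = {"used_feature_1": 1, "contacted_support": 1, "invited_teammate": 1}
sens = drop_feature_sensitivity(typical_user)
for name in sorted(sens, key=lambda k: abs(sens[k]), reverse=True):
    print(f"{name}: {sens[name]:+.1f}%")
```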

A quick footnote on this approach: as opposed to Methods #1 and #2, it requires providing inputs for the variables (X) to actually compute a probability. As such, you should use the mean, median, or mode feature vector so that you have a consistent baseline probability to perturb.
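One way to build that baseline is sketched below with the standard-library `statistics` module. It takes the median for continuous features (robust to outliers) and the mode for binary ones. That split is an assumption rather than a rule, and the column names and values are hypothetical.

```python
import statistics

# Hypothetical training data: one list of values per feature.
columns = {
    "sessions_per_week": [1, 3, 3, 4, 10],  # continuous
    "used_feature_1": [0, 1, 1, 0, 1],      # binary
}

def baseline_vector(cols, binary):
    """A 'typical user': mode for binary features, median for everything else."""
    return {
        name: statistics.mode(values) if name in binary else statistics.median(values)
        for name, values in cols.items()
    }

print(baseline_vector(columns, binary={"used_feature_1"}))
# {'sessions_per_week': 3, 'used_feature_1': 1}
```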

The primary benefit of this approach is that you get an actionable interpretation to your feature rankings. Each ranking metric or sensitivity constitutes the change in probability for your average user given the variable of interest.

The downside of the approach, though, is that it has to be evaluated for a specific user, and consequently each user will have different feature rankings. To keep the rankings stable you can evaluate the sensitivities at the mean or median feature vector as recommended above, but that is still a generalization.

Nevertheless, computing the change in probability due to a feature is one of the more interpretable methods for feature importance, permitting explanations like “The average user who does feature_1 is 34% more likely to perform the goal”.

Method #4: Change in Probability via Slopes

Another method for calculating a variable’s contribution to the change in probability is to evaluate its slope.

Computing the slope of your probabilistic model with respect to a feature is also straightforward. Instead of dropping a particular feature, you can perturb it by some small value (ε). More specifically, you can compare the change in probability with a positive/negative perturbation (see figure above), and evaluate the percentage difference.
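A minimal sketch of the perturbation follows, assuming a central difference (P(x_j + ε) − P(x_j − ε)) / 2ε. It expresses the “percentage difference” as that slope divided by the baseline probability, which is one reasonable reading rather than the only one. The slope function only calls `predict`, so any model exposing a probability function could be plugged in. The model and baseline are hypothetical.

```python
import math

# Hypothetical fitted logistic model; any predict(x) -> probability function would work.
INTERCEPT = -0.5
THETA = [0.8, -1.2, 0.1]

def predict_proba(x):
    z = INTERCEPT + sum(t * v for t, v in zip(THETA, x))
    return 1.0 / (1.0 + math.exp(-z))

def slope(predict, x, j, eps=1e-4):
    """Central-difference estimate of dP/dx_j from a +eps and a -eps perturbation."""
    up = list(x)
    up[j] += eps
    down = list(x)
    down[j] -= eps
    return (predict(up) - predict(down)) / (2 * eps)

def relative_slope(predict, x, j, eps=1e-4):
    """The same slope as a fraction of the baseline probability (x100 for a percentage)."""
    return slope(predict, x, j, eps) / predict(x)

baseline = [0.0, 0.0, 0.0]  # e.g. the mean of standardized features
ranking = sorted(
    range(len(THETA)),
    key=lambda j: abs(relative_slope(predict_proba, baseline, j)),
    reverse=True,
)
print(ranking)  # [1, 0, 2]
```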

The effect of this approach is similar to Method #3’s dropping of a feature: you gain an actionable and interpretable ranking of your features, using the change in probability to measure how sensitive your probabilistic outcome is to each feature. However, it also carries the same tradeoffs as Method #3: each user will have different rankings for each feature, so you’ll need to assess the probabilities at an average/median value to have a stable baseline.

One benefit, however, is that this approach can be applied to models beyond Logistic Regression. As indicated in the derivation above, the calculation does not depend on the logistic function or its coefficients; it only perturbs the underlying feature value and measures the resulting change in probability. So in theory, you could use the same method of analysis for SVMs, Decision Trees, or even Neural Networks to establish feature importance. One caution: tree-based models are piecewise constant, so ε must be large enough to cross a split threshold, or the measured slope will simply be zero.
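The tree caveat is easy to demonstrate. The sketch below applies the same central-difference slope to a hypothetical one-split decision stump, treated purely as a black box. With a tiny ε, neither perturbation crosses the split, so the slope is zero. With a larger ε, the perturbations straddle the threshold and the slope picks up the jump.

```python
# Hypothetical decision stump: piecewise-constant probability with one split at 2.0.
def stump_proba(x):
    return 0.75 if x[0] > 2.0 else 0.25

def slope(predict, x, j, eps):
    """Central-difference slope; only calls predict(), so the model is a black box."""
    up = list(x)
    up[j] += eps
    down = list(x)
    down[j] -= eps
    return (predict(up) - predict(down)) / (2 * eps)

x = [1.9]
print(slope(stump_proba, x, 0, eps=0.01))  # 0.0: neither perturbation crosses the split
print(slope(stump_proba, x, 0, eps=0.5))   # 0.5: (0.75 - 0.25) / (2 * 0.5)
```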

Method #5: Change in Probability via Partial Derivatives

The last method for evaluating changes in probability to ascertain feature importance is a slight iteration on the previous one. Instead of approximating the slope, we can evaluate the true slope by taking the derivative of the model’s function.

The Partial Derivative specifically is the first derivative of the Logistic Function with respect to the feature of interest. Skipping most of the derivation, you arrive at the expression shown in the figure above.
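For completeness, here is the derivation in the notation used above. Apply the chain rule to P = 1 / (1 + e^{-z}) with z the linear predictor, then note that e^{-z} / (1 + e^{-z})² = P(1 − P):

```latex
\frac{\partial P}{\partial x_j}
  = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}\,\theta_j
  = \theta_j \, P \,(1 - P),
\qquad z = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n
```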

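A nice property of the Partial Derivative approach is its computational consistency. At a given feature vector, the probability term P(1 − P) on the right side of the equation is the same value for every variable evaluated, because the probability is computed from all features at once. Indeed, the only difference in the Partial Derivative between features is its respective model coefficient (θ). At a fixed baseline, then, ranking features by the magnitude of their Partial Derivatives gives the same order as ranking them by the magnitude of their coefficients.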
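A minimal sketch of that shared factor follows, with a hypothetical model and baseline vector. P(1 − P) is computed once and reused for every feature, so the derivatives differ only by θ_j. The resulting ranking therefore matches the ranking by |θ|.

```python
import math

# Hypothetical fitted model and baseline (e.g. a median feature vector).
INTERCEPT = -0.5
THETA = [0.8, -1.2, 0.1]
baseline = [0.3, -0.4, 1.0]

def predict_proba(x):
    z = INTERCEPT + sum(t * v for t, v in zip(THETA, x))
    return 1.0 / (1.0 + math.exp(-z))

def partial_derivatives(x):
    """Exact dP/dx_j = theta_j * P * (1 - P) for every feature j."""
    p = predict_proba(x)
    common = p * (1.0 - p)  # identical for every feature at this x
    return [t * common for t in THETA]

grads = partial_derivatives(baseline)
ranking = sorted(range(len(THETA)), key=lambda j: abs(grads[j]), reverse=True)
print(ranking)  # [1, 0, 2]
```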

The caveat to this approach is similar to those prior: calculating the Partial Derivative still requires choosing a mean/median/mode feature vector to compute the probability. Additionally, this specific derivation of the Partial Derivative is only applicable to Logistic Regression models.

But the Partial Derivative approach provides an exact (rather than approximated) measure of how much the probability changes per unit change in each feature, which in turn is a reasonably actionable and interpretable metric for ranking features.


Between these five approaches, you have several options for evaluating feature importance. Each has its own tradeoffs between accuracy, stability, and interpretability, but all can provide reasonable ways to understand the relationships between your users’ actions and the resulting change in probabilistic outcomes for your predicted goal.

*If you're interested in solving problems related to feature importance and automating machine learning, check out more at clearbrain.com or email us at founders@clearbrain.com.