These days we hear a lot about Artificial Neural Networks. Facebook uses them to classify different types of text in their posts. Zillow recently started using them to better predict house prices from images. Google even open sourced their technology to help any company build their own.

But what are Neural Networks? And when should you use one?

Put simply, a Neural Network is another Machine Learning technique, one loosely modeled on how the human brain processes information and solves problems. Whereas regression models predict an outcome from a linear relationship between a set of inputs, Neural Networks can algorithmically construct models that capture more complex, non-linear relationships.

In our continuing ML 101 Series, we’ll walk through when and how you can use Neural Networks to make predictions on your own data, and some examples of when they’re useful (and when they aren’t).

The Problem of Non-Linear Models

Before diving into how neural networks work, it’s worth exploring the scenarios in which they help.

Let’s take the original case of Binary Classification, where we’re trying to distinguish between two classes of outcomes, such as spam vs. not spam, or fraud vs. not fraud. Previously we saw that we could use Logistic Regression to define a decision boundary to separate two classes of outcomes, as represented by a straight line in Figure 1A above.
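A minimal sketch of that straight-line boundary, using hand-picked parameters (the θ values are hypothetical, not fitted to any data): Logistic Regression predicts class 1 when the sigmoid of θ0 + θ1·x1 + θ2·x2 is at least 0.5, which happens exactly when θ0 + θ1·x1 + θ2·x2 ≥ 0, so the boundary is the line where that sum equals zero.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters chosen for illustration (not learned from data).
THETA = (-3.0, 1.0, 1.0)  # (theta0 = bias, theta1, theta2)

def predict_proba(x1: float, x2: float) -> float:
    """P(class = 1 | x1, x2) under the logistic model."""
    t0, t1, t2 = THETA
    return sigmoid(t0 + t1 * x1 + t2 * x2)

def boundary_x2(x1: float) -> float:
    """x2 on the decision boundary for a given x1.

    Setting theta0 + theta1*x1 + theta2*x2 = 0 and solving for x2
    gives a straight line (requires theta2 != 0).
    """
    t0, t1, t2 = THETA
    return -(t0 + t1 * x1) / t2

# Points on either side of the line x1 + x2 = 3.
print(predict_proba(1.0, 1.0) < 0.5)  # True: below the line -> class 0
print(predict_proba(2.5, 2.5) > 0.5)  # True: above the line -> class 1
print(boundary_x2(1.0))               # 2.0
```

Solving for x2 only works when θ2 is non-zero; otherwise the boundary is a vertical line in x1.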

But what happens when such a linear separation is not feasible? Often, you’ll have data that looks more like Figure 1B, where the best way to distinguish between your classes requires a non-linear decision boundary.

Such a scenario often emerges when you have many input features. When the number of input features (X1, X2, etc.) runs in the thousands rather than the hundreds, we often find that the linear decision boundary from simple Logistic Regression is no longer sufficient. Logistic Regression can only bend that boundary if we hand-engineer non-linear features (such as products of inputs), and the number of such features grows rapidly with the number of inputs, so we need a better way to construct a non-linear model that distinguishes between your classes of outcomes.
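A rough way to see why many features make non-linear Logistic Regression unwieldy: one common approach is to hand-engineer polynomial features, and even stopping at degree-2 terms (every xi·xj, including squares) adds n(n+1)/2 features for n inputs. The sketch below simply counts them; the cutoff at degree 2 is an assumption for illustration.

```python
from itertools import combinations_with_replacement

def count_quadratic_terms(n_features: int) -> int:
    """Number of degree-2 terms x_i * x_j (i <= j) for n input features.

    Closed form: n * (n + 1) / 2.
    """
    return n_features * (n_features + 1) // 2

def enumerate_quadratic_terms(n_features: int):
    """Explicitly list the (i, j) index pairs, to sanity-check the formula."""
    return list(combinations_with_replacement(range(n_features), 2))

for n in (2, 100, 1000):
    print(n, count_quadratic_terms(n))
# 2 3
# 100 5050
# 1000 500500
```

Going from hundreds to thousands of inputs multiplies the number of quadratic terms by roughly 100, which is why hand-built non-linear features stop scaling.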

The Basis for Neural Networks

It turns out that the human brain can provide a scalable framework to design non-linear models.

The human brain is composed of neurons, or neural units. Each neural unit has dendrites, through which input signals are received; a cell body, which processes those signals; and an axon, which carries the result to the axon terminals that pass it on to other neurons (see Figure 2A above).

On its own, a single neural unit is not that useful. Its utility emerges when you connect multiple neural units together: the output (axon terminal) of one unit connects to the input (dendrites) of another. Each neural unit in turn receives inputs from other units and sends its output on to even more neural units (similar to the artistic representation at the beginning of this post).

Extending the analogy to Machine Learning models, we can use a neural unit as the building block for constructing non-linear functions. As seen in Figure 2B above, we can define an artificial neural unit as a mapping from an input layer to an output layer, where the input layer is composed of our explanatory features (x0, x1, x2, etc.; x0 is conventionally a constant bias input equal to 1), and the output (a1) is a weighted linear combination of those features passed through the logistic (sigmoid) function, exactly as in Logistic Regression.

But an artificial neural unit by itself still only draws a linear decision boundary. The diagram in Figure 2B is actually not that different from a simple Logistic Regression. How do we extend the analogy of the human brain, and combine multiple neural units into an actual non-linear function?
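One way to see this limitation is the classic XOR pattern: a hypothetical four-point dataset where the class is 1 exactly when x1 and x2 differ. The sketch below brute-forces a grid of parameters for a single sigmoid unit. It is an illustration rather than a proof, but no setting on the grid classifies all four points correctly, because no straight line separates them.

```python
import itertools
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# XOR: label is 1 when exactly one input is 1.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def single_unit_accuracy(t0: float, t1: float, t2: float) -> float:
    """Fraction of XOR points a single sigmoid unit gets right (threshold 0.5)."""
    correct = 0
    for (x1, x2), label in XOR_DATA:
        pred = 1 if sigmoid(t0 + t1 * x1 + t2 * x2) >= 0.5 else 0
        correct += pred == label
    return correct / len(XOR_DATA)

# Search theta0, theta1, theta2 over -5.0, -4.5, ..., 5.0.
grid = [i / 2 for i in range(-10, 11)]
best = max(single_unit_accuracy(*theta) for theta in itertools.product(grid, repeat=3))
print(best)  # 0.75: at most three of the four points, never all four
```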

Designing an Artificial Neural Network

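To construct an Artificial Neural Network, we need to define a set of relationships between multiple neural units, combining their outputs to create a non-linear model. The non-linearity comes from the logistic (sigmoid) function applied at each unit: stacking purely linear functions would only ever produce another linear function.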
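It is worth being precise about where the non-linearity comes from. The sketch below uses arbitrary, hypothetical weights (bias terms omitted for brevity) to check that two stacked linear layers equal one combined linear layer and satisfy f(2x) = 2·f(x), while inserting a sigmoid between the layers breaks that property.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def sigmoid_vec(v):
    """Apply the logistic function element-wise."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

# Arbitrary illustrative weights (not learned from data).
W1 = [[1.0, -2.0], [0.5, 3.0]]  # input (2 features) -> hidden (2 units)
W2 = [[2.0, 1.0]]               # hidden (2 units) -> output (1 unit)

def two_linear_layers(x):
    return matvec(W2, matvec(W1, x))[0]

def one_combined_layer(x):
    # W2 @ W1 is a single 1x2 matrix: the two layers collapse into one.
    return matvec(matmul(W2, W1), x)[0]

def with_sigmoid(x):
    return matvec(W2, sigmoid_vec(matvec(W1, x)))[0]

x = [1.0, 2.0]
print(two_linear_layers(x), one_combined_layer(x))  # 0.5 0.5
# Linear maps satisfy f(2x) == 2 f(x); the sigmoid version does not.
print(two_linear_layers([2.0, 4.0]) == 2 * two_linear_layers(x))  # True
print(with_sigmoid([2.0, 4.0]) == 2 * with_sigmoid(x))            # False
```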

An Artificial Neural Network can be represented as a series of layers, where each layer is a function of the previous one (see Figure 3 above).

Let’s use the example of credit card transactions, where we’re trying to determine the probability that a transaction is fraudulent based on the transaction amount (X1) and time (X2). If we used Logistic Regression as we reviewed previously, our “Input Layer” would map directly to the “Output Layer” with just one logistic function to compute the probability (PX).

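Using an Artificial Neural Network, we’ll instead insert a “Hidden Layer” of neural units. Each neural unit combines the Input Layer (X1, X2) with its own parameters (θ) to output an intermediate value (A11 and A12). To compute each value of the neural unit A, we can use the function:

$$A_{1j} = g\left(\theta_{j0} X_0 + \theta_{j1} X_1 + \theta_{j2} X_2\right), \qquad g(z) = \frac{1}{1 + e^{-z}}, \qquad j \in \{1, 2\}$$

where X0 = 1 is the bias input and g is the same logistic (sigmoid) function used in Logistic Regression.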

The Neural Network then uses the outputs of the Hidden Layer units (A11 and A12) as inputs into the final Output Layer (A21), which computes the predicted probability of a fraudulent transaction. So the probability of fraud (PX) is now a function of A11 and A12, which are in turn functions of Amount (X1) and Time (X2), giving a non-linear model from the original inputs to the final output.
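Here is the fraud example written out as a forward pass. It is a minimal sketch assuming sigmoid activations (as in Logistic Regression) and inputs already scaled to comparable ranges; every θ value below is a hypothetical placeholder rather than a trained parameter, so the resulting probability means nothing by itself. The point is the structure: A11 and A12 are computed from X1 and X2, and A21 is computed only from A11 and A12.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters (placeholders, not learned). Each tuple is
# (bias, weight on first input, weight on second input).
THETA_A11 = (-1.0, 2.0, -0.5)  # hidden unit 1: inputs X1 (amount), X2 (time)
THETA_A12 = (0.5, -1.5, 1.0)   # hidden unit 2: inputs X1 (amount), X2 (time)
THETA_A21 = (-0.2, 3.0, -2.0)  # output unit: inputs A11, A12

def unit(theta, in1: float, in2: float) -> float:
    """One neural unit: sigmoid of a weighted sum plus bias."""
    b, w1, w2 = theta
    return sigmoid(b + w1 * in1 + w2 * in2)

def fraud_probability(x1_amount: float, x2_time: float) -> float:
    a11 = unit(THETA_A11, x1_amount, x2_time)  # Hidden Layer
    a12 = unit(THETA_A12, x1_amount, x2_time)
    a21 = unit(THETA_A21, a11, a12)            # Output Layer sees only A11, A12
    return a21                                 # PX

px = fraud_probability(0.8, 0.3)  # inputs assumed pre-scaled to roughly [0, 1]
print(0.0 < px < 1.0)  # True
```

In practice the θ values would be learned from labeled transactions rather than set by hand.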

Taken together, this means we can construct additional features (A11 and A12) by applying distinct functions, each a linear combination passed through the sigmoid, to our original features (X1 and X2), and then combine those features into the final output (A21). The result is a non-linear model for computing a prediction.
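To see that the constructed features really buy a non-linear boundary, here is a hypothetical 2-2-1 network with hand-picked (not learned) weights, reusing the A11/A12/A21 naming, applied to the XOR pattern that no single straight line can separate. In this sketch A11 approximates OR, A12 approximates AND, and the output fires when OR is true but AND is not.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def unit(bias: float, w1: float, w2: float, in1: float, in2: float) -> float:
    """One sigmoid unit with two inputs."""
    return sigmoid(bias + w1 * in1 + w2 * in2)

def xor_network(x1: float, x2: float) -> float:
    # Hand-picked weights; large magnitudes push the sigmoid outputs
    # close to 0 or 1 so each unit behaves like a logic gate.
    a11 = unit(-10, 20, 20, x1, x2)      # ~ x1 OR x2
    a12 = unit(-30, 20, 20, x1, x2)      # ~ x1 AND x2
    return unit(-10, 20, -20, a11, a12)  # ~ A11 AND NOT A12, i.e. XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_network(x1, x2)))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Each hidden unit alone is still a linear boundary; it is the output unit combining them that carves out the non-linear region.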

When You Should (and Shouldn’t) Use a Neural Network

To review, Artificial Neural Networks help you create non-linear models to better classify and predict outcomes.

Whereas Logistic Regression outputs a prediction from a single unit with a linear decision boundary, Artificial Neural Networks combine multiple neural units to model non-linear relationships. The result can be more powerful and accurate predictions for separating classes of outcomes, especially when the classes are not easily separated by a linear decision boundary.

But while Artificial Neural Networks have their benefits, they shouldn’t be used indiscriminately. Because they involve multiple layers of computation, they are computationally expensive: they take longer to train and cost more compute to generate predictions. They also tend to pay off mainly on high-dimensional datasets with many input features, where the relationship between inputs and outcomes is less likely to be linear.
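A rough sense of the extra cost comes from counting parameters, which also roughly tracks the multiply-adds needed per prediction. This sketch assumes fully connected layers with one bias per unit; the hidden-layer size of 100 is an arbitrary example, not a recommendation.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network.

    layer_sizes = [n_inputs, hidden_1, ..., n_outputs]; each unit in a
    layer has one weight per unit in the previous layer plus one bias.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_features = 1000
logistic = count_parameters([n_features, 1])      # a single logistic unit
network = count_parameters([n_features, 100, 1])  # one hidden layer of 100 units
print(logistic, network)  # 1001 100201
```

Even this small network has about 100 times as many parameters to fit and evaluate as the single logistic unit.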

If your dataset has fewer features (on the order of hundreds rather than thousands), a simpler Logistic Regression is often sufficient. The difference in accuracy on lower-dimensional datasets is often marginal, but the savings in compute costs can be significant.

But if compute costs are less of an issue, Artificial Neural Networks provide a powerful tool for more accurate predictions.

This blog post is based on concepts taught in Stanford’s Machine Learning course notes by Andrew Ng on Coursera.