ClearBrain Blog

Posts By

ClearBrain Team

Why You Don’t Need a Data Scientist

Data Science is growing. It’s been called the “sexiest job of the 21st century”, and is attracting a flood of new entrants. But what does a data scientist do? And does your company actually need one?

What is an Artificial Neural Network?

ClearBrain Team

These days we hear a lot about Artificial Neural Networks. Leading companies, from Facebook to Google to Zillow, use them throughout their core products. But what are Neural Networks? And when should you use one?

We Need To Talk About Data Fragmentation

ClearBrain Team

We analyzed data fragmentation among the Alexa Top 1M domains over the past 3 years. A large fraction used at least one external user or marketing data source, and the rate is growing exponentially at 2.88X Y/Y.

Using Apache Spark for Machine Learning – Benefits of DataFrames vs. RDDs

ClearBrain Team

After several months of building on Apache Spark, here are some lessons we learned about the benefits of DataFrames vs. RDDs, and several situations in which the RDD API may still be preferable.

The Two Types of Machine Learning

ClearBrain Team

In this post, we review common applications of Machine Learning and the differences between its two subtypes: Supervised and Unsupervised Machine Learning.

How to Predict Yes/No Outcomes Using Logistic Regression

ClearBrain Team

Often we want to predict discrete outcomes in our data. Can an email be designated as spam or not spam? Was a transaction fraudulent or valid?

How to Predict Any Value Using Linear Regression

ClearBrain Team

One of the most common questions we ask of our data is how much something will be worth. How many items will we sell next month? How much will it cost to produce them? How much revenue will we make over the year?

How to Automatically Segment Your Data with Clustering

ClearBrain Team

One of the most common analyses we perform is to look for patterns in data. What market segments can we divide our customers into? How do we find clusters of individuals in a network of users?

Explaining Your Machine Learning Model (or 5 Ways to Assess Feature Importance)

ClearBrain Team

Machine Learning can often be a black box. To gain actionable insights, it’s helpful to know how a variable influences a model. Here we outline 5 ways to assess how much each feature affects the probability of an outcome.

7 Simple Rules to Ensure Data Quality in Your Data Warehouse

ClearBrain Team

When importing data into your data warehouse, you will almost certainly encounter data quality errors at many steps of the ETL pipeline. How do you catch these errors proactively, and ensure data quality in your data warehouse?

4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)

ClearBrain Team

A seemingly good machine learning model may still be wrong. We’ll show how to diagnose such a model by assessing bias vs. variance and precision vs. recall, and present some solutions for each scenario.

10 Free Resources for Customer Intelligence

ClearBrain Team

Customer intelligence requires segmenting customers by their company’s properties, such as web traffic, app performance, technology adoption, ad spend, and company size. But how do you identify companies with these properties?