4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)

A machine learning model that seems good may still be wrong. We’ll show how to diagnose such problems by assessing bias vs. variance and precision vs. recall, and present solutions for each scenario.

What is an Artificial Neural Network?

These days we hear a lot about Artificial Neural Networks. Leading companies, from Facebook to Google to Zillow, use them throughout their core products. But what are Neural Networks? And when should you use one?

How to Predict Yes/No Outcomes Using Logistic Regression

Often we want to predict discrete outcomes in our data. Can an email be designated as spam or not spam? Was a transaction fraudulent or valid?

How to Predict Any Value Using Linear Regression

One of the most common questions we ask of our data is what the value of something will be. How many items will we sell next month? How much does it cost to produce them? How much revenue will we make over the year?

The Two Types of Machine Learning

In this post, we review common applications of Machine Learning and the differences between its two subtypes, Supervised and Unsupervised Machine Learning.

Why You Don’t Need a Data Scientist

Data Science is growing. It’s been called the “sexiest job of the 21st century”, and is attracting a flood of new entrants. But what does a data scientist do? And does your company actually need one?

7 Simple Rules to Ensure Data Quality in Your Data Warehouse

When importing data into your data warehouse, you will almost certainly encounter data quality errors at many steps of the ETL pipeline. How do you catch these errors proactively, and ensure data quality in your data warehouse?

10 Free Resources for Customer Intelligence

Customer intelligence requires segmenting customers by their company’s properties, such as web traffic, app performance, technology adoption, ad spend, and company size. But how do you identify companies with these properties?

We Need To Talk About Data Fragmentation

We analyzed data fragmentation among the Alexa Top 1M domains over the past 3 years. A large fraction used at least one external user or marketing data source, and that fraction is growing exponentially, at 2.88X year over year.