# Machine Learning 101 – Bias-Variance Trade-off

This lecture follows from Lecture 7 on Underfitting and Overfitting. Here we discuss the Bias-Variance Trade-off.

I will try to make this lesson very clear. You already know that in Supervised Learning, we are trying to model the relationship between input and output variables. In doing that, we come up with a function f(x).

With this function, we can calculate f(x0), f(x1), and so on for any value of x. We hope that f(x0) gives us exactly y0, but it doesn't; it gives us a value that is only close to y0. The same holds for the other values of x.

This means our function is not perfectly accurate, so there will be some error in each prediction. We measure this error with the MSE (Mean Squared Error): the expected squared difference between the actual y0 and the prediction f(x0).

We write it as:

MSE = E[(y0 – f(x0))²]
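As a quick illustration, here is a minimal Python sketch of this formula; the actual values and predictions below are made up:

```python
import numpy as np

# Hypothetical example: actual values y and model predictions f(x)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

# Mean squared error: the average of (y - f(x))^2 over the data points
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # → 0.075
```

Each prediction is close to, but not exactly, the actual value, so the MSE is small but nonzero.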

We can decompose the MSE into three terms:

• variance of f(x0)
• squared bias of f(x0)
• variance of the error term, ε

Writing these in notation, we have:

MSE = E[(y0 – f(x0))²] = Var(f(x0)) + [Bias(f(x0))]² + Var(ε)

I recommend you write this out a number of times so you can get used to it. We will not bother trying to prove it.
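To make the decomposition concrete, here is a small Python simulation (the sine target function, the noise level, and the straight-line model are all made up for illustration): we fit the model on many fresh training sets, then check at one test point x0 that the variance of the predictions, plus the squared bias, plus Var(ε), comes out close to the MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3          # noise standard deviation, so Var(eps) = sigma**2 (assumed)
x0 = 1.5             # test point (assumed)
f = np.sin           # "true" function, made up for this illustration

preds = np.empty(5000)
for i in range(preds.size):
    # Draw a fresh training set and fit a straight line to it
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    preds[i] = np.polyval(np.polyfit(x, y, 1), x0)

variance = preds.var()                 # Var(f(x0)) across training sets
sq_bias = (preds.mean() - f(x0)) ** 2  # [Bias(f(x0))]^2
y0 = f(x0) + rng.normal(0, sigma, preds.size)
mse = np.mean((y0 - preds) ** 2)       # E[(y0 - f(x0))^2]

# The two numbers printed should be close to each other
print(variance + sq_bias + sigma**2, mse)
```

For this deliberately inflexible straight-line model, most of the error comes from the squared bias term.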

To get a good model, then, we need to minimize this error. We can achieve that by:

• reducing the bias term
• reducing the variance term

The error term Var(ε) cannot be reduced; it is called the irreducible error.

What do bias and variance mean?

Variance here means the amount by which our function f would change if we estimated it using a different training set. This variation should be small if we have a good estimate of f. If, however, the variance is high, then a small change in the training data set can produce large changes in the function f.

So generally, the more flexible a statistical method is, the higher the variance.

What of bias?

Bias is the error incurred as a result of approximating a real-life problem with a simpler model.

For instance, when we assume a linear or polynomial relationship, we are introducing a bias, because we don't really know what the true relationship looks like. We are just being biased! But we can expect polynomial regression to do a better job than linear regression because it is more flexible.

So generally, a more flexible statistical method results in less bias.
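We can check both tendencies with a small Python simulation (the sine target, noise level, and polynomial degrees are all made up for illustration): for each polynomial degree, we refit on many fresh training sets and estimate the squared bias and variance of the prediction at one test point.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, x0, f = 0.3, 1.5, np.sin  # noise level, test point, "true" function (assumed)

def bias_var(degree, n_sets=3000, n=30):
    """Estimate squared bias and variance at x0 for a polynomial fit of this degree."""
    preds = np.empty(n_sets)
    for i in range(n_sets):
        x = rng.uniform(0, 3, n)
        y = f(x) + rng.normal(0, sigma, n)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

for degree in (1, 3, 9):
    b2, v = bias_var(degree)
    print(f"degree {degree}: squared bias {b2:.4f}, variance {v:.4f}")
```

As the degree (flexibility) grows, the squared bias shrinks while the variance grows, matching the two general statements above.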

Let’s now try to understand the relationship between bias, variance, flexibility and model complexity. You can see the graph shown below.

• The black line represents the total error
• The blue line represents the variance
• The red line represents the squared bias

Now, what happens as the model complexity increases?

• the squared bias reduces as shown in the red line
• the variance increases as shown in the blue line
• the total error decreases up to some point, then starts increasing again

The point where the total error is at its minimum is the trade-off point, or the point of optimal complexity.
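This U-shape can be reproduced numerically. In the Python sketch below (the target function, noise level, sample sizes, and degrees are all made up for illustration), we estimate the expected test error of polynomial fits of increasing degree: the error first falls as the bias drops, then rises again as the variance takes over.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, f = 0.3, np.sin  # noise level and "true" function (assumed)

def avg_test_mse(degree, n_sets=2000, n_train=20, n_test=50):
    """Average squared error on fresh test data for a polynomial fit of this degree."""
    errs = np.empty(n_sets)
    for i in range(n_sets):
        # Fit on a fresh training set
        x = rng.uniform(0, 3, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        coef = np.polyfit(x, y, degree)
        # Evaluate on previously unseen test points
        x_new = rng.uniform(0, 3, n_test)
        y_new = f(x_new) + rng.normal(0, sigma, n_test)
        errs[i] = np.mean((y_new - np.polyval(coef, x_new)) ** 2)
    return errs.mean()

for degree in (1, 3, 8):
    print(f"degree {degree}: test MSE {avg_test_mse(degree):.3f}")
```

The intermediate degree gives the lowest test error: that is the trade-off point described above.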

I also recommend you read my previous article on the Bias-Variance Trade-off; it gives a more detailed discussion.

Also try to review Underfitting and Overfitting.