In the last lecture, we discussed the Bayes classifier. Now, we are going to discuss the K-nearest neighbors classifier.
Remember that the Bayes classifier assigns an observation X to the class with the highest conditional probability of Y given X. However, in practice the conditional distribution of Y given X is not known. Therefore, we can’t use the Bayes classifier directly in real-world problems.
One approach would be to estimate the conditional distribution of Y given X. Using this, we then classify any observation to the class with the highest estimated probability.
This is how the K-nearest neighbors (KNN) classifier works. Let’s now examine KNN more closely.
How KNN Works
Start by choosing a positive integer K, the number of training data points the classifier will consult. Then choose a test observation to classify, say x0.
Next, KNN identifies the K points in the training data that are closest to x0. These points form a neighborhood N0. KNN then estimates the conditional probability of class j as the fraction of points in N0 whose response value equals j, that is, the fraction of points in N0 that belong to class j.
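The neighborhood step can be sketched in a few lines of Python. The text does not say how "closest" is measured, so this sketch assumes Euclidean distance; the function name nearest_neighbors and the toy points are hypothetical.

```python
import heapq
import math


def nearest_neighbors(X_train, x0, k):
    """Return the indices of the k training points closest to x0 (the set N0).

    Assumes Euclidean distance; KNN can be used with other distance measures.
    """
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # nsmallest keeps only the k indices with the smallest distance to x0,
    # returned in order of increasing distance.
    return heapq.nsmallest(k, range(len(X_train)), key=lambda i: math.dist(X_train[i], x0))


# Hypothetical training points, for illustration only.
X_train = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
print(nearest_neighbors(X_train, (0.2, 0.1), k=2))  # → [0, 1]
```

Using heapq.nsmallest rather than sorting every distance is a small design choice: it returns the same result as sorting and keeping the first k, but avoids fully ordering the whole training set.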
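This conditional probability is written as:
$$\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j)$$
This equation reads as:
The estimated probability that Y = j given X = x0 equals 1/K times the sum, over the points i in N0, of the indicator I(y_i = j), which is 1 when y_i = j and 0 otherwise. In other words, it is the fraction of the K neighbors that belong to class j.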
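The formula translates almost directly into code. In this sketch, the hypothetical helper estimate_class_probabilities receives the labels y_i of the points in N0 and returns each class's fraction; classes that do not appear in N0 are simply omitted, which corresponds to an estimated probability of 0.

```python
from collections import Counter


def estimate_class_probabilities(neighbor_labels):
    """Estimate Pr(Y = j | X = x0) for each class j seen in N0.

    neighbor_labels holds the responses y_i of the K points in N0.
    Each estimate is (1/K) * sum over N0 of I(y_i = j).
    """
    k = len(neighbor_labels)
    if k == 0:
        raise ValueError("N0 must contain at least one point")
    counts = Counter(neighbor_labels)  # counts[j] = sum of I(y_i = j) over N0
    return {label: count / k for label, count in counts.items()}


print(estimate_class_probabilities(["a", "b", "a", "a"]))  # → {'a': 0.75, 'b': 0.25}
```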
Finally, KNN applies the Bayes rule: it classifies the test observation x0 to the class with the largest estimated probability.
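Putting the steps together, a minimal Python sketch of the whole classifier might look like this. Euclidean distance, the function name knn_classify, and the tie-breaking behaviour are assumptions of the sketch, and the training points are hypothetical (they are not the data in Figure 1).

```python
import math
from collections import Counter


def knn_classify(X_train, y_train, x0, k):
    """Return (predicted_class, estimated_probabilities) for test point x0.

    Euclidean distance and max()'s tie-breaking are assumptions of this sketch.
    """
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Identify N0: the k training points closest to x0.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x0))
    neighborhood = order[:k]
    # Estimate Pr(Y = j | X = x0) as the fraction of N0 whose label is j.
    counts = Counter(y_train[i] for i in neighborhood)
    probabilities = {label: count / k for label, count in counts.items()}
    # Bayes rule: pick the class with the largest estimated probability.
    # If two classes tie, max() keeps the one it encounters first.
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities


# Hypothetical training data, for illustration only.
X_train = [(0, 0), (0, 1), (1, 0), (2, 2), (3, 3), (3, 2)]
y_train = ["blue", "blue", "blue", "orange", "orange", "orange"]
label, probs = knn_classify(X_train, y_train, (1.2, 1.2), k=3)
print(label)  # → blue
```

Here the three nearest points to (1.2, 1.2) are one orange and two blue points, so blue gets an estimated probability of 2/3. To classify several test observations, call knn_classify once per point.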
Illustrating K-Nearest Neighbors
Let’s illustrate KNN using an example.
In Figure 1 below, we have a plot of the training data set. It’s made up of 6 blue observations and 6 orange observations. Now, we would like to classify the data point marked with a black cross (x).
We would take the following steps:
Step 1: Choose K = 3.
Step 2: Identify the 3 observations nearest to the cross. They are enclosed in the green circle: two blue points and one orange point.
Step 3: Estimate the probability of each class given the data point (marked with the cross) that we are trying to classify.
P(blue class | observation) = 2/3
P(orange class | observation) = 1/3
Step 4: Draw a conclusion. Since the blue class has the highest probability given the observation, we classify the black cross as belonging to the blue class.
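Steps 3 and 4 can be checked with exact arithmetic. This sketch takes the three neighbor labels from Step 2 (two blue, one orange) as given and uses Python's fractions.Fraction so that 2/3 and 1/3 stay exact.

```python
from collections import Counter
from fractions import Fraction

# Labels of the K = 3 neighbors inside the green circle (Step 2).
neighbor_labels = ["blue", "blue", "orange"]
k = len(neighbor_labels)

# Step 3: estimate each class probability as a fraction of the K neighbors.
probabilities = {label: Fraction(count, k) for label, count in Counter(neighbor_labels).items()}
print(probabilities)  # → {'blue': Fraction(2, 3), 'orange': Fraction(1, 3)}

# Step 4: classify the cross to the class with the largest probability.
prediction = max(probabilities, key=probabilities.get)
print(prediction)  # → blue
```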
This process is repeated for every data point we want to classify. I recommend you watch the video explanation of this.
However, while K-nearest neighbors often does well in classification, misclassification can still occur. In the next lesson, we’ll see how to minimize misclassification.