In Lecture 4, we learnt about the Bayes’ classifier. Here we will see how to minimize the misclassification rate in the Bayes classifier. Again, we will review the cancer diagnosis example.

**Review of Cancer Diagnosis Example**

In this example, the doctors need to determine if the patient has cancer or not. To make this decision, they take an X-ray image of the patient. The output of this image is represented as a vector **x**, which represents a set of pixels from the X-ray image.

So this is a typical classification problem. They need to classify the patient into one of two classes:

- C_{1} – cancer present
- C_{2} – cancer absent

If we let k = 1, 2, then we can represent the classes as C_{k}.

So, given the patient’s X-ray image x, what is the probability of C_{k} (that is, cancer present or absent)?

We can represent this as:

*p(C _{k} | x)*

Do you remember that this is conditional probability?

And by Bayes’ Theorem, this is obtained using the formula:

*p(C_{k} | x) = p(x | C_{k}) p(C_{k}) / p(x)*
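As a quick numeric sketch of Bayes’ theorem applied to this example (all numbers below are assumed for illustration, not real diagnostic data):

```python
# Illustrative only: prior, likelihoods, and the resulting evidence are
# made-up numbers, not taken from any real diagnostic data.
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: p(C_k | x) = p(x | C_k) * p(C_k) / p(x)."""
    return likelihood * prior / evidence

p_c1 = 0.01            # assumed prior p(C1): cancer present
p_x_given_c1 = 0.9     # assumed likelihood p(x | C1)
p_x_given_c2 = 0.1     # assumed likelihood p(x | C2)

# Evidence via the sum rule: p(x) = sum over k of p(x | C_k) p(C_k)
p_x = p_x_given_c1 * p_c1 + p_x_given_c2 * (1 - p_c1)

print(posterior(p_c1, p_x_given_c1, p_x))
```

Note how the small prior p(C_{1}) keeps the posterior low even when the likelihood p(x | C_{1}) is high.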

**When Misclassification occurs**

Misclassification occurs if cancer is present but the doctors decide that it is not.

Representing this with notations, we say that:

- misclassification occurs if the patient belongs to class C_{2}, but the doctors assign him to C_{1} (false positive)
- or the patient belongs to C_{1}, but the doctors assign him to C_{2} (false negative)

**Misclassification Rate**

This is the number of misclassifications divided by the total number of cases evaluated.
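This definition can be sketched in a few lines of code (the labels below are made-up examples, with 1 and 2 standing for C_{1} and C_{2}):

```python
# Misclassification rate = (number of mistakes) / (total number evaluated).
# The label lists here are hypothetical, for illustration only.
true_labels      = [1, 2, 1, 1, 2, 2, 1, 2]
predicted_labels = [1, 2, 2, 1, 2, 1, 1, 2]

mistakes = sum(t != p for t, p in zip(true_labels, predicted_labels))
rate = mistakes / len(true_labels)
print(rate)  # 2 mistakes out of 8 cases
```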

**Minimizing Misclassification Rate**

First, we need to partition the input space into regions. (input space is all possible values **x** can take).

These regions are called decision regions. Let’s represent them using R_{k}. We can use the same index k because the number of decision regions equals the number of classes: one region for each class.

Now, all points in R_{k} are assigned to C_{k}.

The boundary between the decision regions are known as **decision surfaces** or **decision boundaries**.

So we need a rule that takes each input x and assigns it to a region.

From our example, a mistake can occur in two possible ways:

- **x** falls in R_{1} (so it is assigned to C_{1}) but actually belongs to C_{2}
- **x** falls in R_{2} (so it is assigned to C_{2}) but actually belongs to C_{1}

The probability of a mistake occurring is then the sum of these two probabilities (remember the sum rule). This is given as:

*p(mistake) = p( x ∈ R_{1}, C_{2}) + p(x ∈ R_{2}, C_{1})*

This can be broken down by integrating these joint probabilities over the regions R_{1} and R_{2}. Therefore we have:

*p(mistake) = ∫_{R_{1}} p(x, C_{2}) dx + ∫_{R_{2}} p(x, C_{1}) dx*

Clearly, to minimize p(mistake), we need to try to assign x to whichever class has the smaller value of the integrand.

Therefore, if *p(x, C_{1}) > p(x, C_{2})* for a given **x**, then we should assign **x** to C_{1}.
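This decision rule can be sketched directly, assuming we can evaluate the joint probabilities p(x, C_{1}) and p(x, C_{2}) for a given x (the values passed in below are hypothetical):

```python
# Sketch of the decision rule: assign x to whichever class gives the larger
# joint probability p(x, C_k). The probability values are hypothetical.
def assign_class(p_x_c1, p_x_c2):
    """Return the class whose joint probability with x is larger."""
    return "C1" if p_x_c1 > p_x_c2 else "C2"

print(assign_class(0.03, 0.01))   # joint with C1 is larger
print(assign_class(0.002, 0.05))  # joint with C2 is larger
```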

Let’s evaluate p(mistake) further using the product rule.

We know from the product rule that:

*p( x, C_{2}) = p(C_{2} | x)p(x)*

also

*p( x, C_{1}) = p(C_{1} | x ) p(x)*

If we substitute these into the formula for p(mistake), we then have:

*p(mistake) = ∫_{R_{1}} p(C_{2} | x) p(x) dx + ∫_{R_{2}} p(C_{1} | x) p(x) dx*

You can now see that the term inside each integral is either p(C_{2} | **x**)p(**x**) d**x** or p(C_{1} | **x**)p(**x**) d**x**, and the factor p(**x**) is common to both.

These conditional probabilities are exactly the posterior probabilities of C_{1} and C_{2} once we have observed x.

Therefore, the probability of misclassification p(mistake) is minimized if each value of x is assigned to the class with the largest posterior probability.
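The full rule can be sketched end to end: compute the posterior for each class via Bayes’ theorem and pick the largest. The priors and likelihoods below are assumed values for illustration, and the two input values ("suspicious", "clear") are hypothetical summaries of an X-ray image:

```python
# Minimum-misclassification rule: assign x to the class with the largest
# posterior p(C_k | x). All probabilities here are assumed, for illustration.
priors = {"C1": 0.3, "C2": 0.7}  # assumed priors p(C_k)

def likelihood(x, c):
    """Hypothetical likelihoods p(x | C_k) for two possible observations."""
    if x == "suspicious":
        return {"C1": 0.9, "C2": 0.1}[c]
    return {"C1": 0.2, "C2": 0.8}[c]  # x == "clear"

def classify(x):
    """Return the class with the largest posterior p(C_k | x)."""
    # Evidence via the sum rule: p(x) = sum over k of p(x | C_k) p(C_k)
    evidence = sum(likelihood(x, c) * priors[c] for c in priors)
    posteriors = {c: likelihood(x, c) * priors[c] / evidence for c in priors}
    return max(posteriors, key=posteriors.get)

print(classify("suspicious"))
print(classify("clear"))
```

Because p(x) is the same for every class at a given x, dividing by the evidence does not change which class wins; comparing the joints p(x, C_{k}) would give the same decision.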

**Probability of Correct Classification**

Similarly, we can apply the same method to obtain *p(correct).*

I recommend this as an exercise for you.
