We will now consider some of the important rules of probability. Along the way, we will also learn the meaning of the terms they rely on, including *Joint Probability* and *Marginal Probability*.

Let’s start with Conditional Probability.

**1. Conditional Probability**

We will illustrate this using the example of the boxes of apples and oranges from Lecture 8 (Introduction to Probability Theory).

Assume that we randomly select a box, and then randomly pick a fruit from that box. Let B stand for the box that was selected and F for the fruit that was picked. If we already know which box was selected, what is the probability that the fruit is an apple?

In other words, given that B is already known, what is the probability of F?

This is written in the form:

P(F | B)

and read as: the conditional probability of F, given B.
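To make this concrete, here is a minimal Python sketch. The box contents are made-up numbers chosen only for illustration (they are assumptions, not the figures from Lecture 8): a red box with 2 apples and 6 oranges, and a blue box with 3 apples and 1 orange. The helper name `conditional` is also just for this sketch.

```python
from fractions import Fraction

# Hypothetical box contents, chosen only for illustration.
boxes = {
    "red": {"apple": 2, "orange": 6},
    "blue": {"apple": 3, "orange": 1},
}

def conditional(fruit, box):
    """P(F = fruit | B = box): the share of `fruit` among the fruits in `box`."""
    contents = boxes[box]
    return Fraction(contents[fruit], sum(contents.values()))

p_apple_given_red = conditional("apple", "red")    # 2 of 8 fruits -> 1/4
p_apple_given_blue = conditional("apple", "blue")  # 3 of 4 fruits -> 3/4
print(p_apple_given_red, p_apple_given_blue)
```

Notice that once the box is known, only the contents of that box matter. The probability of picking the box in the first place does not enter into P(F | B).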

**2. The Sum Rule**

Here we will state the sum rule and then explain it. Later, we will apply it to our apples and oranges example.

P(X) = Σ_Y P(X, Y)

Here Σ_Y means a sum over all possible values of Y. Note that I use upper-case *P* and lower-case *p* interchangeably.

We could have written it in terms of the boxes example, but the general form makes the formula clearer.

P(X, Y) is known as the **joint probability** of X and Y. It is read as the probability of X and Y.

Also, P(X) is known as the marginal probability of X.

Therefore, the sum rule simply means that we can find the probability of X by summing the joint probabilities P(X, Y) over all possible values of Y. But you may ask, how do we find the joint probability?

We get it using the product rule!
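Before moving on, here is a small Python sketch of the sum rule in action. The joint probabilities in the table are hypothetical numbers picked for illustration, with X as the fruit and Y as the box, and `marginal_x` is a helper name invented for this sketch.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(X, Y); X is the fruit, Y is the box.
joint = {
    ("apple", "red"): Fraction(1, 10),
    ("orange", "red"): Fraction(3, 10),
    ("apple", "blue"): Fraction(9, 20),
    ("orange", "blue"): Fraction(3, 20),
}

def marginal_x(x):
    """Sum rule: P(X = x) is the sum of P(X = x, Y = y) over every y."""
    return sum(p for (xi, _y), p in joint.items() if xi == x)

p_apple = marginal_x("apple")    # 1/10 + 9/20 = 11/20
p_orange = marginal_x("orange")  # 3/10 + 3/20 = 9/20
print(p_apple, p_orange)
```

The two marginals add up to 1, which is a quick sanity check that the joint table is a valid probability distribution.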

**3. The Product Rule**

As mentioned, the product rule helps us find the joint probability. The product rule states:

P(X, Y) = P(Y | X) P(X)

What if we interchange X and Y? We know about the symmetry property, P(X, Y) = P(Y, X), which says that the product rule is the same as:

P(Y, X) = P(X | Y) P(Y)

Here P(Y, X) is the joint probability of Y and X, while P(X | Y) is the conditional probability of X given Y.

Finally, P(X) and P(Y) are the *marginal probabilities* of X and Y (or just the *probabilities* of X and Y). But you may ask: how do we find the conditional probability?

We get it using Bayes’ theorem!
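Here is a short Python sketch of the product rule. The numbers are assumptions made up for illustration: X is the box (picked with probability 2/5 for red and 3/5 for blue) and Y is the fruit. The helper name `joint` is also just for this sketch.

```python
from fractions import Fraction

# Hypothetical inputs: X is the box, Y is the fruit.
p_x = {"red": Fraction(2, 5), "blue": Fraction(3, 5)}  # P(X)
p_y_given_x = {                                        # P(Y | X)
    "red": {"apple": Fraction(1, 4), "orange": Fraction(3, 4)},
    "blue": {"apple": Fraction(3, 4), "orange": Fraction(1, 4)},
}

def joint(x, y):
    """Product rule: P(X = x, Y = y) = P(Y = y | X = x) * P(X = x)."""
    return p_y_given_x[x][y] * p_x[x]

p_red_and_apple = joint("red", "apple")  # 1/4 * 2/5 = 1/10

# All four joint probabilities together must add up to 1.
total = sum(joint(x, y) for x in p_x for y in ("apple", "orange"))
print(p_red_and_apple, total)
```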

**4. Bayes’ Theorem**

Bayes’ theorem helps us find the conditional probability. It is simply derived from the product rule.

If we rewrite the product rule in terms of P(X | Y), we have:

P(X | Y) = P(Y, X) / P(Y)

Now we can use the symmetry property from the product rule, P(Y, X) = P(X, Y) = P(Y | X) P(X), to replace the numerator. Then we have:

P(X | Y) = P(Y | X) P(X) / P(Y)

This is the legendary Bayes’ theorem!

I recommend you take some time to get your head around it. Maybe write it out a number of times. Also see how you can derive it.
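If it helps, here is a minimal Python sketch that works through Bayes’ theorem with numbers. All the values are assumptions made up for illustration: the red box is picked with probability 2/5 and the blue box with probability 3/5, and the chance of each fruit depends on the box. The helper names `p_fruit` and `posterior` are invented for this sketch.

```python
from fractions import Fraction

# Hypothetical inputs: P(B) and P(F | B).
p_box = {"red": Fraction(2, 5), "blue": Fraction(3, 5)}
p_fruit_given_box = {
    "red": {"apple": Fraction(1, 4), "orange": Fraction(3, 4)},
    "blue": {"apple": Fraction(3, 4), "orange": Fraction(1, 4)},
}

def p_fruit(fruit):
    """Sum and product rules together: P(F) = sum over B of P(F | B) * P(B)."""
    return sum(p_fruit_given_box[b][fruit] * p_box[b] for b in p_box)

def posterior(box, fruit):
    """Bayes' theorem: P(B | F) = P(F | B) * P(B) / P(F)."""
    return p_fruit_given_box[box][fruit] * p_box[box] / p_fruit(fruit)

# We picked an orange. How likely is it that it came from the red box?
p_red_given_orange = posterior("red", "orange")  # (3/4 * 2/5) / (9/20) = 2/3
print(p_red_given_orange)
```

Before seeing the fruit, the red box had probability 2/5. After seeing an orange, it rises to 2/3, because oranges are much more common in the red box under these assumed numbers.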

**5. Summary**

What have we learnt so far?

- First, you now understand the terms conditional probability, marginal probability and joint probability.
- You now know the sum rule, which helps us find the marginal probability. It states that the marginal probability of X is the sum of the joint probabilities of X and Y over all values of Y.
- You also now know the product rule, which helps us find the joint probability.
- You also now know Bayes’ theorem, which helps us find the conditional probability.

In the next class, we will see how we can apply all of these to solve a problem.
