Machine Learning 101 – Rules of Probability & Bayes’ Theorem

We will now consider some of the important rules of probability, and along the way we will also clarify the meaning of key terms. The rules include:

  1. Conditional Probability
  2. Sum Rule
  3. Product Rule
  4. Bayes’ Theorem
  5. Summary

Two terms you need to know are joint probability and marginal probability.

Let’s start with Conditional Probability.


1. Conditional Probability

We will reuse the example of the boxes of apples and oranges from Lecture 8 (Introduction to Probability Theory) to illustrate.

Assume that we randomly select a box, and then from this box we randomly pick a fruit. If we already know which box was picked (event B), then what is the probability P(F) that the fruit is an apple?

In other words, given that event B has occurred, what is the probability of event F?

This is written in the form:

P(F | B)

and read as: the conditional probability of F, given B.
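As a quick sketch in Python (the box contents here are assumed numbers for illustration; use the actual counts from Lecture 8 if you are following along):

```python
# Assumed box contents (for illustration only):
# red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange
fruits = {"red":  {"apple": 2, "orange": 6},
          "blue": {"apple": 3, "orange": 1}}

def p_fruit_given_box(fruit, box):
    """Conditional probability P(F = fruit | B = box)."""
    total = sum(fruits[box].values())
    return fruits[box][fruit] / total

print(p_fruit_given_box("apple", "red"))   # 2/8 = 0.25
print(p_fruit_given_box("apple", "blue"))  # 3/4 = 0.75
```

Once we know which box we are drawing from, the conditional probability is just the fraction of that box's fruit that are apples.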



2. The Sum Rule

We will first state the sum rule and then explain it. Later, we will apply it to our apples and oranges example.

Note that I use upper-case P and lower-case p interchangeably.

Sum rule:

P(X) = Σ_Y P(X, Y)

We could have written it in terms of the boxes example, but it is clearer to first understand the general formula.

P(X, Y) is known as the joint probability of X and Y. It is read as the probability of X and Y.

Also, P(X) is known as the marginal probability of X.

Therefore, the sum rule simply means that we can find the probability of X by summing the joint probabilities of X and Y over all values of Y. But you may ask: how do we find the joint probability?

We get it using the product rule!
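The sum rule can be sketched in Python. The joint probabilities below are assumed values for illustration (note that they sum to 1):

```python
# Assumed joint probabilities P(B, F) over boxes and fruits (they sum to 1)
joint = {("red", "apple"): 0.10, ("red", "orange"): 0.30,
         ("blue", "apple"): 0.45, ("blue", "orange"): 0.15}

def marginal_fruit(fruit):
    """Sum rule: P(F) = sum over all boxes B of P(B, F)."""
    return sum(p for (box, f), p in joint.items() if f == fruit)

print(marginal_fruit("apple"))  # 0.10 + 0.45 = 0.55
```

Summing the joint table over the boxes "marginalizes out" B, leaving the marginal probability of the fruit alone.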



3. The Product Rule

As mentioned, the product rule helps us find joint probability. The product rule states:

Product rule:

P(X, Y) = P(Y | X) P(X)

What if we interchange X and Y? By the symmetry property, the product rule can equivalently be written as:

Symmetry property of the product rule:

P(X, Y) = P(Y, X) = P(X | Y) P(Y)

Here P(Y, X) is the joint probability of Y and X (which equals P(X, Y)), while P(X | Y) is the conditional probability of X given Y.

Finally, P(X) and P(Y) are the marginal probabilities (or just the probabilities) of X and Y. But you may ask: how do we find a conditional probability?

We get it using Bayes’ theorem!
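In Python, the product rule for the boxes example looks like this (the box probabilities and contents are assumed numbers for illustration):

```python
# Assumed marginal probabilities of picking each box, P(B)
p_box = {"red": 0.4, "blue": 0.6}

# Assumed box contents: red has 2 apples and 6 oranges,
# blue has 3 apples and 1 orange
counts = {"red":  {"apple": 2, "orange": 6},
          "blue": {"apple": 3, "orange": 1}}

def p_fruit_given_box(fruit, box):
    """Conditional probability P(F | B)."""
    return counts[box][fruit] / sum(counts[box].values())

def joint(box, fruit):
    """Product rule: P(B, F) = P(F | B) * P(B)."""
    return p_fruit_given_box(fruit, box) * p_box[box]

print(joint("red", "apple"))  # 0.25 * 0.4 = 0.1
```

A useful sanity check is that the joint probabilities over all box–fruit pairs sum to 1.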



4. Bayes’ Theorem

Bayes’ theorem helps us find conditional probabilities. It is derived directly from the product rule.

If we rewrite the product rule in terms of P(X | Y) (using the symmetry property and dividing through by P(Y)), we have:

P(X | Y) = P(X, Y) / P(Y)

Now we can use the product rule, P(X, Y) = P(Y | X) P(X), to replace the numerator. Then we have:

P(X | Y) = P(Y | X) P(X) / P(Y)

This is the legendary Bayes’ theorem!

I recommend you take some time to get your head around it. Perhaps write it out a number of times, and try deriving it yourself.
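As a sketch, here is Bayes’ theorem applied to the boxes example in Python, reversing the conditioning to ask: given that we picked an apple, which box did it probably come from? (The priors and box contents are assumed numbers for illustration.)

```python
priors = {"red": 0.4, "blue": 0.6}            # assumed P(B)
counts = {"red":  {"apple": 2, "orange": 6},  # assumed box contents
          "blue": {"apple": 3, "orange": 1}}

def p_fruit_given_box(fruit, box):
    """Conditional probability P(F | B)."""
    return counts[box][fruit] / sum(counts[box].values())

def p_box_given_fruit(box, fruit):
    """Bayes' theorem: P(B | F) = P(F | B) * P(B) / P(F)."""
    # The denominator P(F) comes from the sum rule over all boxes
    p_f = sum(p_fruit_given_box(fruit, b) * priors[b] for b in priors)
    return p_fruit_given_box(fruit, box) * priors[box] / p_f

print(round(p_box_given_fruit("red", "apple"), 4))  # 0.1818, i.e. 2/11
```

Notice how all three rules appear: the product rule gives the numerator, the sum rule gives the denominator, and the ratio is the conditional probability we wanted.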


5. Summary

What have we learnt so far?

  • First, you now understand the terms conditional probability, marginal probability, and joint probability.
  • You know the sum rule, which helps us find a marginal probability. It states that the marginal probability of X is the sum of the joint probabilities of X and Y over Y.
  • You know the product rule, which helps us find a joint probability.
  • You know Bayes’ theorem, which helps us find conditional probabilities.

In the next class, we will see how to apply all of these to solve a problem.



Kindson Munonye is currently completing his doctoral program in Software Engineering at the Budapest University of Technology and Economics.


