Bayesian inference is simply a way of making statistical inferences by applying Bayes’ Theorem.

Assume there is a particular hypothesis H.

Let the probability of this hypothesis be P(H).

According to Bayesian inference, we update this probability as more information (or evidence) becomes available.

To understand Bayesian inference, we need to briefly review Bayes’ Theorem (or Bayes’ Rule).

**Review of Bayes’ Rule**

Just as explained in the previous lesson, we use Bayes’ Theorem to find a conditional probability.

But in this discussion, we would say: Bayes’ Theorem is used to find a posterior probability. This was explained earlier in Application of Bayes’ Theorem.

Posterior probability is derived from two things:

- prior probability
- likelihood function

While prior probability has already been explained, the likelihood function is deduced from the observed data.

So let’s now state Bayes’ Theorem:

*P(H | E) = P(E | H) · P(H) / P(E)*

In this equation:

H is the hypothesis about which more information is expected. The probability of H would therefore be affected by new information or evidence, E.

P(H) is called the prior probability. This is the initial probability of H before we receive the evidence or new information E.

E is the evidence, that is, the new information received.

P(H | E) is the posterior probability of H given the new evidence E, that is, the probability of H after E has been observed.

P(E | H), similarly, is the probability of E given H. This is called the **likelihood**. As a function of E with H fixed, it indicates the compatibility of the evidence E with the hypothesis H. The posterior probability is a function of H, while the likelihood is a function of E.

P(E) is the probability of E and is called the marginal likelihood. It is the same for all possible hypotheses being considered.
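The four quantities above can be seen in a small worked example. The sketch below computes a posterior with Bayes’ Theorem for a hypothetical disease-test scenario; all the numbers (prevalence, sensitivity, false-positive rate) are made-up illustrative values, not from the lesson.

```python
# Hypothetical example: H = "has the disease", E = "test is positive".
# All probabilities below are assumed, illustrative values.

p_h = 0.01              # P(H): prior probability of the hypothesis
p_e_given_h = 0.95      # P(E | H): likelihood of the evidence given H
p_e_given_not_h = 0.10  # P(E | not H): false-positive rate

# Marginal likelihood P(E): total probability of the evidence,
# summed over both hypotheses (H and not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' Theorem: posterior P(H | E) = P(E | H) * P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e

print(round(p_h_given_e, 4))
```

Note how the evidence updates the prior: the posterior is larger than P(H), but still small, because the marginal likelihood P(E) is dominated by false positives.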

**Relationship between Prior and Posterior Probabilities**

Now, look at Bayes’ Theorem again:

*P(H | E) = P(E | H) · P(H) / P(E)*

For the hypothesis H, only the factors in the numerator, P(E | H) and P(H), actually affect the value of the posterior probability P(H | E).

So you can see that the posterior probability is proportional to the product of the likelihood P(E | H) and the prior probability P(H).

Therefore, the posterior probability is proportional to the prior probability. If we rewrite Bayes’ Theorem to reflect this proportionality, we would have:

*P(H | E) ∝ P(E | H) · P(H)*

In terms of simple proportions, we can write:

*P(H | E) = k · P(H)*

This means that the factor k would be given by:

*k = P(E | H) / P(E)*

This factor k is known as the impact of E on the probability of H; that is, the effect the new evidence E has on the probability of H, P(H).
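The proportionality above suggests a simple recipe when several hypotheses compete: multiply each prior by its likelihood, then normalize by the sum (which is exactly the marginal likelihood P(E)). A minimal sketch, with made-up priors and likelihoods for three hypothetical hypotheses:

```python
# "Posterior ∝ likelihood × prior" over several hypotheses.
# The priors and likelihoods below are assumed, illustrative values.

priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}       # P(H)
likelihoods = {"H1": 0.2, "H2": 0.6, "H3": 0.4}  # P(E | H)

# Unnormalized posteriors: likelihood times prior
unnormalized = {h: likelihoods[h] * priors[h] for h in priors}

# Marginal likelihood P(E): the normalizing constant
p_e = sum(unnormalized.values())

# Normalized posteriors P(H | E); the factor k = P(E | H) / P(E)
posteriors = {h: v / p_e for h, v in unnormalized.items()}

print(posteriors)
```

Because P(E) is the same for every hypothesis, it only rescales the results; the ranking of the hypotheses is decided entirely by likelihood times prior.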

**Formal Definition of Bayesian Inference**

Now that you understand Bayes’ Theorem, let’s now define Bayesian Inference.

Let x be a data point.

Let θ be a parameter of the data point’s distribution. That is, x ~ p(x | θ).

Let α be a *hyperparameter* of the parameter distribution. That is, θ ~ p(θ | α).

Let X be the sample, that is, a set of n observed data points x_{1}, x_{2}, …, x_{n}.

Now let x̃ be the new data point whose distribution we need to predict.

**Bayesian Inference States that:**

- the prior distribution is the distribution of the parameters prior to the observation of any data. That is p(θ | α).
- the sampling distribution is the distribution of the observed data conditional on its parameters. That is p(X | θ). This is the likelihood; it can also be written as L(θ | X).
- the marginal likelihood is therefore the distribution of the observed data marginalized over the parameters.
- similarly, the posterior distribution is the distribution of the parameters after the data is observed. This is, of course, the Bayes’ Rule you already know.

The derivation of Bayesian inference is given below:

*p(θ | X, α) = p(X | θ) · p(θ | α) / p(X | α)*

We can state this as:

‘posterior probability equals likelihood times prior over evidence’ or

‘posterior probability is proportional to likelihood times prior’
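To make the formal definition concrete, the sketch below infers a coin’s bias θ by the ‘posterior is proportional to likelihood times prior’ recipe, using a simple grid approximation. The data (7 heads in 10 flips) and the uniform prior are assumptions chosen for illustration.

```python
# A minimal sketch of Bayesian inference for a coin's bias θ,
# via grid approximation. All numbers are illustrative assumptions.
# Prior p(θ | α): uniform over a grid of candidate θ values.
# Data X: 7 heads, 3 tails, so the likelihood p(X | θ) ∝ θ^7 (1-θ)^3.

n_grid = 101
grid = [i / (n_grid - 1) for i in range(n_grid)]  # candidate θ values
prior = [1.0 / n_grid] * n_grid                   # uniform prior p(θ | α)

heads, tails = 7, 3
likelihood = [t**heads * (1 - t)**tails for t in grid]  # p(X | θ)

# Posterior ∝ likelihood × prior; normalize by the marginal likelihood
unnorm = [lk * p for lk, p in zip(likelihood, prior)]
evidence = sum(unnorm)                 # p(X | α), the marginal likelihood
posterior = [u / evidence for u in unnorm]

# Posterior mean of θ; with a uniform prior it lands near 7/10,
# pulled slightly toward 1/2 by the prior
theta_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(theta_mean, 3))
```

Each step mirrors a line of the derivation: the prior p(θ | α), the likelihood p(X | θ), the marginal likelihood p(X | α) as the normalizing sum, and the posterior p(θ | X, α).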