Bayesian estimation of the parameters of the normal distribution
This lecture shows how to apply the basic principles of Bayesian inference to the problem of estimating the parameters (mean and variance) of a normal distribution.
Table of contents
- Unknown mean and known variance
  - The likelihood
  - The prior
  - The posterior
  - The prior predictive distribution
  - The posterior predictive distribution
- Unknown mean and unknown variance
  - The likelihood
  - The prior
  - The posterior distribution of the mean conditional on the variance
  - The prior predictive distribution conditional on the variance
  - The posterior distribution of the variance
  - The prior predictive distribution
  - The posterior distribution of the mean
Unknown mean and known variance
The observed sample used to carry out inferences is a vector $x = [x_1 \ \dots \ x_n]$ whose $n$ entries $x_1, \dots, x_n$ are independent and identically distributed draws from a normal distribution.
In this section, we are going to assume that the mean $\mu$ of the distribution is unknown, while its variance $\sigma^2$ is known. In the next section, $\sigma^2$ will also be treated as unknown.
The likelihood
The probability density function of a generic draw $x_i$ is
$$p(x_i \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
where we use the notation $p(x_i \mid \mu)$ to highlight the fact that the density depends on the unknown parameter $\mu$.
Since $x_1, \dots, x_n$ are independent, the likelihood is
$$p(x \mid \mu) = \prod_{i=1}^n p(x_i \mid \mu) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
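To make the formula concrete, here is a minimal sketch in Python (NumPy assumed; the function name is ours, not from the lecture) that evaluates this log-likelihood:

```python
import numpy as np

def normal_log_likelihood(x, mu, sigma2):
    """Log-likelihood of an IID normal sample x with mean mu
    and known variance sigma2 (the log of the product above)."""
    n = x.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) \
           - 0.5 * np.sum((x - mu) ** 2) / sigma2
```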
The prior
The prior is
$$\mu \sim N(\mu_0, \tau_0^2)$$
that is, $\mu$ has a normal distribution with mean $\mu_0$ and variance $\tau_0^2$.
This prior is used to express the statistician's belief that the unknown parameter $\mu$ is most likely equal to $\mu_0$ and that values of $\mu$ very far from $\mu_0$ are quite unlikely (how unlikely depends on the variance $\tau_0^2$).
The posterior
Given the prior and the likelihood specified above, the posterior is
$$\mu \mid x \sim N(\mu_n, \tau_n^2)$$
where
$$\mu_n = \frac{\tau_0^{-2}\mu_0 + n\sigma^{-2}\bar{x}}{\tau_0^{-2} + n\sigma^{-2}}, \qquad \tau_n^2 = \frac{1}{\tau_0^{-2} + n\sigma^{-2}}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$$
Thus, the posterior distribution of $\mu$ is a normal distribution with mean $\mu_n$ and variance $\tau_n^2$.
Note that the posterior mean $\mu_n$ is a weighted average of two signals:
- the sample mean $\bar{x}$ of the observed data;
- the prior mean $\mu_0$.
The greater the precision of a signal, the higher its weight is. Both the prior and the sample mean convey some information (a signal) about $\mu$. The signals are combined linearly, but more weight is given to the signal that has higher precision (smaller variance).
The weight given to the sample mean increases with the sample size $n$, while the weight given to the prior mean does not. As a consequence, when the sample size $n$ becomes large, more and more weight is given to the sample mean. In the limit, all the weight is given to the information coming from the sample and no weight is given to the prior.
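As an illustration, the following sketch (made-up data and hyperparameters; all names are ours) computes the posterior hyperparameters and the precision-based weights just described:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                       # known variance of the data
mu0, tau0_sq = 0.0, 1.0            # prior mean and prior variance of mu
x = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=50)

n, xbar = x.size, x.mean()
prior_prec = 1.0 / tau0_sq         # precision of the prior signal
data_prec = n / sigma2             # precision of the sample-mean signal

tau_n_sq = 1.0 / (prior_prec + data_prec)                 # posterior variance
mu_n = tau_n_sq * (prior_prec * mu0 + data_prec * xbar)   # posterior mean

# mu_n is a weighted average: the weights are proportional to the precisions
w_data = data_prec / (prior_prec + data_prec)
print(mu_n, w_data)                # as n grows, w_data tends to 1
```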
The prior predictive distribution
The prior predictive distribution is
$$x \sim N\left(\mu_0\mathbf{1}_n,\; \sigma^2 I_n + \tau_0^2\,\mathbf{1}_n\mathbf{1}_n^{\top}\right)$$
where $\mathbf{1}_n$ is an $n \times 1$ vector of ones, and $I_n$ is the $n \times n$ identity matrix.
Thus, the prior predictive distribution of $x$ is multivariate normal with mean $\mu_0\mathbf{1}_n$ and covariance matrix $\sigma^2 I_n + \tau_0^2\,\mathbf{1}_n\mathbf{1}_n^{\top}$.
Under this distribution, a draw $x_i$ has prior mean $\mu_0$, variance $\sigma^2 + \tau_0^2$ and covariance with the other draws equal to $\tau_0^2$. The covariance is positive because the draws $x_1, \dots, x_n$, despite being independent conditional on $\mu$, all share the same mean parameter $\mu$, which is random.
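A quick Monte Carlo sketch (again with made-up hyperparameters) can be used to check this covariance structure, by first drawing $\mu$ from the prior and then the sample conditional on $\mu$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, tau0_sq, sigma2, n = 0.0, 1.0, 4.0, 5

# draw many samples from the prior predictive: first mu, then x given mu
mu = rng.normal(mu0, np.sqrt(tau0_sq), size=100_000)
x = rng.normal(mu[:, None], np.sqrt(sigma2), size=(100_000, n))

cov = np.cov(x, rowvar=False)
print(np.round(cov, 2))
# diagonal entries approx sigma2 + tau0_sq = 5.0,
# off-diagonal entries approx tau0_sq = 1.0
```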
The posterior predictive distribution
Assume that $m$ new observations $\tilde{x} = [x_{n+1} \ \dots \ x_{n+m}]$ are drawn independently from the same normal distribution from which $x_1, \dots, x_n$ have been extracted.
The posterior predictive distribution of the vector $\tilde{x}$ is
$$\tilde{x} \mid x \sim N\left(\mu_n\mathbf{1}_m,\; \sigma^2 I_m + \tau_n^2\,\mathbf{1}_m\mathbf{1}_m^{\top}\right)$$
where $I_m$ is the $m \times m$ identity matrix and $\mathbf{1}_m$ is an $m \times 1$ vector of ones.
So, $\tilde{x}$ has a multivariate normal distribution with mean $\mu_n\mathbf{1}_m$ (where $\mu_n$ is the posterior mean of $\mu$) and covariance matrix $\sigma^2 I_m + \tau_n^2\,\mathbf{1}_m\mathbf{1}_m^{\top}$ (where $\tau_n^2$ is the posterior variance of $\mu$).
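A minimal helper (ours, directly transcribing the two moments above) for assembling the posterior predictive mean vector and covariance matrix:

```python
import numpy as np

def posterior_predictive_params(mu_n, tau_n_sq, sigma2, m):
    """Mean vector and covariance matrix of the posterior predictive
    distribution of m future draws, given the posterior mean mu_n and
    posterior variance tau_n_sq of mu."""
    mean = mu_n * np.ones(m)
    cov = sigma2 * np.eye(m) + tau_n_sq * np.ones((m, m))
    return mean, cov
```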
Unknown mean and unknown variance
As in the previous section, the sample $x = [x_1 \ \dots \ x_n]$ is assumed to be a vector of IID draws from a normal distribution. However, we now assume that not only the mean $\mu$, but also the variance $\sigma^2$ is unknown.
The likelihood
The probability density function of a generic draw $x_i$ is
$$p(x_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
The notation $p(x_i \mid \mu, \sigma^2)$ highlights the fact that the density depends on the two unknown parameters $\mu$ and $\sigma^2$.
Since $x_1, \dots, x_n$ are independent, the likelihood is
$$p(x \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
The prior
The prior is hierarchical.
First, we assign the following prior to the mean, conditional on the variance:
$$\mu \mid \sigma^2 \sim N\left(\mu_0,\; v_0\sigma^2\right)$$
that is, conditional on $\sigma^2$, $\mu$ has a normal distribution with mean $\mu_0$ and variance $v_0\sigma^2$.
Note that the variance of the parameter $\mu$ is assumed to be proportional to the unknown variance $\sigma^2$ of the data points. The constant of proportionality $v_0$ determines how tight the prior is, that is, how probable we deem it that $\mu$ is very close to the prior mean $\mu_0$ (the smaller $v_0$, the tighter the prior).
Then, we assign the following prior to the variance:
$$\frac{1}{\sigma^2} \sim \text{Gamma}(n_0, h_0)$$
that is, $\sigma^2$ has an inverse-Gamma distribution with parameters $n_0$ and $h_0$ (i.e., the precision $1/\sigma^2$ has a Gamma distribution with parameters $n_0$ and $h_0$, in the parametrization whose density is proportional to $\lambda^{n_0/2 - 1}\exp\left(-\frac{n_0\lambda}{2h_0}\right)$ for $\lambda = 1/\sigma^2$).
By the properties of the Gamma distribution, the prior mean of the precision is
$$\mathrm{E}\left[\frac{1}{\sigma^2}\right] = h_0$$
and its variance is
$$\operatorname{Var}\left[\frac{1}{\sigma^2}\right] = \frac{2h_0^2}{n_0}$$
We can think of $h_0$ as our best guess of the precision of the data-generating distribution. $n_0$ is the parameter that we use to express our degree of confidence in our guess about the precision. The greater $n_0$, the tighter our prior about $1/\sigma^2$ is, and the more probable we deem it that $1/\sigma^2$ is close to $h_0$.
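A small sketch (hyperparameter values made up) of sampling from this hierarchical prior; note that a Gamma$(n_0, h_0)$ variable in the parametrization above corresponds to NumPy's gamma with shape $n_0/2$ and scale $2h_0/n_0$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, v0 = 0.0, 1.0     # prior mean of mu and proportionality constant
n0, h0 = 10.0, 0.25    # confidence parameter and prior mean of the precision

# precision ~ Gamma(n0, h0): shape n0/2, scale 2*h0/n0 in NumPy's convention
precision = rng.gamma(shape=n0 / 2, scale=2 * h0 / n0, size=100_000)
sigma2 = 1.0 / precision                        # variance draws
mu = rng.normal(mu0, np.sqrt(v0 * sigma2))      # mean draws, conditional on sigma2

print(precision.mean(), precision.var())        # approx h0 and 2*h0**2/n0
```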
The posterior distribution of the mean conditional on the variance
Conditional on $\sigma^2$, the posterior distribution of $\mu$ is
$$\mu \mid \sigma^2, x \sim N\left(\mu_n,\; v_n\sigma^2\right)$$
where
$$\mu_n = \frac{\mu_0 + n v_0 \bar{x}}{1 + n v_0}, \qquad v_n = \frac{v_0}{1 + n v_0}$$
Thus, conditional on $x$ and $\sigma^2$, $\mu$ is normal with mean $\mu_n$ and variance $v_n\sigma^2$.
The prior predictive distribution conditional on the variance
Conditional on $\sigma^2$, the prior predictive distribution of $x$ is
$$x \mid \sigma^2 \sim N\left(\mu_0\mathbf{1}_n,\; \sigma^2\left(I_n + v_0\,\mathbf{1}_n\mathbf{1}_n^{\top}\right)\right)$$
where $\mathbf{1}_n$ is an $n \times 1$ vector of ones, and $I_n$ is the $n \times n$ identity matrix.
The posterior distribution of the variance
The posterior distribution of the variance is characterized by
$$\frac{1}{\sigma^2} \,\Big|\, x \sim \text{Gamma}(n_n, h_n)$$
where
$$n_n = n_0 + n, \qquad h_n = \frac{n_n}{\dfrac{n_0}{h_0} + \sum_{i=1}^n (x_i - \bar{x})^2 + \dfrac{n(\bar{x} - \mu_0)^2}{1 + n v_0}}$$
Thus, the precision $1/\sigma^2$ has a Gamma distribution with parameters $n_n$ and $h_n$.
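Combining this update with the conditional posterior of the mean derived earlier, here is a sketch (function and variable names are ours) that computes all four posterior hyperparameters:

```python
import numpy as np

def posterior_hyperparams(x, mu0, v0, n0, h0):
    """Posterior hyperparameters (mu_n, v_n, n_n, h_n) for the normal
    model with unknown mean and unknown variance, per the formulas above."""
    n, xbar = x.size, x.mean()
    mu_n = (mu0 + n * v0 * xbar) / (1 + n * v0)
    v_n = v0 / (1 + n * v0)
    n_n = n0 + n
    ss = np.sum((x - xbar) ** 2)                      # within-sample scatter
    penalty = n * (xbar - mu0) ** 2 / (1 + n * v0)    # prior-data disagreement
    h_n = n_n / (n0 / h0 + ss + penalty)
    return mu_n, v_n, n_n, h_n
```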
The prior predictive distribution
The prior predictive distribution of $x$ is
$$x \sim t\left(\mu_0\mathbf{1}_n,\; \frac{1}{h_0}\left(I_n + v_0\,\mathbf{1}_n\mathbf{1}_n^{\top}\right),\; n_0\right)$$
that is, a multivariate Student's t distribution with mean $\mu_0\mathbf{1}_n$, scale matrix $\frac{1}{h_0}\left(I_n + v_0\,\mathbf{1}_n\mathbf{1}_n^{\top}\right)$ and $n_0$ degrees of freedom.
The posterior distribution of the mean
The posterior distribution of the mean is
$$p(\mu \mid x) = \frac{1}{B\left(\frac{n_n}{2}, \frac{1}{2}\right)}\left(\frac{h_n}{n_n v_n}\right)^{1/2}\left(1 + \frac{h_n(\mu - \mu_n)^2}{n_n v_n}\right)^{-\frac{n_n+1}{2}}$$
where $B$ is the Beta function.
Proof
We have already proved that, conditional on $x$ and $\sigma^2$, $\mu$ is normal with mean $\mu_n$ and variance $v_n\sigma^2$. We have also proved that, conditional on $x$, the precision $1/\sigma^2$ has a Gamma distribution with parameters $n_n$ and $h_n$. Thus, we can write
$$\mu = \mu_n + \sqrt{v_n\sigma^2}\, Z$$
where $Z$ is standard normal conditional on $x$ and $\sigma^2$, and $1/\sigma^2$ has a Gamma distribution with parameters $n_n$ and $h_n$. Now, note that, by the properties of the Gamma distribution (multiplying a Gamma$(n, h)$ random variable by a constant $c$ yields a Gamma$(n, ch)$ random variable), $\frac{1}{h_n\sigma^2}$ has a Gamma distribution with parameters $n_n$ and $1$. We can write
$$\mu = \mu_n + \sqrt{\frac{v_n}{h_n}}\, \frac{Z}{\sqrt{1/(h_n\sigma^2)}}$$
But
$$\frac{Z}{\sqrt{1/(h_n\sigma^2)}}$$
has a standard Student's t distribution with $n_n$ degrees of freedom (see the lecture on the t distribution). As a consequence, $\mu$ has a Student's t distribution with mean $\mu_n$, scale parameter $v_n/h_n$ and $n_n$ degrees of freedom. Thus, its density is
$$p(\mu \mid x) = \frac{1}{B\left(\frac{n_n}{2}, \frac{1}{2}\right)}\left(\frac{h_n}{n_n v_n}\right)^{1/2}\left(1 + \frac{h_n(\mu - \mu_n)^2}{n_n v_n}\right)^{-\frac{n_n+1}{2}}$$
where $B$ is the Beta function.
In other words, the posterior distribution of $\mu$ is a t distribution with mean $\mu_n$, scale parameter $v_n/h_n$ and $n_n$ degrees of freedom.
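Because the marginal posterior is a location-scale Student's t, it can be evaluated with standard libraries; here is a sketch using scipy.stats.t (the hyperparameter values are made up for illustration; loc and scale correspond to $\mu_n$ and $\sqrt{v_n/h_n}$):

```python
import numpy as np
from scipy import stats

# illustrative posterior hyperparameters, e.g. from posterior_hyperparams above
mu_n, v_n, n_n, h_n = 1.2, 0.02, 60, 0.3

posterior = stats.t(df=n_n, loc=mu_n, scale=np.sqrt(v_n / h_n))
print(posterior.mean())           # equals mu_n (defined for n_n > 1)
print(posterior.interval(0.95))   # 95% credible interval for mu
```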
Please cite as:
Taboga, Marco (2021). "Bayesian estimation of the parameters of the normal distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/normal-distribution-Bayesian-estimation.
Source: https://www.statlect.com/fundamentals-of-statistics/normal-distribution-Bayesian-estimation