Central limit theorem by example

From MathTank
Revision as of 21:44, 12 December 2021 by Alexandk

The Central Limit Theorem (CLT for short) is one of the most fundamental results in Probability and Statistics. It has numerous applications and, to some extent, "explains" the ubiquity of the normal distribution. Below is one version of the theorem:

Let [math]\displaystyle{ Y_1, Y_2,\dots ,Y_n, \dots }[/math] be a sequence of independent identically distributed random variables with mean [math]\displaystyle{ \mu }[/math] and finite variance [math]\displaystyle{ \sigma ^2 \gt 0 }[/math]. Let [math]\displaystyle{ \overline{Y}_n = (Y_1+Y_2+\dots +Y_n)/n }[/math] and

[math]\displaystyle{ X_n = \frac{\overline{Y}_n-\mu}{\sigma/\sqrt{n}}. }[/math]

Then [math]\displaystyle{ \{ X_n\}_{n=1}^\infty }[/math] converges in distribution to the standard normal random variable, i.e.

[math]\displaystyle{ \lim _{n\to\infty} P(X_n\le x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt }[/math]

for all [math]\displaystyle{ x }[/math].
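The theorem can be watched at work with a small simulation. The Python sketch below estimates [math]\displaystyle{ P(X_n\le x) }[/math] by Monte Carlo and compares it with the standard normal integral above. It assumes, purely for illustration, that the [math]\displaystyle{ Y_i }[/math] are exponential with mean 1 (so [math]\displaystyle{ \mu=\sigma=1 }[/math]); the function names, the number of trials and the seed are arbitrary choices, not part of the theorem.

```python
import math
import random


def standard_normal_cdf(x):
    """The integral on the right-hand side of the theorem, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_cdf_of_standardized_mean(n, x, trials=10_000, seed=0):
    """Monte Carlo estimate of P(X_n <= x).

    Illustrative assumption: Y_i ~ Exponential(mean 1), so mu = sigma = 1.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    hits = 0
    for _ in range(trials):
        y_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        x_n = (y_bar - mu) / (sigma / math.sqrt(n))  # standardized mean
        if x_n <= x:
            hits += 1
    return hits / trials


for n in (1, 5, 30):
    estimate = empirical_cdf_of_standardized_mean(n, 0.0)
    print(f"n={n:>2}: P(X_n <= 0) ~ {estimate:.3f}   Phi(0) = {standard_normal_cdf(0.0):.3f}")
```

For n = 1 the exact value is [math]\displaystyle{ P(Y_1\le 1) = 1-e^{-1}\approx 0.632 }[/math], far from [math]\displaystyle{ \Phi(0)=0.5 }[/math]; as n grows the estimates should drift toward 0.5, up to simulation noise of roughly 0.005 with 10,000 trials.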

While the proof of this theorem is usually beyond the scope of introductory undergraduate probability and statistics courses, there are several "convincing" examples that make the statement of the theorem very plausible. Below we provide two such examples.

Bernoulli trials and Binomial distribution

Let [math]\displaystyle{ Y_1,Y_2,\dots , Y_n,\dots }[/math] be independent random variables representing Bernoulli trials, i.e. [math]\displaystyle{ P(Y_n=1)=p }[/math] and [math]\displaystyle{ P(Y_n=0)=1-p }[/math] for all [math]\displaystyle{ n }[/math]. Then [math]\displaystyle{ X_n= Y_1+Y_2+\dots +Y_n }[/math] has the Binomial distribution with parameters [math]\displaystyle{ n }[/math] and [math]\displaystyle{ p }[/math]. Here [math]\displaystyle{ \mu = p }[/math] and [math]\displaystyle{ \sigma^2 = p(1-p) }[/math], so the standardized variable of the theorem is [math]\displaystyle{ (X_n-np)/\sqrt{np(1-p)} }[/math]. A concrete example is rolling a fair die repeatedly, with success being, say, rolling a 1, so that [math]\displaystyle{ p=1/6 }[/math]. For small [math]\displaystyle{ n }[/math] (e.g. [math]\displaystyle{ n= 10 }[/math]) the Binomial histogram is noticeably skewed. However, for larger [math]\displaystyle{ n }[/math] the histogram of the distribution of [math]\displaystyle{ X_n }[/math] resembles the normal density curve.
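One way to quantify "resembles the normal density curve" is sketched below in Python, as an illustration rather than part of the original argument. It compares the Binomial probabilities [math]\displaystyle{ P(X_n=k) }[/math] with the normal density having the same mean [math]\displaystyle{ np }[/math] and standard deviation [math]\displaystyle{ \sqrt{np(1-p)} }[/math], and prints the Binomial skewness [math]\displaystyle{ (1-2p)/\sqrt{np(1-p)} }[/math], which is 0 for a symmetric distribution. The helper names are hypothetical; [math]\displaystyle{ p=1/6 }[/math] matches the die example.

```python
import math


def binomial_pmf(k, n, p):
    """P(X_n = k) for X_n ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_density(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_gap_to_normal(n, p):
    """Largest |P(X_n = k) - normal density at k| over k = 0..n, using the
    normal curve with the Binomial's own mean np and sd sqrt(np(1-p))."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_density(k, mean, sd)) for k in range(n + 1))


def skewness(n, p):
    """Skewness of Binomial(n, p); it is 0 exactly when the histogram is symmetric."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


p = 1 / 6  # success = rolling a 1 with a fair die
for n in (10, 100, 1000):
    print(f"n={n:>4}: skewness={skewness(n, p):.3f}, max gap={max_gap_to_normal(n, p):.4f}")
```

Both printed quantities shrink as n grows, which is the numerical counterpart of the histogram becoming more symmetric and bell-shaped.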

Convolution

Recall that the convolution of two functions [math]\displaystyle{ f \text{ and } g }[/math] is defined by [math]\displaystyle{ (f*g)(x) = \int_{-\infty}^\infty f(t) g(x-t)\, dt }[/math] and that the convolution has the following properties:

Commutativity: [math]\displaystyle{ f*g = g*f }[/math]

Associativity: [math]\displaystyle{ (f*g)*h = f*(g*h) }[/math]

Distributivity: [math]\displaystyle{ f*(ag+bh)= a(f*g)+b(f*h) }[/math]

Differentiation: [math]\displaystyle{ (f*g)' = (f')*g=f*(g') }[/math] (whenever the derivatives involved exist)
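The algebraic properties can be sanity-checked numerically. The Python sketch below replaces the integral by a simple Riemann sum on a grid of step dt, for functions that vanish on the negative axis; this discretization is an assumption made for illustration, and the sample functions f, g, h and constants a, b are arbitrary. The discretized convolution is itself commutative, associative and distributive, so the three checks should agree up to floating-point rounding (the differentiation rule is not checked here).

```python
import math


def discrete_convolution(u, v, dt):
    """Riemann-sum approximation (f*g)(k*dt) ~ dt * sum_{i+j=k} f(i*dt) g(j*dt)
    for f, g vanishing on (-inf, 0), sampled as u[i] = f(i*dt), v[j] = g(j*dt).
    Only entries k < min(len(u), len(v)) approximate the true convolution,
    because the grid is truncated."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj * dt
    return out


def all_close(u, v):
    """Element-wise comparison that tolerates floating-point rounding."""
    return len(u) == len(v) and all(
        math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-10) for x, y in zip(u, v)
    )


dt = 0.01
ts = [k * dt for k in range(200)]               # grid on [0, 2)
f = [math.exp(-t) for t in ts]                  # arbitrary sample functions
g = [1.0 if t <= 1.0 else 0.0 for t in ts]
h = [t * math.exp(-t) for t in ts]
a, b = 2.0, -3.0

commutative = all_close(discrete_convolution(f, g, dt), discrete_convolution(g, f, dt))
associative = all_close(
    discrete_convolution(discrete_convolution(f, g, dt), h, dt),
    discrete_convolution(f, discrete_convolution(g, h, dt), dt),
)
lhs = discrete_convolution(f, [a * x + b * y for x, y in zip(g, h)], dt)
rhs = [a * x + b * y for x, y in zip(discrete_convolution(f, g, dt), discrete_convolution(f, h, dt))]
distributive = all_close(lhs, rhs)
print(commutative, associative, distributive)
```

All three flags should be True; the check illustrates the properties for the discretized operator and is not a proof for the integral.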

We first prove that if [math]\displaystyle{ Y_1 }[/math] and [math]\displaystyle{ Y_2 }[/math] are independent random variables with densities [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_2 }[/math], then the density of their sum [math]\displaystyle{ Y_1+Y_2 }[/math] is the convolution [math]\displaystyle{ f_1*f_2 }[/math].

Let [math]\displaystyle{ F, F_1, \text{ and }F_2 }[/math] denote the cumulative distribution functions of [math]\displaystyle{ Y_1+Y_2, Y_1, \text{ and }Y_2, }[/math] respectively. Let [math]\displaystyle{ f }[/math] denote the density of [math]\displaystyle{ Y_1+Y_2 }[/math]. By independence, [math]\displaystyle{ f_1(y_1)f_2(y_2) }[/math] is the joint density of [math]\displaystyle{ (Y_1,Y_2). }[/math] For all [math]\displaystyle{ y }[/math] we have:

[math]\displaystyle{ \int_{-\infty} ^y f(t)\, dt = F(y) = P(Y_1+Y_2\le y) = \int_{-\infty}^{\infty} \int_{-\infty}^{y-y_1} f_1(y_1)f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) \int_{-\infty}^{y-y_1} f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) F_2(y-y_1) dy_1 . }[/math]

Summarizing, and replacing [math]\displaystyle{ y_1 }[/math] with [math]\displaystyle{ t }[/math], for all [math]\displaystyle{ y }[/math] we get:

[math]\displaystyle{ F(y) = \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt }[/math]

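Taking the derivative with respect to [math]\displaystyle{ y }[/math] and differentiating under the integral sign (assuming this interchange is allowed, as it is for the densities considered below), we get: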

[math]\displaystyle{ f(y) = \frac{dF(y)}{dy} = \frac{d}{dy} \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt = \int_{-\infty}^{\infty} f_1(t) \frac{dF_2(y-t)}{dy} dt = \int_{-\infty}^{\infty} f_1(t) f_2(y-t)dt =(f_1*f_2)(y), }[/math]

as required.
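As a numerical check of this result, the Python sketch below takes, purely as an example, [math]\displaystyle{ Y_1 }[/math] uniform on [math]\displaystyle{ [0,1] }[/math] and [math]\displaystyle{ Y_2 }[/math] exponential with mean 1. It evaluates [math]\displaystyle{ (f_1*f_2)(y) }[/math] with the midpoint rule, compares it with the closed form obtained by integrating by hand ([math]\displaystyle{ 1-e^{-y} }[/math] for [math]\displaystyle{ 0\le y\le 1 }[/math] and [math]\displaystyle{ (e-1)e^{-y} }[/math] for [math]\displaystyle{ y\gt 1 }[/math]), and with a histogram-style estimate from simulated sums. The distributions, step count, bin width and seed are illustrative assumptions.

```python
import math
import random


def f1(t):
    """Density of Y_1 ~ Uniform[0, 1] (illustrative choice)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def f2(t):
    """Density of Y_2 ~ Exponential(mean 1) (illustrative choice)."""
    return math.exp(-t) if t >= 0.0 else 0.0


def convolution_density(y, steps=2000):
    """(f1*f2)(y) = integral of f1(t) f2(y - t) dt by the midpoint rule;
    f1 vanishes outside [0, 1], so integrating over [0, 1] suffices."""
    dt = 1.0 / steps
    return sum(f1((k + 0.5) * dt) * f2(y - (k + 0.5) * dt) for k in range(steps)) * dt


def closed_form(y):
    """The same convolution integral evaluated by hand."""
    if y < 0:
        return 0.0
    if y <= 1:
        return 1.0 - math.exp(-y)
    return (math.e - 1.0) * math.exp(-y)


def simulated_density(y, width=0.1, samples=100_000, seed=1):
    """Fraction of simulated sums Y_1 + Y_2 in [y - width/2, y + width/2), divided by width."""
    rng = random.Random(seed)
    lo, hi = y - width / 2, y + width / 2
    hits = sum(1 for _ in range(samples) if lo <= rng.random() + rng.expovariate(1.0) < hi)
    return hits / (samples * width)


for y in (0.5, 1.5, 3.0):
    print(f"y={y}: convolution={convolution_density(y):.4f}, "
          f"closed form={closed_form(y):.4f}, simulation={simulated_density(y):.4f}")
```

The first two columns should agree to several decimals; the simulated column agrees only up to sampling noise.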

Example: sums of exponentially distributed random variables

Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables having the exponential distribution with mean 1. Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math] for all [math]\displaystyle{ n=1, 2,\dots }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4, X_5 }[/math] and graph them.

Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) =f_1(y) = e^{-y} }[/math] for [math]\displaystyle{ y \ge 0 }[/math] (and [math]\displaystyle{ 0 }[/math] for [math]\displaystyle{ y \lt 0 }[/math]). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is

[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).

Computing these convolutions (either directly or using software) we get:

[math]\displaystyle{ f_1(y) = \begin{cases} e^{-y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

[math]\displaystyle{ f_2(y) = \begin{cases} y e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

[math]\displaystyle{ f_3(y) = \begin{cases} \frac{1}{2}y^{2} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

[math]\displaystyle{ f_4(y) = \begin{cases} \frac{1}{6}y^{3} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

[math]\displaystyle{ f_5(y) = \begin{cases} \frac{1}{24}y^{4} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
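The convolutions for the exponential case can be reproduced numerically. The Python sketch below samples [math]\displaystyle{ f(y)=e^{-y} }[/math] on a grid, applies the convolution repeatedly with the trapezoidal rule, and compares the result with the closed form [math]\displaystyle{ y^{n-1}e^{-y}/(n-1)! }[/math]. The grid step H and the cut-off Y_MAX are illustrative assumptions; since both functions vanish for [math]\displaystyle{ y\lt 0 }[/math], each integral runs over [math]\displaystyle{ [0,y] }[/math] only.

```python
import math

H = 0.02       # grid spacing (illustrative; smaller is more accurate but slower)
Y_MAX = 12.0   # grid cut-off (illustrative); values beyond it are not needed here
GRID = [k * H for k in range(int(round(Y_MAX / H)) + 1)]


def convolve_on_grid(a, b, h=H):
    """Trapezoid-rule approximation of (f*g)(y_k) = integral_0^{y_k} f(t) g(y_k - t) dt
    for functions supported on [0, inf), sampled as a[i] = f(i*h), b[j] = g(j*h)."""
    out = []
    for k in range(len(a)):
        s = sum(a[i] * b[k - i] for i in range(k + 1))
        s -= 0.5 * (a[0] * b[k] + a[k] * b[0])  # half weights at the two end points
        out.append(h * s)
    return out


def exact_density(n, y):
    """Closed form f_n(y) = y^(n-1) e^(-y) / (n-1)! for y >= 0."""
    return y ** (n - 1) * math.exp(-y) / math.factorial(n - 1)


f = [math.exp(-y) for y in GRID]  # f_1 = f sampled on the grid
densities = {1: f}
for n in range(2, 6):
    densities[n] = convolve_on_grid(densities[n - 1], f)  # f_n = f_{n-1} * f

for n in range(2, 6):
    err = max(abs(v - exact_density(n, y)) for v, y in zip(densities[n], GRID))
    print(f"n={n}: max |numerical - closed form| = {err:.2e}")
```

The reported errors should be small (well below 1e-3 with this grid), and they shrink further as H decreases.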

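Note: it can be shown by induction that, for every [math]\displaystyle{ n\ge 1 }[/math],

[math]\displaystyle{ f_n(y) = \begin{cases} \frac{1}{(n-1)!}y^{n-1} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise,} \end{cases} }[/math]

that is, [math]\displaystyle{ X_n }[/math] has the Gamma distribution with shape [math]\displaystyle{ n }[/math] and scale 1.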
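The induction step is short. Assume that [math]\displaystyle{ f_n(y) = \frac{1}{(n-1)!}y^{n-1}e^{-y} }[/math] for [math]\displaystyle{ y\gt 0 }[/math] and [math]\displaystyle{ f_n(y)=0 }[/math] otherwise, for some [math]\displaystyle{ n\ge 1 }[/math] (for [math]\displaystyle{ n=1 }[/math] this is just [math]\displaystyle{ f_1=f }[/math]). By associativity, [math]\displaystyle{ f_{n+1} = f_n*f }[/math]. Since [math]\displaystyle{ f_n(t)=0 }[/math] for [math]\displaystyle{ t\lt 0 }[/math] and [math]\displaystyle{ f(y-t)=0 }[/math] for [math]\displaystyle{ t\gt y }[/math], the integral reduces to [math]\displaystyle{ [0,y] }[/math], and for [math]\displaystyle{ y\gt 0 }[/math]:

[math]\displaystyle{ f_{n+1}(y) = \int_{0}^{y} \frac{t^{n-1}}{(n-1)!}e^{-t}\, e^{-(y-t)}\,dt = e^{-y}\int_{0}^{y} \frac{t^{n-1}}{(n-1)!}\,dt = \frac{1}{n!}y^{n}e^{-y}, }[/math]

while [math]\displaystyle{ f_{n+1}(y)=0 }[/math] for [math]\displaystyle{ y\le 0 }[/math]. This is the formula for [math]\displaystyle{ n+1 }[/math].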

Example: sums of uniformly distributed random variables

Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables uniformly distributed on [math]\displaystyle{ [0,1] }[/math]. Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4 }[/math] and graph them.

Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) = 1 }[/math] for [math]\displaystyle{ 0\le y \le 1 }[/math] (and [math]\displaystyle{ 0 }[/math] outside [math]\displaystyle{ [0,1] }[/math]). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is

[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).

Computing these convolutions (either directly or using software) we get:

[math]\displaystyle{ f_2(y) = \begin{cases} y, & 0\le y \le 1 \\ 2-y, & 1\le y \le 2\\ 0, &\mbox{ otherwise} \end{cases} }[/math]

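[math]\displaystyle{ f_3(y) = \begin{cases} \frac{1}{2}y^2, & 0\le y \le 1 \\ \frac{1}{2}(-2y^2+6y-3), & 1\le y \le 2\\ \frac{1}{2}(3-y)^2, & 2\le y \le 3\\ 0, &\mbox{ otherwise} \end{cases} }[/math]

[math]\displaystyle{ f_4(y) = \begin{cases} \frac{1}{6}y^3, & 0\le y \le 1 \\ \frac{1}{6}(-3y^3+12y^2-12y+4), & 1\le y \le 2\\ \frac{1}{6}(3y^3-24y^2+60y-44), & 2\le y \le 3\\ \frac{1}{6}(4-y)^3, & 3\le y \le 4\\ 0, &\mbox{ otherwise} \end{cases} }[/math]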
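The densities of sums of uniform variables are instances of the Irwin-Hall density [math]\displaystyle{ f_n(y) = \frac{1}{(n-1)!}\sum_{k=0}^{\lfloor y\rfloor}(-1)^k\binom{n}{k}(y-k)^{n-1} }[/math] for [math]\displaystyle{ 0\le y\le n }[/math], quoted here as a known result rather than derived in this article. The Python sketch below evaluates this general formula and measures how far [math]\displaystyle{ f_n }[/math] is from the normal density with the same mean [math]\displaystyle{ n/2 }[/math] and variance [math]\displaystyle{ n/12 }[/math], which is the gap the Central Limit Theorem says should shrink. The helper names and grid size are illustrative choices.

```python
import math


def irwin_hall_pdf(y, n):
    """Density of X_n = Y_1 + ... + Y_n with Y_i ~ Uniform[0, 1] (Irwin-Hall), n >= 2:
    1/(n-1)! * sum_{k=0}^{floor(y)} (-1)^k C(n, k) (y - k)^(n-1) on [0, n], else 0."""
    if y < 0 or y > n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (y - k) ** (n - 1)
                for k in range(int(math.floor(y)) + 1) if y - k > 0)
    return total / math.factorial(n - 1)


def normal_pdf(y, mean, var):
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_gap_to_normal(n, points=2001):
    """Largest |f_n(y) - normal density| on an evenly spaced grid over [0, n];
    the normal curve has the same mean n/2 and variance n/12 as X_n."""
    ys = [n * i / (points - 1) for i in range(points)]
    return max(abs(irwin_hall_pdf(y, n) - normal_pdf(y, n / 2, n / 12)) for y in ys)


for n in (2, 3, 4, 12):
    print(f"n={n:>2}: f_n(n/2)={irwin_hall_pdf(n / 2, n):.4f}, "
          f"max gap to normal={max_gap_to_normal(n):.4f}")
```

Evaluating irwin_hall_pdf at a few points reproduces the piecewise formulas for [math]\displaystyle{ f_2, f_3, f_4 }[/math] (for example [math]\displaystyle{ f_3(1.5)=3/4 }[/math] and [math]\displaystyle{ f_4(2)=2/3 }[/math]), and the printed gap is noticeably smaller for n = 12 than for n = 2.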