Difference between revisions of "Central limit theorem by example"

From MathTank
Latest revision as of 21:11, 16 December 2021

The Central Limit Theorem is one of the most fundamental results in probability and statistics. It has numerous applications and, to some extent, "explains" the ubiquity of the normal distribution. Below is one version of this theorem:

Let [math]\displaystyle{ Y_1, Y_2,\dots ,Y_n, \dots }[/math] be a sequence of independent identically distributed random variables with mean [math]\displaystyle{ \mu }[/math] and finite, positive variance [math]\displaystyle{ \sigma ^2 }[/math]. Let [math]\displaystyle{ \overline{Y}_n = (Y_1+Y_2+\dots +Y_n)/n }[/math] and

[math]\displaystyle{ X_n = \frac{\overline{Y}_n-\mu}{\sigma/\sqrt{n}}. }[/math]

Then [math]\displaystyle{ \{ X_n\}_{n=1}^\infty }[/math] converges in distribution to the standard normal random variable, i.e.

[math]\displaystyle{ \lim _{n\to\infty} P(X_n\le x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt }[/math]

for all [math]\displaystyle{ x }[/math].
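The statement can be checked by simulation. The following is a minimal sketch, not part of the original article: it assumes the [math]\displaystyle{ Y_i }[/math] are uniform on [math]\displaystyle{ [0,1] }[/math] (so [math]\displaystyle{ \mu = 1/2 }[/math] and [math]\displaystyle{ \sigma^2 = 1/12 }[/math]), and the function names, sample sizes, and seed are illustrative choices. It compares the empirical value of [math]\displaystyle{ P(X_n\le x) }[/math] with the standard normal integral.

```python
import math
import random


def standard_normal_cdf(x):
    """Integral of the standard normal density from -infinity to x (via erf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_standardized_means(n, reps, seed=0):
    """Draw `reps` values of X_n = (Ybar_n - mu) / (sigma / sqrt(n)).

    Assumption for this sketch: Y_i ~ Uniform[0, 1], so mu = 1/2, sigma^2 = 1/12.
    """
    rng = random.Random(seed)
    mu = 0.5
    sigma = math.sqrt(1.0 / 12.0)
    values = []
    for _ in range(reps):
        ybar = sum(rng.random() for _ in range(n)) / n
        values.append((ybar - mu) / (sigma / math.sqrt(n)))
    return values


def empirical_cdf(values, x):
    """Fraction of simulated values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    xs = simulate_standardized_means(n=50, reps=4000)
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf(xs, x), standard_normal_cdf(x))
```

The two printed columns should be close for each [math]\displaystyle{ x }[/math]; the remaining gap is a mix of sampling noise and the finite value of [math]\displaystyle{ n }[/math].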

While the proof of this theorem is often beyond the scope of introductory undergraduate probability and statistics courses, there are several "convincing" examples that make the statement of the theorem very plausible. Below we provide a few such examples.

Bernoulli trials and Binomial distribution

Let [math]\displaystyle{ Y_1,Y_2,\dots , Y_n,\dots }[/math] be independent random variables representing Bernoulli trials, i.e. [math]\displaystyle{ P(Y_n=1)=p }[/math] and [math]\displaystyle{ P(Y_n=0)=1-p }[/math] for all [math]\displaystyle{ n }[/math]. Then the sum [math]\displaystyle{ X_n= Y_1+Y_2+\dots +Y_n }[/math] has a Binomial distribution with parameters [math]\displaystyle{ n }[/math] and [math]\displaystyle{ p }[/math]. (In this and the following examples [math]\displaystyle{ X_n }[/math] denotes the sum itself rather than the standardized mean from the theorem.) A concrete example here would be rolling a die repeatedly, with success being, say, rolling a 1, so that [math]\displaystyle{ p=1/6 }[/math]. For smaller [math]\displaystyle{ n }[/math] (e.g. [math]\displaystyle{ n= 10 }[/math]) the Binomial histogram is visibly asymmetric. However, for larger [math]\displaystyle{ n }[/math] the histogram of the distribution of [math]\displaystyle{ X_n }[/math] resembles the normal density curve.
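The claim about the histogram can be made quantitative. The sketch below is an illustration rather than the article's own computation: it takes the die example with [math]\displaystyle{ p=1/6 }[/math] and compares the Binomial probabilities with the normal density having the same mean [math]\displaystyle{ np }[/math] and variance [math]\displaystyle{ np(1-p) }[/math]. The helper names and the choice of a "gap relative to the tallest bar" as the measure of fit are assumptions of this sketch.

```python
import math


def binomial_pmf(n, p, k):
    """P(X_n = k) for X_n ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def relative_gap(n, p):
    """Largest |pmf(k) - normal density at k| over k = 0..n, divided by the tallest bar."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
    gap = max(abs(pmf[k] - normal_pdf(k, mean, sd)) for k in range(n + 1))
    return gap / max(pmf)


if __name__ == "__main__":
    p = 1.0 / 6.0  # success = rolling a 1 with a fair die
    for n in (10, 50, 300):
        print(n, relative_gap(n, p))
```

The printed gap should shrink as [math]\displaystyle{ n }[/math] grows, which is the numerical counterpart of the histogram approaching the normal curve.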

Convolution

Recall that the convolution of two functions [math]\displaystyle{ f \text{ and } g }[/math] is defined by [math]\displaystyle{ (f*g)(x) = \int_{-\infty}^\infty f(t) g(x-t)\, dt }[/math] and that the convolution has the following properties:

Commutativity: [math]\displaystyle{ f*g = g*f }[/math]

Associativity: [math]\displaystyle{ (f*g)*h = f*(g*h) }[/math]

Distributivity: [math]\displaystyle{ f*(ag+bh)= a(f*g)+b(f*h) }[/math]

Differentiation: [math]\displaystyle{ (f*g)' = (f')*g=f*(g') }[/math]

We will prove first that if [math]\displaystyle{ Y_1 }[/math] and [math]\displaystyle{ Y_2 }[/math] are independent random variables with densities [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_2 }[/math] then the density of their sum [math]\displaystyle{ Y_1+Y_2 }[/math] is the convolution [math]\displaystyle{ f_1*f_2 }[/math].

Let [math]\displaystyle{ F, F_1, \text{ and }F_2 }[/math] denote the cumulative distribution functions of [math]\displaystyle{ Y_1+Y_2, Y_1, \text{ and }Y_2, }[/math] respectively. Let [math]\displaystyle{ f }[/math] denote the density of [math]\displaystyle{ Y_1+Y_2 }[/math]. Note that [math]\displaystyle{ f_1(y_1)f_2(y_2) }[/math] is the joint density of [math]\displaystyle{ (Y_1,Y_2). }[/math] For all [math]\displaystyle{ y }[/math] we have:

[math]\displaystyle{ \int_{-\infty} ^y f(t)\, dt = F(y) = P(Y_1+Y_2\le y) = \int_{-\infty}^{\infty} \int_{-\infty}^{y-y_1} f_1(y_1)f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) \int_{-\infty}^{y-y_1} f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) F_2(y-y_1) dy_1 . }[/math]

Summarizing, and replacing [math]\displaystyle{ y_1 }[/math] with [math]\displaystyle{ t }[/math], for all [math]\displaystyle{ y }[/math] we get:

[math]\displaystyle{ F(y) = \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt }[/math]

Taking the derivative with respect to [math]\displaystyle{ y }[/math] (and differentiating under the integral sign) we get:

[math]\displaystyle{ f(y) = \frac{dF(y)}{dy} = \frac{d}{dy} \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt = \int_{-\infty}^{\infty} f_1(t) \frac{dF_2(y-t)}{dy} dt = \int_{-\infty}^{\infty} f_1(t) f_2(y-t)dt =(f_1*f_2)(y), }[/math]

as required.
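The intermediate identity [math]\displaystyle{ F(y) = \int_{-\infty}^{\infty} f_1(t) F_2(y-t)\, dt }[/math] can be sanity-checked numerically. The sketch below is only an illustration under assumed distributions that are not part of the proof: [math]\displaystyle{ Y_1 }[/math] is exponential with mean 1 and [math]\displaystyle{ Y_2 }[/math] is uniform on [math]\displaystyle{ [0,1] }[/math]. It evaluates the integral with a midpoint rule and compares it with the simulated frequency of [math]\displaystyle{ Y_1+Y_2\le y }[/math].

```python
import math
import random


def f1(t):
    """Density of Y1, assumed exponential with mean 1 for this sketch."""
    return math.exp(-t) if t >= 0 else 0.0


def F2(u):
    """Cumulative distribution function of Y2, assumed Uniform[0, 1] for this sketch."""
    return min(max(u, 0.0), 1.0)


def cdf_of_sum(y, upper=40.0, steps=40000):
    """Midpoint-rule approximation of F(y) = integral of f1(t) * F2(y - t) dt.

    f1 vanishes for t < 0, so the integral runs over [0, upper] only.
    """
    h = upper / steps
    return h * sum(f1((i + 0.5) * h) * F2(y - (i + 0.5) * h) for i in range(steps))


def simulated_cdf(y, reps=20000, seed=1):
    """Fraction of simulated sums Y1 + Y2 that are <= y."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps) if rng.expovariate(1.0) + rng.random() <= y)
    return hits / reps


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0):
        print(y, cdf_of_sum(y), simulated_cdf(y))
```

For these particular distributions the integral at [math]\displaystyle{ y=1 }[/math] can also be computed by hand and equals [math]\displaystyle{ e^{-1} }[/math], which gives a third value to compare against.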

Example: sums of exponentially distributed random variables

Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables having the exponential distribution with mean 1. Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math] for all [math]\displaystyle{ n=1, 2,\dots }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4, X_5 }[/math] and graph them.

Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) =f_1(y) = e^{-y} }[/math] for [math]\displaystyle{ y \ge 0 }[/math] (and [math]\displaystyle{ 0 }[/math] for [math]\displaystyle{ y \lt 0 }[/math]). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is

[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).
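One way to carry out the [math]\displaystyle{ n }[/math]-fold convolution with software is to sample the densities on a grid and approximate each convolution integral with the trapezoid rule. The sketch below is a minimal pure-Python illustration; the grid step, the cut-off at 12, and all function names are assumptions of this sketch. It compares the numerical result with the closed form [math]\displaystyle{ y^{n-1}e^{-y}/(n-1)! }[/math], the Gamma density with shape [math]\displaystyle{ n }[/math] and scale 1.

```python
import math


def exp_density(y):
    """Density of an exponential random variable with mean 1."""
    return math.exp(-y) if y >= 0 else 0.0


def gamma_density(n, y):
    """Closed form y^(n-1) e^(-y) / (n-1)! for y > 0, and 0 otherwise."""
    if y <= 0:
        return 0.0
    return y ** (n - 1) * math.exp(-y) / math.factorial(n - 1)


def convolve_on_grid(a, b, h):
    """Trapezoid-rule approximation of (a*b)(k*h) for densities supported on [0, inf).

    a[j] and b[j] hold the values of the two densities at the grid point j*h.
    """
    out = []
    for k in range(len(a)):
        total = sum(a[j] * b[k - j] for j in range(k + 1))
        total -= 0.5 * (a[0] * b[k] + a[k] * b[0])  # half weight at both endpoints
        out.append(h * total)
    return out


def n_fold_densities(max_n, h=0.02, upper=12.0):
    """Return {n: grid values of f_n} for n = 1..max_n on the grid 0, h, 2h, ..., upper."""
    size = int(round(upper / h)) + 1
    f = [exp_density(k * h) for k in range(size)]
    densities = {1: f}
    for n in range(2, max_n + 1):
        densities[n] = convolve_on_grid(densities[n - 1], f, h)
    return densities


if __name__ == "__main__":
    h = 0.02
    dens = n_fold_densities(5, h=h)
    for n in range(2, 6):
        err = max(abs(v - gamma_density(n, k * h)) for k, v in enumerate(dens[n]))
        print(n, err)
```

The printed errors should be small compared with the heights of the densities, which supports the closed form without replacing a proof.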

Computing these convolutions (either directly or using software) we get:

[math]\displaystyle{ f_1(y) = \begin{cases} e^{-y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Exponential density

[math]\displaystyle{ f_2(y) = \begin{cases} y e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f2 (density of Y1+Y2)

[math]\displaystyle{ f_3(y) = \begin{cases} \frac{1}{2}y^{2} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f3 (density of Y1+Y2+Y3)

[math]\displaystyle{ f_4(y) = \begin{cases} \frac{1}{6}y^{3} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f4 (density of Y1+Y2+Y3+Y4)

[math]\displaystyle{ f_5(y) = \begin{cases} \frac{1}{24}y^{4} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f5 (density of Y1+Y2+Y3+Y4+Y5)

Note: it can be shown that

[math]\displaystyle{ f_n(y) = \begin{cases} \frac{1}{(n-1)!}y^{n-1} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

(use induction, or the fact that the sum of [math]\displaystyle{ n }[/math] independent exponential random variables with mean 1 has the Gamma distribution with shape [math]\displaystyle{ n }[/math] and scale 1; the latter can be shown using moment-generating functions).
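To connect this example back to the theorem, one can standardize [math]\displaystyle{ X_n }[/math] (which has mean [math]\displaystyle{ n }[/math] and variance [math]\displaystyle{ n }[/math]) and compare its density with the standard normal density. The sketch below assumes the Gamma form of the density of [math]\displaystyle{ X_n }[/math] with shape [math]\displaystyle{ n }[/math] and scale 1; the grid on [math]\displaystyle{ [-4,4] }[/math] and the function names are illustrative choices.

```python
import math


def gamma_density(n, y):
    """Density of the Gamma distribution with shape n and scale 1.

    Computed through logarithms so that large n does not overflow.
    """
    if y <= 0:
        return 0.0
    return math.exp((n - 1) * math.log(y) - y - math.lgamma(n))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def max_gap_to_normal(n, z_max=4.0, steps=800):
    """Largest gap on [-z_max, z_max] between the density of (X_n - n)/sqrt(n)
    and the standard normal density."""
    root = math.sqrt(n)
    gap = 0.0
    for i in range(steps + 1):
        z = -z_max + 2.0 * z_max * i / steps
        # Change of variables: density of (X_n - n)/sqrt(n) at z.
        standardized = root * gamma_density(n, n + root * z)
        gap = max(gap, abs(standardized - normal_pdf(z)))
    return gap


if __name__ == "__main__":
    for n in (1, 5, 25, 100):
        print(n, max_gap_to_normal(n))
```

The gap should decrease as [math]\displaystyle{ n }[/math] grows, mirroring what the graphs of the densities show.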


Example: sums of uniformly distributed random variables

Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables uniformly distributed on [math]\displaystyle{ [0,1] }[/math] . Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4 }[/math] and graph them.

Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) = 1 }[/math] for [math]\displaystyle{ 0\le y \le 1 }[/math] (and [math]\displaystyle{ 0 }[/math] outside of [math]\displaystyle{ [0,1] }[/math] ). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is

[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).

Computing these convolutions (either directly or using software) we get:

[math]\displaystyle{ f_1(y) = \begin{cases} 1, & 0\le y \le 1 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Uniform density

[math]\displaystyle{ f_2(y) = \begin{cases} y, & 0\le y \le 1 \\ 2-y, & 1\le y \le 2\\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f2 (density of Y1+Y2)

[math]\displaystyle{ f_3(y) = \begin{cases} \frac{y^{2}}{2}, & 0\le y \le 1 \\ - y^{2} + 3 y - \frac{3}{2}, & 1\le y \le 2 \\ \frac{y^{2}}{2} - 3 y + \frac{9}{2}, & 2\le y \le 3\\ 0, &\mbox{ otherwise} \end{cases} }[/math]

Figure: Density f3 (density of Y1+Y2+Y3)

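[math]\displaystyle{ f_4(y) = \begin{cases} \frac{y^{3}}{6}, & 0\le y \le 1 \\ - \frac{y^{3}}{2} + 2 y^{2} - 2 y + \frac{2}{3}, & 1\le y \le 2 \\ \frac{y^{3}}{2} - 4 y^{2} + 10 y - \frac{22}{3}, & 2\le y \le 3\\ - \frac{y^{3}}{6} + 2 y^{2} - 8 y + \frac{32}{3}, & 3\le y \le 4\\ 0, &\mbox{ otherwise} \end{cases} }[/math]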

Figure: Density f4 (density of Y1+Y2+Y3+Y4)
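For the uniform density, convolution takes a particularly simple form: since [math]\displaystyle{ f(y-t)=1 }[/math] exactly when [math]\displaystyle{ y-1\le t\le y }[/math], we have [math]\displaystyle{ (g*f)(y)=\int_{y-1}^{y} g(t)\,dt }[/math]. The sketch below uses this to check the piecewise formulas numerically. It is an illustration only: the midpoint rule, the step count, and the function names are assumptions of this sketch, and the formula coded for [math]\displaystyle{ f_3 }[/math] uses the middle piece [math]\displaystyle{ -y^2+3y-\tfrac{3}{2} }[/math].

```python
def f2(y):
    """Triangular density of Y1 + Y2 for independent Uniform[0, 1] variables."""
    if 0.0 <= y <= 1.0:
        return y
    if 1.0 < y <= 2.0:
        return 2.0 - y
    return 0.0


def f3(y):
    """Piecewise quadratic density of Y1 + Y2 + Y3."""
    if 0.0 <= y <= 1.0:
        return y * y / 2.0
    if 1.0 < y <= 2.0:
        return -y * y + 3.0 * y - 1.5
    if 2.0 < y <= 3.0:
        return y * y / 2.0 - 3.0 * y + 4.5
    return 0.0


def window_integral(g, y, steps=2000):
    """Midpoint-rule approximation of the integral of g over [y - 1, y].

    This equals (g * f)(y) when f is the Uniform[0, 1] density.
    """
    h = 1.0 / steps
    return h * sum(g(y - 1.0 + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # f3 should agree with the window integral of f2.
    for y in (0.25, 0.5, 1.0, 1.5, 2.0, 2.75):
        print(y, window_integral(f2, y), f3(y))
    # The same operation applied to f3 gives values of f4.
    for y in (1.0, 2.0, 3.0):
        print(y, window_integral(f3, y))
```

Applying `window_integral` repeatedly gives values of [math]\displaystyle{ f_n }[/math] for any [math]\displaystyle{ n }[/math], at the cost of nested numerical integration; for plotting many points, a grid-based convolution is usually more economical.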