Difference between revisions of "Central limit theorem by example"
Latest revision as of 21:11, 16 December 2021
The Central Limit Theorem is one of the most fundamental results in Probability and Statistics. It has numerous applications and, to some extent, "explains" the ubiquity of the normal distribution. Below is one version of this theorem:
Let [math]\displaystyle{ Y_1, Y_2,\dots ,Y_n, \dots }[/math] be a sequence of independent identically distributed random variables with finite mean [math]\displaystyle{ \mu }[/math] and finite variance [math]\displaystyle{ \sigma ^2 \gt 0 }[/math]. Let [math]\displaystyle{ \overline{Y}_n = (Y_1+Y_2+\dots +Y_n)/n }[/math] and
[math]\displaystyle{ X_n = \frac{\overline{Y}_n-\mu}{\sigma/\sqrt{n}}. }[/math]
Then [math]\displaystyle{ \{ X_n\}_{n=1}^\infty }[/math] converges in distribution to the standard normal random variable, i.e.
[math]\displaystyle{ \lim _{n\to\infty} P(X_n\le x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt }[/math]
for all [math]\displaystyle{ x }[/math].
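The statement can be checked numerically. The following is a minimal simulation sketch, not part of the original text: it assumes the [math]\displaystyle{ Y_i }[/math] are uniform on [math]\displaystyle{ [0,1] }[/math] (so [math]\displaystyle{ \mu = 1/2 }[/math] and [math]\displaystyle{ \sigma^2 = 1/12 }[/math]), and the sample size, number of trials, seed, and function names are arbitrary illustrative choices.

```python
import math
import random

def standard_normal_cdf(x):
    """Phi(x), the right-hand side of the theorem, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_mean(sample, mu, sigma):
    """X_n = (mean(sample) - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    return (sum(sample) / n - mu) / (sigma / math.sqrt(n))

def simulated_cdf_of_xn(x, n, trials, rng):
    """Estimate P(X_n <= x) by simulation, assuming Y_i ~ Uniform[0, 1]."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if standardized_mean(sample, mu, sigma) <= x:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
for x in (-1.0, 0.0, 1.0):
    estimate = simulated_cdf_of_xn(x, n=30, trials=20000, rng=rng)
    print(f"x={x:+.1f}  simulated={estimate:.3f}  normal={standard_normal_cdf(x):.3f}")
```

The two printed columns should be close to each other; the remaining difference comes mostly from the simulation error, which shrinks as the number of trials grows.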
While the proof of this theorem is often beyond the scope of introductory undergraduate probability and statistics courses, there are several "convincing" examples that make the statement of the theorem very plausible. Below we provide several such examples.
Bernoulli trials and Binomial distribution
Let [math]\displaystyle{ Y_1,Y_2,\dots , Y_n,\dots }[/math] be independent random variables representing Bernoulli trials, i.e. [math]\displaystyle{ P(Y_n=1)=p }[/math] and [math]\displaystyle{ P(Y_n=0)=1-p }[/math] for all [math]\displaystyle{ n }[/math]. Then [math]\displaystyle{ X_n= Y_1+Y_2+\dots +Y_n }[/math] has the Binomial distribution with parameters [math]\displaystyle{ n }[/math] and [math]\displaystyle{ p }[/math]. A concrete example here would be rolling a die repeatedly, with success being, say, rolling a 1 (so that [math]\displaystyle{ p = 1/6 }[/math]). For smaller [math]\displaystyle{ n }[/math] (e.g. [math]\displaystyle{ n= 10 }[/math]) the Binomial histogram is not symmetric. However, for larger [math]\displaystyle{ n }[/math] the histogram of the distribution of [math]\displaystyle{ X_n }[/math] resembles the normal density curve.
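To make the comparison concrete, here is a small sketch (our own illustration, with hypothetical function names) that measures how far the Binomial probabilities are from the normal density with the same mean [math]\displaystyle{ np }[/math] and variance [math]\displaystyle{ np(1-p) }[/math], using the die example with [math]\displaystyle{ p = 1/6 }[/math]:

```python
import math

def binomial_pmf(k, n, p):
    """P(X_n = k) for X_n ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def normal_density(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def max_gap(n, p):
    """Largest |binomial pmf - matching normal density| over k = 0..n."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_density(k, mean, sd))
               for k in range(n + 1))

p = 1.0 / 6.0  # success = rolling a 1 with a fair die
for n in (10, 100, 1000):
    print(f"n={n:5d}  max gap = {max_gap(n, p):.5f}")
```

The printed gap should shrink as [math]\displaystyle{ n }[/math] grows, which is the numerical counterpart of the histogram "resembling" the normal curve.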
Convolution
Recall that the convolution of two functions [math]\displaystyle{ f \text{ and } g }[/math] is defined by [math]\displaystyle{ (f*g)(x) = \int_{-\infty}^\infty f(t) g(x-t)\, dt }[/math] and that the convolution has the following properties:
Commutativity: [math]\displaystyle{ f*g = g*f }[/math]
Associativity: [math]\displaystyle{ (f*g)*h = f*(g*h) }[/math]
Distributivity: [math]\displaystyle{ f*(ag+bh)= a(f*g)+b(f*h) }[/math]
Differentiation: [math]\displaystyle{ (f*g)' = (f')*g=f*(g') }[/math]
We will prove first that if [math]\displaystyle{ Y_1 }[/math] and [math]\displaystyle{ Y_2 }[/math] are independent random variables with densities [math]\displaystyle{ f_1 }[/math] and [math]\displaystyle{ f_2 }[/math] then the density of their sum [math]\displaystyle{ Y_1+Y_2 }[/math] is the convolution [math]\displaystyle{ f_1*f_2 }[/math].
Let [math]\displaystyle{ F, F_1, \text{ and }F_2 }[/math] denote the cumulative distribution functions of [math]\displaystyle{ Y_1+Y_2, Y_1, \text{ and }Y_2, }[/math] respectively. Let [math]\displaystyle{ f }[/math] denote the density of [math]\displaystyle{ Y_1+Y_2 }[/math]. Note that [math]\displaystyle{ f_1(y_1)f_2(y_2) }[/math] is the joint density of [math]\displaystyle{ (Y_1,Y_2). }[/math] For all [math]\displaystyle{ y }[/math] we have:
[math]\displaystyle{ \int_{-\infty} ^y f(t)\, dt = F(y) = P(Y_1+Y_2\le y) = \int_{-\infty}^{\infty} \int_{-\infty}^{y-y_1} f_1(y_1)f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) \int_{-\infty}^{y-y_1} f_2(y_2) \,dy_2dy_1 = \int_{-\infty}^{\infty} f_1(y_1) F_2(y-y_1) dy_1 . }[/math]
Summarizing, and replacing [math]\displaystyle{ y_1 }[/math] with [math]\displaystyle{ t }[/math], for all [math]\displaystyle{ y }[/math] we get:
[math]\displaystyle{ F(y) = \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt }[/math]
Taking the derivative with respect to [math]\displaystyle{ y }[/math] (and differentiating under the integral sign) we get:
[math]\displaystyle{ f(y) = \frac{dF(y)}{dy} = \frac{d}{dy} \int_{-\infty}^{\infty} f_1(t) F_2(y-t) dt = \int_{-\infty}^{\infty} f_1(t) \frac{dF_2(y-t)}{dy} dt = \int_{-\infty}^{\infty} f_1(t) f_2(y-t)dt =(f_1*f_2)(y), }[/math]
as required.
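The key identity in this argument, [math]\displaystyle{ F(y) = \int_{-\infty}^{\infty} f_1(t) F_2(y-t)\, dt }[/math], can be sanity-checked numerically. The sketch below is only an illustration under assumptions of our own: [math]\displaystyle{ Y_1 }[/math] is taken to be uniform on [math]\displaystyle{ [0,1] }[/math], [math]\displaystyle{ Y_2 }[/math] exponential with mean 1, the integral is approximated by the midpoint rule, and all function names are hypothetical.

```python
import math
import random

def uniform_density(t):
    """f_1: density of Y_1 ~ Uniform[0, 1] (an assumed example)."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def exponential_cdf(t):
    """F_2: cumulative distribution function of Y_2 ~ Exp(1) (an assumed example)."""
    return 1.0 - math.exp(-t) if t > 0.0 else 0.0

def cdf_of_sum(y, steps=2000):
    """F(y) = integral of f_1(t) F_2(y - t) dt, midpoint rule on [0, 1].

    f_1 vanishes outside [0, 1], so integrating over that interval suffices.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += uniform_density(t) * exponential_cdf(y - t)
    return total * h

def simulated_cdf_of_sum(y, trials, rng):
    """Estimate P(Y_1 + Y_2 <= y) directly by simulation."""
    hits = sum(1 for _ in range(trials)
               if rng.random() + rng.expovariate(1.0) <= y)
    return hits / trials

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
for y in (0.5, 1.5, 3.0):
    print(f"y={y}: integral={cdf_of_sum(y):.4f}  "
          f"simulation={simulated_cdf_of_sum(y, 20000, rng):.4f}")
```

The two columns should agree up to simulation error, since both estimate [math]\displaystyle{ P(Y_1+Y_2\le y) }[/math].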
Example: sums of exponentially distributed random variables
Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables having the exponential distribution with mean 1. Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math] for all [math]\displaystyle{ n=1, 2,\dots }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4, X_5 }[/math] and graph them.
Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) =f_1(y) = e^{-y} }[/math] for [math]\displaystyle{ y \ge 0 }[/math] (and [math]\displaystyle{ 0 }[/math] for [math]\displaystyle{ y \lt 0 }[/math]). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is
[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).
Computing these convolutions (either directly or using software) we get:
[math]\displaystyle{ f_1(y) = \begin{cases} e^{-y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_2(y) = \begin{cases} y e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_3(y) = \begin{cases} \frac{1}{2}y^{2} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_4(y) = \begin{cases} \frac{1}{6}y^{3} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_5(y) = \begin{cases} \frac{1}{24}y^{4} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
Note: it can be shown that
[math]\displaystyle{ f_n(y) = \begin{cases} \frac{1}{(n-1)!}y^{n-1} e^{- y}, & y\gt 0 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
(use induction or the fact that the sum of independent identically distributed exponential random variables has a Gamma distribution; the latter can be shown using moment-generating functions).
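As an example of the "using software" route, the following sketch convolves the exponential density with itself numerically and compares the result with the Gamma density with shape [math]\displaystyle{ n }[/math] and rate 1, namely [math]\displaystyle{ y^{n-1}e^{-y}/(n-1)! }[/math] for [math]\displaystyle{ y\gt 0 }[/math]. The grid step, the cut-off at 15, the trapezoid rule, and the function names are assumptions made for this illustration only.

```python
import math

def gamma_density(y, n):
    """Density of the sum of n independent Exp(1) variables: y^(n-1) e^(-y) / (n-1)!."""
    if y <= 0.0:
        return 0.0
    return y ** (n - 1) * math.exp(-y) / math.factorial(n - 1)

def convolve_on_grid(f, g, h):
    """Trapezoid-rule approximation of (f*g)(y_k) = int_0^{y_k} f(t) g(y_k - t) dt.

    f and g are lists of density values at y_k = k*h; both densities are
    assumed to vanish on the negative half-line.
    """
    out = []
    for k in range(len(f)):
        total = sum(f[i] * g[k - i] for i in range(k + 1))
        total -= 0.5 * (f[0] * g[k] + f[k] * g[0])  # half weight at both endpoints
        out.append(h * total)
    return out

h, num_points = 0.02, 751                # grid on [0, 15]
grid = [k * h for k in range(num_points)]
f1 = [math.exp(-y) for y in grid]        # exponential density with mean 1

densities = {1: f1}
for n in range(2, 6):
    densities[n] = convolve_on_grid(densities[n - 1], f1, h)

for n in range(2, 6):
    gap = max(abs(num - gamma_density(y, n)) for num, y in zip(densities[n], grid))
    print(f"n={n}: max |numerical - closed form| = {gap:.2e}")
```

The printed gaps should be small, limited only by the accuracy of the trapezoid rule on the chosen grid.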
Example: sums of uniformly distributed random variables
Let [math]\displaystyle{ Y_1, Y_2, \dots , Y_n,\dots }[/math] be independent random variables uniformly distributed on [math]\displaystyle{ [0,1] }[/math] . Let [math]\displaystyle{ X_n = Y_1+Y_2+\dots +Y_n }[/math]. We will find the densities of [math]\displaystyle{ X_2, X_3, X_4 }[/math] and graph them.
Note that each [math]\displaystyle{ Y_n }[/math] has the density [math]\displaystyle{ f(y) = 1 }[/math] for [math]\displaystyle{ 0\le y \le 1 }[/math] (and [math]\displaystyle{ 0 }[/math] outside of [math]\displaystyle{ [0,1] }[/math] ). Further, the density [math]\displaystyle{ f_n }[/math] of [math]\displaystyle{ X_n }[/math] is
[math]\displaystyle{ f_n (y) = (f*f*\dots *f) (y) }[/math] ([math]\displaystyle{ n }[/math] -fold convolution).
Computing these convolutions (either directly or using software) we get:
[math]\displaystyle{ f_1(y) = \begin{cases} 1, & 0\le y \le 1 \\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_2(y) = \begin{cases} y, & 0\le y \le 1 \\ 2-y, & 1\le y \le 2\\ 0, &\mbox{ otherwise} \end{cases} }[/math]
[math]\displaystyle{ f_3(y) = \begin{cases} \frac{y^{2}}{2}, & 0\le y \le 1 \\ - y^{2} + 3 y - \frac{3}{2}, & 1\le y \le 2 \\ \frac{y^{2}}{2} - 3 y + \frac{9}{2}, & 2\le y \le 3\\ 0, &\mbox{ otherwise} \end{cases} }[/math]
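[math]\displaystyle{ f_4(y) = \begin{cases} \frac{y^{3}}{6}, & 0\le y \le 1 \\ - \frac{y^{3}}{2} + 2 y^{2} - 2 y + \frac{2}{3}, & 1\le y \le 2 \\ \frac{y^{3}}{2} - 4 y^{2} + 10 y - \frac{22}{3}, & 2\le y \le 3\\ \frac{(4-y)^{3}}{6}, & 3\le y \le 4\\ 0, &\mbox{ otherwise} \end{cases} }[/math]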
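For larger [math]\displaystyle{ n }[/math] the piecewise formulas become unwieldy, so it is convenient to evaluate the densities in software. The sketch below is our own illustration, not part of the original text: it uses the known closed form for the density of a sum of [math]\displaystyle{ n }[/math] independent uniform random variables on [math]\displaystyle{ [0,1] }[/math] (the Irwin-Hall distribution) and compares it with the normal density with the same mean [math]\displaystyle{ n/2 }[/math] and variance [math]\displaystyle{ n/12 }[/math]. Function names and the evaluation grid are hypothetical choices.

```python
import math

def uniform_sum_density(y, n):
    """Irwin-Hall density of Y_1 + ... + Y_n for independent Y_i ~ Uniform[0, 1]:

    f_n(y) = 1/(n-1)! * sum_{k=0}^{floor(y)} (-1)^k C(n, k) (y - k)^(n-1), 0 <= y <= n.
    (For n = 1 the value at the single endpoint y = 1 comes out as 0.)
    """
    if y < 0.0 or y > n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (y - k) ** (n - 1)
                for k in range(int(math.floor(y)) + 1))
    return total / math.factorial(n - 1)

def normal_density(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def max_gap_to_normal(n, points=400):
    """Largest |f_n - normal density with mean n/2, variance n/12| on a grid over [0, n]."""
    mean, sd = n / 2.0, math.sqrt(n / 12.0)
    grid = [n * i / points for i in range(points + 1)]
    return max(abs(uniform_sum_density(y, n) - normal_density(y, mean, sd))
               for y in grid)

# Spot checks: the triangular density f_2 and the middle piece of f_3.
print(uniform_sum_density(0.5, 2), uniform_sum_density(1.5, 3))

for n in (2, 3, 4, 12):
    print(f"n={n:2d}  max gap to normal = {max_gap_to_normal(n):.4f}")
```

Already for moderate [math]\displaystyle{ n }[/math] the gap to the normal density should be small, which is what the graphs of the densities suggest visually.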








