Proof of Berry-Esseen theorem

Debanjan Bhattacharya, Swarnadeep Datta

Introduction

By the Central Limit Theorem, for a sequence of i.i.d. random variables with finite variance, the sample mean, after centering by its mean and scaling by its standard deviation, converges in distribution to a standard normal random variable.

For each sample size n, the Berry-Esseen Theorem gives an explicit upper bound on the maximum distance between the cdf of the standardized sample mean and the standard normal cdf.

Its significance lies in what happens as \(n\to\infty\). The Central Limit Theorem only says that this distance tends to 0. The Berry-Esseen Theorem identifies the rate of convergence by bounding the distance between the cdf of the standardized sample mean and the standard normal cdf uniformly over all points t.

Statement of the Theorem

Berry-Esseen Theorem

Suppose \(X_1, X_2, \ldots\) are i.i.d. random variables with \(\mathbb{E}[X_1] = \mu\), \(\mathbb{E}[(X_1-\mu)^2] = \sigma^2 > 0\) and \(\mathbb{E}[|X_1|^3] < \infty\).

Let \(\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i\).

Then,

\[\sup_{t\in\mathbb R}\left|\mathbb P\left(\frac{\sqrt n(\bar X_n - \mu)}{\sigma}\le t\right) - \Phi (t)\right| \le \frac{C}{\sqrt n}\frac{\mathbb{E}|X_1 - \mu|^3}{\sigma^3}\]

where \(\Phi\) denotes the standard normal cdf and \(C>0\) is a universal constant that depends neither on \(n\) nor on the distribution of \(X_1,X_2,\ldots\)
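To see the statement in action, the sketch below (an illustration added here, not part of the original proof) takes Bernoulli(p) variables. For these, the cdf of the standardized sample mean can be computed exactly from binomial probabilities, and the sketch compares the exact Kolmogorov distance with the right-hand side of the theorem. The function names are hypothetical, and C = 3 is an assumed value chosen only for illustration.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_distance_binomial(n: int, p: float) -> float:
    """Exact sup_t |F_n(t) - Phi(t)| for the standardized mean of n Bernoulli(p) variables.

    F_n is a step function that jumps at z_k = (k - n p) / sqrt(n p (1 - p)) and Phi is
    increasing, so the supremum is attained at a jump point, using either the left
    limit P(S <= k - 1) or the value P(S <= k), where S ~ Binomial(n, p).
    """
    sd = math.sqrt(n * p * (1.0 - p))
    cdf_left = 0.0  # P(S <= k - 1)
    dist = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        cdf_right = cdf_left + pmf  # P(S <= k)
        phi_k = std_normal_cdf((k - n * p) / sd)
        dist = max(dist, abs(cdf_left - phi_k), abs(cdf_right - phi_k))
        cdf_left = cdf_right
    return dist


def berry_esseen_bound(n: int, p: float, C: float) -> float:
    """C * E|X - mu|^3 / (sigma^3 * sqrt(n)) for X ~ Bernoulli(p)."""
    var = p * (1.0 - p)
    rho = var * (p**2 + (1.0 - p) ** 2)  # E|X - p|^3
    return C * rho / (var**1.5 * math.sqrt(n))


for n in (10, 50, 200):
    d = kolmogorov_distance_binomial(n, 0.3)
    b = berry_esseen_bound(n, 0.3, C=3.0)  # C = 3 is an illustrative choice
    print(f"n={n:4d}  distance={d:.4f}  bound={b:.4f}")
```

Only jump points are checked because \(F_n\) is constant between jumps while \(\Phi\) is increasing, so the supremum cannot occur strictly between two jumps.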

Proof of the Theorem

An Important Lemma

Suppose that \(F\) is a distribution function and \(G\) is a differentiable function such that \(G(-\infty)=0\), \(G(\infty)=1\) and \(\sup_{x\in\mathbb R}|G'(x)|\le m<\infty\).

Let \(\varphi(t)=\int_{-\infty}^{\infty}e^{itx}\,dF(x)\) and \(\gamma(t)=\int_{-\infty}^{\infty}e^{itx}\,dG(x)\).

Then, for every \(T>0\),

\[\sup_{x\in\mathbb R}|F(x)-G(x)|\le \frac{1}{\pi}\int_{-T}^{T}\left|\frac{\varphi(t)-\gamma(t)}{t}\right|dt+\frac{24m}{\pi T}.\]


For a proof, see Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley, New York.
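As a numerical sanity check (not a proof), the sketch below evaluates both sides of the lemma. It takes \(F\) to be the cdf of \((X_1+\cdots+X_n)/\sqrt{n}\) for Rademacher variables \(X_i=\pm1\), an assumed example for which \(\varphi(t)=\cos^n(t/\sqrt n)\). It takes \(G=\Phi\), whose characteristic function is \(e^{-t^2/2}\) and whose derivative is bounded by \(m=1/\sqrt{2\pi}\). The integral is approximated with a midpoint rule. The function names are hypothetical.

```python
import math

M = 1.0 / math.sqrt(2.0 * math.pi)  # sup of the standard normal density, i.e. m for G = Phi


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lhs_sup_distance(n: int) -> float:
    """sup_x |F(x) - Phi(x)| for F the cdf of a standardized sum of n Rademacher variables."""
    # The sum equals 2B - n with B ~ Binomial(n, 1/2), so F jumps at (2k - n) / sqrt(n).
    cdf_left, dist = 0.0, 0.0
    for k in range(n + 1):
        cdf_right = cdf_left + math.comb(n, k) / 2**n
        g = std_normal_cdf((2 * k - n) / math.sqrt(n))
        dist = max(dist, abs(cdf_left - g), abs(cdf_right - g))
        cdf_left = cdf_right
    return dist


def rhs_smoothing_bound(n: int, T: float, steps: int = 20_000) -> float:
    """(1/pi) * int_{-T}^{T} |phi(t) - gamma(t)| / |t| dt + 24 m / (pi T)."""
    h = T / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h  # midpoints, so t = 0 is never evaluated
        diff = math.cos(t / math.sqrt(n)) ** n - math.exp(-t * t / 2.0)
        total += abs(diff) / t
    integral = 2.0 * total * h  # the integrand is even in t
    return integral / math.pi + 24.0 * M / (math.pi * T)


n = 10
print(f"LHS = {lhs_sup_distance(n):.4f}")
for T in (2.0, 5.0, 10.0, 20.0):
    print(f"T = {T:5.1f}  RHS = {rhs_smoothing_bound(n, T):.4f}")
```

Increasing T shrinks the \(24m/(\pi T)\) term but lengthens the range of the integral, which is why the proof makes a specific choice of T.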

Some Important Inequalities

  1. For \(n = 1, 2, \ldots\) and complex numbers \(\alpha,\beta\) with \(|\alpha|\le\gamma\), \(|\beta|\le\gamma\),

    \(|\alpha^n-\beta^n|\le n\,|\alpha-\beta|\,\gamma^{n-1}.\)


  2. For \(n = 1, 2, \ldots\) and real \(t\),

    \(\left|e^{it}-1-\frac{it}{1!}-\cdots-\frac{(it)^{n-1}}{(n-1)!}\right|\le\frac{|t|^n}{n!}.\)


  3. For \(x>0\),

    \(0\le e^{-x}-1+x\le\frac{1}{2}x^2.\)
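The three inequalities are easy to spot-check numerically. The sketch below (an illustrative addition; the sampling ranges, seed and tolerances are arbitrary choices) tests each one at random points. It uses a small floating-point tolerance because some cases are nearly tight.

```python
import cmath
import math
import random


def ineq1_holds(alpha: complex, beta: complex, n: int) -> bool:
    """|alpha^n - beta^n| <= n |alpha - beta| gamma^(n-1) with gamma = max(|alpha|, |beta|)."""
    gamma = max(abs(alpha), abs(beta))
    rhs = n * abs(alpha - beta) * gamma ** (n - 1)
    return abs(alpha**n - beta**n) <= rhs * (1 + 1e-9) + 1e-12


def ineq2_holds(t: float, n: int) -> bool:
    """|e^{it} - sum_{k<n} (it)^k / k!| <= |t|^n / n! for real t."""
    partial = sum((1j * t) ** k / math.factorial(k) for k in range(n))
    return abs(cmath.exp(1j * t) - partial) <= abs(t) ** n / math.factorial(n) + 1e-12


def ineq3_holds(x: float) -> bool:
    """0 <= e^{-x} - 1 + x <= x^2 / 2 for x > 0."""
    value = math.exp(-x) - 1 + x
    return -1e-15 <= value <= 0.5 * x * x + 1e-12


rng = random.Random(0)  # fixed seed for reproducibility


def random_point_in_disk() -> complex:
    """A random complex number with modulus at most 1."""
    return cmath.rect(rng.random(), rng.uniform(-math.pi, math.pi))


ok = all(
    ineq1_holds(random_point_in_disk(), random_point_in_disk(), rng.randint(1, 10))
    and ineq2_holds(rng.uniform(-5, 5), rng.randint(1, 6))
    and ineq3_holds(rng.uniform(1e-6, 10))
    for _ in range(10_000)
)
print("all checks passed" if ok else "a check failed")
```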

Proof of Berry-Esseen Theorem

First, we consider a sequence of i.i.d. random variables \(\{X_n\}_{n\geq1}\) with \(\mathbb{E}[X_1]=0\), \(Var[X_1]=\mathbb{E}[X_1^2]=\sigma^2\) and \(\mathbb{E}|X_1|^3=\rho<\infty\). The general case is reduced to this one at the end of the proof.

Let \(F_n\) be the cdf and \(\varphi_n\) the characteristic function (cf) of \(\frac{\sqrt{n}\bar{X}_n}{\sigma}=\frac{X_1+X_2+\cdots+X_n}{\sigma\sqrt{n}}\), and let \(\varphi\) be the cf of \(X_1\).

Since the \(X_i\) are i.i.d.,

\[\varphi_n(t)=\mathbb{E}\left(e^{it\frac{X_1+X_2+\cdots+X_n}{\sigma\sqrt{n}}}\right)=\left[\mathbb{E}\left(e^{it\frac{X_1}{\sigma\sqrt{n}}}\right)\right]^n=\varphi^n\left(\frac{t}{\sigma\sqrt{n}}\right)\]

and \(F_n(t)=\mathbb P\left(\frac{\sqrt n\,\bar X_n}{\sigma}\le t\right)\).
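The identity \(\varphi_n(t)=\varphi^n\left(\frac{t}{\sigma\sqrt n}\right)\) can be checked directly for a concrete distribution. The sketch below assumes Rademacher variables (\(X_i=\pm1\) with probability 1/2 each, so \(\sigma=1\) and \(\varphi(s)=\cos s\)). This is an illustrative choice not made in the original. The sketch compares a direct computation of \(\varphi_n(t)\), based on the binomial law of the number of +1's, with the product formula.

```python
import cmath
import math


def cf_direct(t: float, n: int) -> complex:
    """E exp(i t (X_1 + ... + X_n) / sqrt(n)) for Rademacher X_i (illustrative choice),
    summing over the number k of +1's, so that the sum equals 2k - n."""
    return sum(
        math.comb(n, k) / 2**n * cmath.exp(1j * t * (2 * k - n) / math.sqrt(n))
        for k in range(n + 1)
    )


def cf_product(t: float, n: int) -> float:
    """phi^n(t / (sigma sqrt(n))) with phi(s) = cos(s) and sigma = 1."""
    return math.cos(t / math.sqrt(n)) ** n


for n in (1, 5, 20):
    for t in (-3.0, 0.5, 2.0):
        print(n, t, abs(cf_direct(t, n) - cf_product(t, n)))
```

For large n the product also approaches \(e^{-t^2/2}\), the cf of the standard normal. For example, `cf_product(1.0, 1000)` is within \(10^{-4}\) of \(e^{-1/2}\).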


Hence it suffices to show that

\[\sup_{t\in\mathbb R}|F_n(t) - \Phi (t)| \le \frac{C}{\sqrt n}\frac{\mathbb{E}|X_1|^3}{\sigma^3}.\]

We now apply the lemma with \(F=F_n\) and \(G=\Phi\), and choose

\[T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\le\frac{4}{3}\sqrt{n},\]

where the inequality holds because \(\sigma^3\le\rho\) by Lyapunov's inequality \((\mathbb{E}X_1^2)^{1/2}\le(\mathbb{E}|X_1|^3)^{1/3}\).

Since the cf of \(\Phi\) is \(e^{-\frac{1}{2}u^2}\), the lemma gives

\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\int_{-T}^{T}\left|\frac{\varphi_n(u)-e^{-\frac{1}{2}u^2}}{u}\right|du+\frac{24m}{T}=\int_{-T}^{T}\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|du+\frac{24m}{T}.\qquad(4)\]


Next, we apply inequality (1) to the integrand on the right-hand side of (4), with \(\alpha=\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\) and \(\beta=e^{-\frac{1}{2n}u^2}\), so that \(\beta^n=e^{-\frac{1}{2}u^2}\). First we need a \(\gamma\) such that \(|\alpha|\le\gamma\) and \(|\beta|\le\gamma\).

Write \(F\) here for the distribution function of \(X_1\). Using \(\mathbb{E}[X_1]=0\) and \(\mathbb{E}[X_1^2]=\sigma^2\),

\[
\begin{aligned}
|\varphi(t)-1+\tfrac{1}{2}\sigma^2t^2| &= \left|\int_{-\infty}^{\infty}\left(e^{itx}-1-itx+\tfrac{1}{2}t^2x^2\right)dF(x)\right| \\
&\le \int_{-\infty}^{\infty}\left|e^{itx}-1-itx+\tfrac{1}{2}t^2x^2\right|dF(x) \\
&\le \int_{-\infty}^{\infty}\frac{|tx|^3}{3!}\,dF(x) \qquad \text{[inequality (2) with } n=3 \text{, applied at } tx\text{]} \\
&= \frac{|t|^3}{6}\int_{-\infty}^{\infty}|x|^3\,dF(x)=\frac{|t|^3}{6}\rho.
\end{aligned}
\]

Thus

\[|\varphi(t)-1+\tfrac{1}{2}\sigma^2t^2|\le\frac{|t|^3}{6}\rho.\qquad(5)\]
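Inequality (5) can be spot-checked for a specific distribution. The sketch below assumes, purely for illustration, a centered Bernoulli variable \(X_1=B-p\) with \(B\sim\text{Bernoulli}(p)\). For this variable, \(\varphi(t)=(1-p)e^{-itp}+pe^{it(1-p)}\), \(\sigma^2=p(1-p)\) and \(\rho=p(1-p)\left(p^2+(1-p)^2\right)\). The helper names are hypothetical.

```python
import cmath


def centered_bernoulli(p: float):
    """Return (phi, sigma2, rho) for X = B - p with B ~ Bernoulli(p) (illustrative choice)."""
    def phi(t: float) -> complex:
        return (1 - p) * cmath.exp(-1j * t * p) + p * cmath.exp(1j * t * (1 - p))
    sigma2 = p * (1 - p)
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)  # E|X|^3
    return phi, sigma2, rho


def max_violation_ineq5(p: float, t_max: float = 10.0, steps: int = 4001) -> float:
    """Largest value of LHS - RHS of (5) over a grid of t in [-t_max, t_max]."""
    phi, sigma2, rho = centered_bernoulli(p)
    worst = float("-inf")
    for j in range(steps):
        t = -t_max + 2 * t_max * j / (steps - 1)
        lhs = abs(phi(t) - 1 + 0.5 * sigma2 * t * t)
        rhs = abs(t) ** 3 * rho / 6
        worst = max(worst, lhs - rhs)
    return worst


for p in (0.05, 0.3, 0.5):
    print(p, max_violation_ineq5(p))
```

A result that is at most 0, up to rounding, means (5) held at every grid point. The grid includes \(t=0\), where both sides vanish.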


In the integrand of (4) we have \(|u|\le T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\le\frac{4}{3}\sqrt{n}\), so

\[\frac{1}{2}\left(\frac{u}{\sigma\sqrt{n}}\right)^2\sigma^2=\frac{u^2}{2n}\le\frac{1}{2}\cdot\frac{16n}{9n}=\frac{8}{9}<1.\]

Hence, if \(t=\frac{u}{\sigma\sqrt{n}}\) denotes the argument of \(\varphi\) in that integrand, then \(\frac{1}{2}\sigma^2t^2<1\).

For such \(t\), the quantity \(1-\frac{1}{2}\sigma^2t^2\) is positive. So, by the triangle inequality and (5),

\[|\varphi(t)|-\left(1-\tfrac{1}{2}\sigma^2t^2\right)\le|\varphi(t)-1+\tfrac{1}{2}\sigma^2t^2|\le\frac{|t|^3}{6}\rho\]

\[\implies|\varphi(t)|\le1-\tfrac{1}{2}\sigma^2t^2+\frac{|t|^3}{6}\rho.\qquad(6)\]


Applying (6) at \(t=\frac{u}{\sigma\sqrt{n}}\) gives

\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le1-\frac{u^2}{2n}+\frac{\rho}{6}\frac{|u|^3}{\sigma^3n^{3/2}}.\]

Using \(|u|\le T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\) in the last term gives \(\frac{\rho}{6}\frac{|u|^3}{\sigma^3n^{3/2}}\le\frac{\rho}{6}\cdot\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\cdot\frac{u^2}{\sigma^3n^{3/2}}=\frac{2}{9}\frac{u^2}{n}\). Since \(\frac{1}{2}-\frac{2}{9}=\frac{5}{18}\) and \(1-x\le e^{-x}\),

\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le1-\frac{5}{18}\frac{u^2}{n}\le e^{-\frac{5}{18}\frac{u^2}{n}}.\qquad(7)\]
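For the same illustrative centered Bernoulli example (an assumption, not part of the original), the sketch below checks both inequalities in (7) on a grid of \(u\) in \([-T,T]\) with \(T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\). The function name and grid size are hypothetical choices.

```python
import cmath
import math


def check_ineq7(p: float, n: int, steps: int = 2001) -> bool:
    """Check |phi(u/(sigma sqrt n))| <= 1 - 5u^2/(18n) <= exp(-5u^2/(18n)) for |u| <= T,
    where X = B - p with B ~ Bernoulli(p) (an illustrative choice)."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)  # E|X|^3
    T = (4 / 3) * sigma**3 / rho * math.sqrt(n)

    def phi(t: float) -> complex:
        return (1 - p) * cmath.exp(-1j * t * p) + p * cmath.exp(1j * t * (1 - p))

    for j in range(steps):
        u = -T + 2 * T * j / (steps - 1)
        middle = 1 - 5 * u * u / (18 * n)
        if abs(phi(u / (sigma * math.sqrt(n)))) > middle + 1e-12:
            return False
        if middle > math.exp(-5 * u * u / (18 * n)) + 1e-12:
            return False
    return True


for p in (0.05, 0.3, 0.5):
    for n in (1, 10, 100):
        print(p, n, check_ineq7(p, n))
```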


Now suppose \(\sqrt{n}\le3\), i.e. \(n\le9\). Both \(F_n\) and \(\Phi\) take values in \([0,1]\), and \(\frac{\rho}{\sigma^3}\ge1\) by Lyapunov's inequality, so

\[\sup_{t\in\mathbb R}|F_n(t) - \Phi (t)| \le1\le\frac{3}{\sqrt n}\frac{\mathbb{E}|X_1|^3}{\sigma^3}.\]

So the inequality of the theorem holds for \(n\le9\) with \(C=3\). Consequently, if we can show that it holds for all \(n\geq10\) with some constant \(C=k\), then \(C=\max\{3,\,k\}\) works for every n.

From now on, we assume \(n\ge10\).

Raising (7) to the power \(n-1\) and using \(1-\frac{1}{n}\ge\frac{9}{10}\) for \(n\ge10\),

\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|^{n-1}\le e^{-\frac{5}{18}u^2\left(1-\frac{1}{n}\right)}\le e^{-\frac{5}{18}\cdot\frac{9}{10}u^2}=e^{-\frac{1}{4}u^2}\]

\[\implies\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le e^{-\frac{1}{4(n-1)}u^2}.\qquad(8)\]


Moreover, for \(n\geq2\) we have \(\frac{1}{2n}\ge\frac{1}{4(n-1)}\), so \(e^{-\frac{1}{2n}u^2}\le e^{-\frac{1}{4(n-1)}u^2}\).

Hence, for \(\alpha=\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\) and \(\beta=e^{-\frac{1}{2n}u^2}\), we can take \(\gamma=e^{-\frac{1}{4(n-1)}u^2}\), for which \(\gamma^{n-1}=e^{-\frac{1}{4}u^2}\).

Therefore, applying inequality (1),

\[\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|\le n\frac{\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right|}{|u|}\,e^{-\frac{1}{4}u^2}.\qquad(9)\]

Next, we bound \(n\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right|\).


\[
\begin{aligned}
n\left|\varphi\left(\tfrac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right| &\le n\left|\varphi\left(\tfrac{u}{\sigma\sqrt{n}}\right)-1+\frac{u^2}{2n}\right|+n\left|1-\frac{u^2}{2n}-e^{-\frac{1}{2}\frac{u^2}{n}}\right| \\
&\le n\,\frac{\rho}{6\sigma^3n\sqrt{n}}|u|^3+\frac{n}{2}\left(\frac{u^2}{2n}\right)^2 \qquad \text{[using (5) and inequality (3)]} \\
&= \frac{\rho}{6\sigma^3\sqrt{n}}|u|^3+\frac{1}{8n}u^4. \qquad(10)
\end{aligned}
\]

From (9) and (10),

\[\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|\le\left(\frac{\rho}{6\sigma^3\sqrt{n}}u^2+\frac{1}{8n}|u|^3\right)e^{-\frac{1}{4}u^2}.\qquad(11)\]
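Bound (11) can likewise be spot-checked numerically for \(n\ge10\). The sketch again assumes the illustrative centered Bernoulli example and compares the two sides of (11) on a grid of nonzero \(u\) in \([-T,T]\). The function name and grid size are arbitrary choices.

```python
import cmath
import math


def check_ineq11(p: float, n: int, steps: int = 2000) -> bool:
    """Check |phi^n(u/(sigma sqrt n)) - e^{-u^2/2}| / |u|
    <= (rho u^2 / (6 sigma^3 sqrt n) + |u|^3 / (8 n)) e^{-u^2/4} for 0 < |u| <= T,
    where X = B - p with B ~ Bernoulli(p) (an illustrative choice)."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)  # E|X|^3
    T = (4 / 3) * sigma**3 / rho * math.sqrt(n)

    def phi(t: float) -> complex:
        return (1 - p) * cmath.exp(-1j * t * p) + p * cmath.exp(1j * t * (1 - p))

    for j in range(steps):
        u = -T + 2 * T * (j + 0.5) / steps  # cell midpoints; with even steps, u = 0 is skipped
        lhs = abs(phi(u / (sigma * math.sqrt(n))) ** n - math.exp(-u * u / 2)) / abs(u)
        rhs = (rho * u * u / (6 * sigma**3 * math.sqrt(n)) + abs(u) ** 3 / (8 * n)) * math.exp(-u * u / 4)
        if lhs > rhs + 1e-12:
            return False
    return True


for p in (0.05, 0.3, 0.5):
    for n in (10, 100, 1000):
        print(p, n, check_ineq11(p, n))
```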


Using (4) and (11), we have

\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\int_{-T}^{T}\left(\frac{\rho}{6\sigma^3\sqrt{n}}u^2+\frac{1}{8n}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T}.\]

From \(T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\) we get \(\frac{\rho}{6\sigma^3\sqrt{n}}=\frac{2}{9T}\) for the first term of the integrand. For the second term, \(T\le\frac{4}{3}\sqrt{n}\) gives \(\frac{1}{18T}\ge\frac{1}{24\sqrt{n}}\ge\frac{1}{8n}\), where the last step uses \(\sqrt{n}\ge3\). Therefore

\[
\begin{aligned}
\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)| &\le \frac{1}{T}\int_{-T}^{T}\left(\frac{2}{9}u^2+\frac{1}{18}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T} \\
&\le \frac{1}{T}\int_{-\infty}^{\infty}\left(\frac{2}{9}u^2+\frac{1}{18}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T} \\
&= \frac{1}{T}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+24m\right),
\end{aligned}
\]

since \(\int_{-\infty}^{\infty}u^2e^{-\frac{1}{4}u^2}\,du=4\sqrt{\pi}\) and \(\int_{-\infty}^{\infty}|u|^3e^{-\frac{1}{4}u^2}\,du=16\).
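The two Gaussian-type integrals in this step have closed forms, \(\int_{-\infty}^{\infty}u^2e^{-\frac{1}{4}u^2}\,du=4\sqrt{\pi}\) and \(\int_{-\infty}^{\infty}|u|^3e^{-\frac{1}{4}u^2}\,du=16\). Hence \(\int_{-\infty}^{\infty}\left(\frac{2}{9}u^2+\frac{1}{18}|u|^3\right)e^{-\frac{1}{4}u^2}\,du=\frac{8}{9}\sqrt{\pi}+\frac{8}{9}\). The sketch below confirms these values with a simple midpoint rule. The truncation at \(|u|=40\) and the step count are arbitrary choices.

```python
import math


def midpoint_integral(f, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) for j in range(steps))


# Both integrands are even, so integrate over [0, 40] and double; the tail beyond 40 is negligible.
second = 2 * midpoint_integral(lambda u: u**2 * math.exp(-u * u / 4), 0.0, 40.0)
third = 2 * midpoint_integral(lambda u: u**3 * math.exp(-u * u / 4), 0.0, 40.0)
total = (2 / 9) * second + (1 / 18) * third

print(second, 4 * math.sqrt(math.pi))
print(third, 16.0)
print(total, 8 * math.sqrt(math.pi) / 9 + 8 / 9)
```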


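In the lemma, \(m\) is an upper bound for the first derivative of \(G=\Phi\), i.e. for the standard normal density \(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\), whose maximum value \(\frac{1}{\sqrt{2\pi}}\) is attained at \(x=0\). Taking \(m=\frac{1}{\sqrt{2\pi}}\), we get

\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\frac{1}{T}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+\frac{24}{\sqrt{2\pi}}\right)\approx\frac{12.04}{T}.\]

Substituting \(\frac{1}{T}=\frac{3}{4}\frac{\rho}{\sigma^3\sqrt{n}}\), we have

\[\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\frac{3}{4\pi}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+\frac{24}{\sqrt{2\pi}}\right)\frac{\rho}{\sigma^3\sqrt{n}}\approx\frac{2.87}{\sqrt{n}}\frac{\rho}{\sigma^3}\quad\text{for } n\ge10.\]

Combined with the case \(n\le9\), the theorem holds for every n with \(C=\max\{3,\,2.87\}=3\).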
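The numerical value of the constant can be reproduced from the closed form \(\sup_t|F_n(t)-\Phi(t)|\le\frac{3}{4\pi}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+\frac{24}{\sqrt{2\pi}}\right)\frac{\rho}{\sigma^3\sqrt{n}}\) for \(n\ge10\). The factor \(\frac{3}{4}\) comes from \(\frac{1}{T}=\frac{3}{4}\frac{\rho}{\sigma^3\sqrt{n}}\). The short sketch below does this arithmetic and then combines it with the constant 3 from the case \(n\le9\). The variable names are illustrative.

```python
import math

m = 1 / math.sqrt(2 * math.pi)  # maximum of the standard normal density, used as m in the lemma
bracket = 8 * math.sqrt(math.pi) / 9 + 8 / 9 + 24 * m  # pi * T * sup|F_n - Phi| <= bracket
C_large_n = 3 * bracket / (4 * math.pi)  # constant for n >= 10, via 1/T = (3/4) rho / (sigma^3 sqrt n)
C = max(3.0, C_large_n)  # C = 3 already covers n <= 9

print(f"bracket = {bracket:.4f}")
print(f"C for n >= 10: {C_large_n:.4f}")
print(f"C for all n:   {C:.4f}")
```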


Finally, for a general sequence \(\{X_n\}_{n\geq1}\) with \(\mathbb{E}[X_1]=\mu\), let \(Y_n=X_n-\mu\) for all n. The \(Y_n\) are i.i.d. with \(\mathbb{E}[Y_1]=0\), \(\mathbb{E}[Y_1^2]=\sigma^2\) and \(\mathbb{E}|Y_1|^3=\mathbb{E}|X_1-\mu|^3<\infty\), and \(\frac{\sqrt n\,\bar Y_n}{\sigma}=\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\). Applying the result above to \(\{Y_n\}\) gives

\[\sup_{t\in\mathbb R}\left|\mathbb P\left(\frac{\sqrt n(\bar X_n - \mu)}{\sigma}\le t\right) - \Phi (t)\right| \le \frac{C}{\sqrt n}\frac{\mathbb{E}|X_1 - \mu|^3}{\sigma^3}.\]

This completes the proof.

Note

With the argument above, the universal constant comes out as \(C\approx2.87\) for \(n\ge10\), and hence \(C=3\) for all n. Other proofs of the theorem give smaller values of C: unpublished calculations of Esseen (1956) yielded \(C\le2.9\), and those of D. L. Wallace (1958) yielded \(C\le2.05\).

In practice, the exact value of C matters less than the rate. The distance between the cdfs is \(O\left(\frac{1}{\sqrt{n}}\right)\), i.e. it decays at least as fast as a constant multiple of \(\frac{1}{\sqrt{n}}\).

Reference

Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley, New York.

Thank You!