According to the Central Limit Theorem, for a sequence of i.i.d. random variables with finite variance, the sample mean, after centering by its mean and scaling by its standard deviation, converges in distribution to a standard normal random variable.
The Berry-Esseen Theorem provides an upper bound on the maximum distance between the cdf of the standardized sample mean and the standard normal cdf for each sample size \(n\).
Its significance lies in the behaviour as \(n\to\infty\): because the bound holds uniformly over all points of the real line, it identifies the rate at which the cdf of the standardized sample mean converges to the standard normal cdf.
Suppose \(X_1, X_2, \ldots\) are i.i.d. random variables with \(\mathbb{E}[X_1] = \mu\), \(\mathbb{E}[(X_1-\mu)^2] = \sigma^2 > 0\) and \(\mathbb{E}[|X_1|^3] < \infty\).
Let \(\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i\).
Then,
\[\sup_{t\in\mathbb R}\left|\mathbb P\left(\frac{\sqrt n(\bar X_n - \mu)}{\sigma}\le t\right) - \Phi (t)\right| \le \frac{C}{\sqrt n}\,\frac{\mathbb{E}|X_1 - \mu|^3}{\sigma^3},\]
where \(\Phi\) is the standard normal cdf and \(C > 0\) is a universal constant that does not depend on the distribution of \(X_1, X_2, \ldots\)
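To make the statement concrete, the following is a minimal sketch that computes the left-hand side exactly for Bernoulli(\(p\)) data and compares it with the right-hand side. The choice of Bernoulli data, the function names, and the value \(C = 3\) (the constant stated in Feller, 1971) are assumptions made for illustration only.

```python
import math


def normal_cdf(t):
    """Standard normal cdf Phi(t), computed via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def sup_distance_bernoulli(n, p):
    """Exact sup_t |P(sqrt(n)(Xbar - p)/sigma <= t) - Phi(t)| for Bernoulli(p) data.

    The cdf of the standardized mean is a step function, so the supremum is
    attained at a jump point, either just before the jump or at it.
    """
    sigma = math.sqrt(p * (1.0 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        t = (k - n * p) / (sigma * math.sqrt(n))  # standardized value when the sum equals k
        phi_t = normal_cdf(t)
        worst = max(worst, abs(cdf - phi_t))      # left limit at the jump
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        worst = max(worst, abs(cdf - phi_t))      # value at the jump
    return worst


def berry_esseen_bound(n, p, C=3.0):
    """Right-hand side C * E|X - mu|^3 / (sigma^3 sqrt(n)) for Bernoulli(p); C = 3 is an assumption."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) ** 3 + (1.0 - p) * p**3   # E|X - p|^3
    return C * rho / (sigma**3 * math.sqrt(n))


for n in (25, 100, 400):
    d = sup_distance_bernoulli(n, 0.3)
    # columns: n, exact distance, bound, distance * sqrt(n)
    print(n, round(d, 4), round(berry_esseen_bound(n, 0.3), 4), round(d * math.sqrt(n), 4))
```

The last printed column rescales the distance by \(\sqrt n\); if the \(1/\sqrt n\) rate is the right one for this example, that column should stay bounded as \(n\) grows.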
Suppose that \(F\) is a distribution function and \(G\) is a function such that
\(G(-\infty)=0\), \(G(\infty)=1\), and \(\sup_{x\in\mathbb R}|G'(x)|\le m<\infty\).
Let \(\varphi(t)=\int_{-\infty}^{\infty}e^{itx}\,dF(x)\) and \(\gamma(t)=\int_{-\infty}^{\infty}e^{itx}\,dG(x)\).
Then, for every \(T>0\),
\[\sup_{x\in\mathbb R}|F(x)-G(x)|\le \frac{1}{\pi}\int_{-T}^{T}\left|\frac{\varphi(t)-\gamma(t)}{t}\right|dt+\frac{24m}{\pi T}.\]
For a proof, see Feller, W. (1971), An Introduction to Probability Theory and Its Applications, Vol. 2, Wiley, New York.
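As an illustration of how the lemma is used, here is a rough numerical sketch that evaluates both sides for one specific pair: \(F\) the uniform distribution on \([-\sqrt3,\sqrt3]\) (mean 0, variance 1) and \(G=\Phi\), so that \(m=\frac{1}{\sqrt{2\pi}}\). The pair, the midpoint-rule integration, and all function names are assumptions chosen for illustration.

```python
import math

SQRT3 = math.sqrt(3.0)


def uniform_cdf(x):
    """cdf of the uniform distribution on [-sqrt(3), sqrt(3)] (mean 0, variance 1)."""
    return min(1.0, max(0.0, (x + SQRT3) / (2.0 * SQRT3)))


def normal_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def uniform_cf(t):
    """Characteristic function sin(sqrt(3) t) / (sqrt(3) t) of that uniform law, for t != 0."""
    return math.sin(SQRT3 * t) / (SQRT3 * t)


def normal_cf(t):
    """Characteristic function exp(-t^2 / 2) of the standard normal law."""
    return math.exp(-0.5 * t * t)


def lemma_rhs(T, steps=20000):
    """Midpoint-rule value of (1/pi) * int_{-T}^{T} |(phi - gamma)/t| dt + 24 m / (pi T)."""
    m = 1.0 / math.sqrt(2.0 * math.pi)        # supremum of the standard normal density
    h = 2.0 * T / steps
    total = 0.0
    for j in range(steps):
        t = -T + (j + 0.5) * h                # with an even number of steps, no midpoint equals 0
        total += abs((uniform_cf(t) - normal_cf(t)) / t) * h
    return total / math.pi + 24.0 * m / (math.pi * T)


def lemma_lhs(grid=20001, half_width=4.0):
    """Grid approximation of sup_x |F(x) - G(x)|."""
    xs = (-half_width + 2.0 * half_width * j / (grid - 1) for j in range(grid))
    return max(abs(uniform_cdf(x) - normal_cdf(x)) for x in xs)


for T in (2.0, 5.0, 20.0):
    print(T, lemma_lhs(), lemma_rhs(T))
```

Increasing \(T\) shrinks the term \(\frac{24m}{\pi T}\) but enlarges the range of integration, which is the trade-off the proof balances through its choice of \(T\).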
First, consider a sequence of i.i.d. random variables \(\{X_n\}_{n\geq1}\) with \(\mathbb{E}[X_1]=0\), \(\mathrm{Var}[X_1]=\mathbb{E}[X_1^2]=\sigma^2\) and \(\mathbb{E}|X_1|^3=\rho<\infty\).
Let \(F_n\) be the cdf of \(\frac{\sqrt{n}\,\bar X_n}{\sigma}=\frac{X_1+X_2+\cdots+X_n}{\sigma\sqrt{n}}\),
\(\varphi_n\) be the characteristic function (cf) of \(\frac{\sqrt{n}\,\bar X_n}{\sigma}\),
and \(\varphi\) be the cf of \(X_1\).
Since the \(X_i\) are independent and identically distributed,
\[\varphi_n(t)=\mathbb{E}\left(e^{it\frac{X_1+X_2+\cdots+X_n}{\sigma\sqrt{n}}}\right)=\left[\mathbb{E}\left(e^{it\frac{X_1}{\sigma\sqrt{n}}}\right)\right]^n=\varphi^n\left(\frac{t}{\sigma\sqrt{n}}\right),\]
and \(F_n(t)=\mathbb P\left(\frac{\sqrt n\,\bar X_n}{\sigma}\le t\right)\).
Hence, we now have to show:
\[\sup_{t\in\mathbb R}|F_n(t) - \Phi (t)| \le \frac{C}{\sqrt n}\,\frac{\mathbb{E}|X_1|^3}{\sigma^3}.\]
From the lemma, we take \(F=F_n\) and \(G=\Phi\),
and for \(T\) we choose
\[T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\le\frac{4}{3}\sqrt{n},\]
where the inequality holds because \(\sigma^3\le\rho\).
Since the cf of \(\Phi\) is \(e^{-\frac{1}{2}u^2}\), the lemma gives
\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\int_{-T}^{T}\left|\frac{\varphi_n(u)-e^{-\frac{1}{2}u^2}}{u}\right|du+\frac{24m}{T}=\int_{-T}^{T}\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|du+\frac{24m}{T}. \tag{4}\]
Now, we are going to apply inequality (1) to the integrand on the RHS of (4).
Taking \(\alpha=\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\) and \(\beta=e^{-\frac{1}{2n}u^2}\), so that \(\alpha^n\) and \(\beta^n\) are the two terms in the numerator of that integrand, we first find a \(\gamma\) such that \(|\alpha|\le\gamma\) and \(|\beta|\le\gamma\).
Let \(F\) now denote the cdf of \(X_1\). Since \(\int_{-\infty}^{\infty}x\,dF(x)=0\) and \(\int_{-\infty}^{\infty}x^2\,dF(x)=\sigma^2\),
\[\begin{aligned}
\left|\varphi(t)-1+\tfrac{1}{2}\sigma^2t^2\right| &=\left|\int_{-\infty}^{\infty}\left(e^{itx}-1-itx+\tfrac{1}{2}t^2x^2\right)dF(x)\right| \\
&\le\int_{-\infty}^{\infty}\left|e^{itx}-1-itx+\tfrac{1}{2}t^2x^2\right|dF(x) \\
&\le\int_{-\infty}^{\infty}\frac{|tx|^3}{3!}\,dF(x) \qquad \text{[using inequality (2)]} \\
&=\frac{|t|^3}{6}\int_{-\infty}^{\infty}|x|^3\,dF(x)=\frac{|t|^3}{6}\rho.
\end{aligned}\]
Thus we get
\[\left|\varphi(t)-1+\tfrac{1}{2}\sigma^2t^2\right|\le\frac{|t|^3}{6}\rho. \tag{5}\]
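Inequality (5) can be checked numerically for a concrete distribution. The sketch below assumes \(X_1 = B - p\) with \(B\sim\) Bernoulli(\(p\)), so that \(\mathbb{E}[X_1]=0\); this choice and the helper names are illustrative assumptions, not part of the proof.

```python
import cmath
import math


def centered_bernoulli(p):
    """Return (cf, sigma, rho) for X = B - p with B ~ Bernoulli(p), so E[X] = 0."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) ** 3 + (1.0 - p) * p**3          # E|X|^3

    def cf(t):
        # X takes the value 1 - p with probability p and -p with probability 1 - p
        return p * cmath.exp(1j * t * (1.0 - p)) + (1.0 - p) * cmath.exp(-1j * t * p)

    return cf, sigma, rho


def slack_in_5(p, t):
    """Right side minus left side of (5); non-negative (up to rounding) whenever (5) holds."""
    cf, sigma, rho = centered_bernoulli(p)
    lhs = abs(cf(t) - 1.0 + 0.5 * sigma**2 * t**2)
    rhs = abs(t) ** 3 * rho / 6.0
    return rhs - lhs


grid = [k / 50.0 for k in range(-500, 501)]               # t in [-10, 10]
# True if (5) holds at every grid point, allowing a tiny rounding tolerance
print(all(slack_in_5(0.3, t) >= -1e-12 for t in grid))
```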
Now, in the integrand of (4), we have \(|u|\le T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\le\frac{4}{3}\sqrt{n}\), so
\[\frac{1}{2}\left|\frac{u}{\sigma\sqrt{n}}\right|^2\sigma^2\le\frac{1}{2}\cdot\frac{16n}{9n}=\frac{8}{9}<1.\]
Hence, if \(t\) denotes the argument of \(\varphi\) in that integrand, then \(\frac{1}{2}t^2\sigma^2<1\).
For such \(t\) we have \(1-\frac{1}{2}\sigma^2t^2>0\), so the triangle inequality applied to (5) gives
\[|\varphi(t)|-\left(1-\tfrac{1}{2}\sigma^2t^2\right)\le\frac{|t|^3}{6}\rho,\]
that is,
\[|\varphi(t)|\le 1-\tfrac{1}{2}\sigma^2t^2+\frac{|t|^3}{6}\rho. \tag{6}\]
Thus, for \(t=\frac{u}{\sigma\sqrt{n}}\), (6) gives the bound
\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le1-\frac{1}{2}\sigma^2\frac{u^2}{\sigma^2n}+\frac{1}{6}\rho\frac{|u|^3}{\sigma^3n^{\frac{3}{2}}}.\]
Using \(|u|\le T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\) in the last term on the RHS, that term is at most \(\frac{1}{6}\cdot\frac{4}{3}\cdot\frac{u^2}{n}=\frac{2}{9}\frac{u^2}{n}\). Since \(1-x\le e^{-x}\), we have
\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le1-\frac{5}{18}\frac{u^2}{n}\le e^{-\frac{5}{18}\frac{u^2}{n}}. \tag{7}\]
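The two inequalities in (7) can be checked on a grid of \(|u|\le T\). The sketch below again assumes the centered Bernoulli example \(X_1 = B - p\); the function name `check_7` and the grid size are hypothetical choices for illustration.

```python
import cmath
import math


def check_7(p, n, points=2001):
    """Check (7) on a grid of |u| <= T for X = B - p, B ~ Bernoulli(p).

    Returns two numbers: the largest value of |phi(u/(sigma sqrt(n)))| - (1 - 5u^2/(18n))
    and the largest value of (1 - 5u^2/(18n)) - exp(-5u^2/(18n)).
    Neither should be positive, up to rounding.
    """
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) ** 3 + (1.0 - p) * p**3               # E|X|^3
    T = (4.0 / 3.0) * sigma**3 / rho * math.sqrt(n)           # the T chosen in the proof

    def cf(t):
        return p * cmath.exp(1j * t * (1.0 - p)) + (1.0 - p) * cmath.exp(-1j * t * p)

    worst_first = -math.inf
    worst_second = -math.inf
    for j in range(points):
        u = -T + 2.0 * T * j / (points - 1)
        modulus = abs(cf(u / (sigma * math.sqrt(n))))
        middle = 1.0 - 5.0 * u * u / (18.0 * n)
        worst_first = max(worst_first, modulus - middle)
        worst_second = max(worst_second, middle - math.exp(-5.0 * u * u / (18.0 * n)))
    return worst_first, worst_second


print(check_7(0.3, 10))
```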
Now for \(\sqrt{n}\le3\), we have \(\sup_{t\in\mathbb R}|F_n(t) - \Phi (t)| \le1\le\frac{C}{\sqrt n}\frac{\mathbb{E}|X_1|^3}{\sigma^3}\), taking \(C=3\) and using the moment inequality \(\frac{\rho}{\sigma^3}\geq1\).
Hence, the inequality of the theorem holds for \(n\le9\) with \(C=3\). Thus, if we can show that the inequality holds for \(n\geq10\) with some constant \(C=k\), then taking \(C=\max\{3,k\}\) gives the theorem with a common \(C\) for all \(n\).
So, from now on we consider \(n\ge10\).
From (7),
\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|^{n-1}\le e^{-\frac{5}{18}u^2\left(1-\frac{1}{n}\right)}\le e^{-\frac{5}{18}u^2\cdot\frac{9}{10}}=e^{-\frac{1}{4}u^2},\]
and taking \((n-1)\)-th roots,
\[\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\right|\le e^{-\frac{1}{4(n-1)}u^2}. \tag{8}\]
Also, for \(n\geq2\), \(e^{-\frac{1}{2n}u^2}\le e^{-\frac{1}{4(n-1)}u^2}\), because \(\frac{1}{2n}\ge\frac{1}{4(n-1)}\).
Hence, for \(\alpha=\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)\) and \(\beta=e^{-\frac{1}{2n}u^2}\),
we can take \(\gamma=e^{-\frac{1}{4(n-1)}u^2}\), so that \(\gamma^{n-1}=e^{-\frac{1}{4}u^2}\).
Applying inequality (1), we have
\[\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|\le n\,\frac{\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right|}{|u|}\,e^{-\frac{1}{4}u^2}. \tag{9}\]
Now, we bound \(n\left|\varphi\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right|\). By the triangle inequality,
\[\begin{aligned}
n\left|\varphi\left(\tfrac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}\frac{u^2}{n}}\right| &\le n\left|\varphi\left(\tfrac{u}{\sigma\sqrt{n}}\right)-1+\frac{u^2}{2n}\right|+n\left|1-\frac{u^2}{2n}-e^{-\frac{1}{2}\frac{u^2}{n}}\right| \\
&\le n\,\frac{\rho}{6\sigma^3n\sqrt{n}}|u|^3+\frac{n}{2}\left(\frac{u^2}{2n}\right)^2 \qquad \text{[using inequalities (3) and (5)]} \\
&=\frac{\rho}{6\sigma^3\sqrt{n}}|u|^3+\frac{1}{8n}u^4.
\end{aligned} \tag{10}\]
From (9) and (10),
\[\left|\frac{\varphi^n\left(\frac{u}{\sigma\sqrt{n}}\right)-e^{-\frac{1}{2}u^2}}{u}\right|\le\left(\frac{\rho}{6\sigma^3\sqrt{n}}u^2+\frac{1}{8n}|u|^3\right)e^{-\frac{1}{4}u^2}. \tag{11}\]
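As a sanity check on (11), the sketch below compares its two sides on a grid of \(0<|u|\le T\) for \(n\ge10\). It assumes the centered Bernoulli example \(X_1 = B - p\); the name `gap_in_11` and the grid size are hypothetical.

```python
import cmath
import math


def gap_in_11(p, n, points=2001):
    """Largest (left side minus right side) of (11) over a grid of 0 < |u| <= T.

    Uses X = B - p with B ~ Bernoulli(p); the proof requires n >= 10.
    A non-positive return value (up to rounding) means (11) held on the grid.
    """
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) ** 3 + (1.0 - p) * p**3               # E|X|^3
    T = (4.0 / 3.0) * sigma**3 / rho * math.sqrt(n)           # the T chosen in the proof

    def cf(t):
        return p * cmath.exp(1j * t * (1.0 - p)) + (1.0 - p) * cmath.exp(-1j * t * p)

    worst = -math.inf
    for j in range(1, points + 1):
        u = T * j / points                                    # u in (0, T]
        rhs = (rho * u * u / (6.0 * sigma**3 * math.sqrt(n)) + u**3 / (8.0 * n)) * math.exp(-0.25 * u * u)
        for signed_u in (u, -u):                              # the right side is even in u
            lhs = abs(cf(signed_u / (sigma * math.sqrt(n))) ** n - math.exp(-0.5 * u * u)) / u
            worst = max(worst, lhs - rhs)
    return worst


print(gap_in_11(0.3, 10))
```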
Using (4) and (11), we have
\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\int_{-T}^{T}\left(\frac{\rho}{6\sigma^3\sqrt{n}}u^2+\frac{1}{8n}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T}.\]
Substituting \(\frac{\rho}{6\sigma^3\sqrt{n}}=\frac{2}{9T}\), which follows from \(T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\), in the first term of the integrand,
and \(\frac{1}{8n}\le\frac{1}{18T}\), which holds because \(T\le\frac{4}{3}\sqrt{n}\le\frac{4}{9}n\) for \(n\ge9\), in the second term, we ultimately have
\[\begin{aligned}
\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)| &\le\frac{1}{T}\int_{-T}^{T}\left(\frac{2}{9}u^2+\frac{1}{18}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T} \\
&\le\frac{1}{T}\int_{-\infty}^{\infty}\left(\frac{2}{9}u^2+\frac{1}{18}|u|^3\right)e^{-\frac{1}{4}u^2}\,du+\frac{24m}{T}
\end{aligned}\]
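Since \(\int_{-\infty}^{\infty}u^2e^{-\frac{1}{4}u^2}\,du=4\sqrt{\pi}\) and \(\int_{-\infty}^{\infty}|u|^3e^{-\frac{1}{4}u^2}\,du=16\), the right-hand side above equals
\[\frac{1}{T}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+24m\right).\]
In the lemma, \(m\) is an upper bound for the first derivative of \(G=\Phi\), that is, for the standard normal density.
Taking \(m=\frac{1}{\sqrt{2\pi}}\), we get
\[\pi\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le\frac{1}{T}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+\frac{24}{\sqrt{2\pi}}\right)<\frac{12.05}{T}.\]
Now, putting \(T=\frac{4}{3}\frac{\sigma^3}{\rho}\sqrt{n}\), so that \(\frac{1}{T}=\frac{3}{4}\frac{\rho}{\sigma^3\sqrt{n}}\), we have
\[\sup_{t\in\mathbb{R}}|F_n(t)-\Phi(t)|\le \frac{C}{\sqrt{n}}\frac{\rho}{\sigma^3},\quad\text{where } C=\frac{3}{4\pi}\left(\frac{8}{9}\sqrt{\pi}+\frac{8}{9}+\frac{24}{\sqrt{2\pi}}\right)\approx2.87.\]
Together with the case \(n\le9\), the inequality therefore holds for every \(n\) with \(C=\max\{3,\,2.87\}=3\).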
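The arithmetic of this last step can be cross-checked numerically. The sketch below evaluates the two integrals \(\int_{-\infty}^{\infty}|u|^k e^{-u^2/4}\,du\) for \(k=2,3\) with a midpoint rule and assembles the constant for \(n\ge10\), using \(m=\frac{1}{\sqrt{2\pi}}\) and \(\frac{1}{T}=\frac{3}{4}\frac{\rho}{\sigma^3\sqrt n}\). The function name, step count, and truncation point are illustrative assumptions.

```python
import math


def tail_integral(power, steps=200000, upper=40.0):
    """Midpoint-rule value of int_{-inf}^{inf} |u|^power * exp(-u^2/4) du.

    The integrand is even and negligible beyond |u| = upper, so we integrate
    over [0, upper] and double the result.
    """
    h = upper / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        total += u**power * math.exp(-0.25 * u * u) * h
    return 2.0 * total


second_moment = tail_integral(2)      # closed form: 4 * sqrt(pi)
third_moment = tail_integral(3)       # closed form: 16
m = 1.0 / math.sqrt(2.0 * math.pi)    # maximum of the standard normal density

# pi * T * sup|F_n - Phi| <= (2/9) * second_moment + (1/18) * third_moment + 24 m
bracket = (2.0 / 9.0) * second_moment + (1.0 / 18.0) * third_moment + 24.0 * m

# 1/T = (3/4) * rho / (sigma^3 sqrt(n)), so the factor multiplying rho / (sigma^3 sqrt(n)) is
constant_large_n = 3.0 * bracket / (4.0 * math.pi)
print(second_moment, third_moment, constant_large_n)
```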
Now, for a sequence of i.i.d. random variables \(\{X_n\}_{n\geq1}\) with \(\mathbb{E}[X_1]=\mu\), consider the sequence \(\{Y_n\}_{n\geq1}\) with \(Y_n=X_n-\mu\) for all \(n\). The \(Y_n\) are i.i.d. with mean zero, variance \(\sigma^2\) and \(\mathbb{E}|Y_1|^3=\mathbb{E}|X_1-\mu|^3\), so applying the result above to \(\{Y_n\}_{n\geq1}\) gives
\[\sup_{t\in\mathbb R}\left|\mathbb P\left(\frac{\sqrt n(\bar X_n - \mu)}{\sigma}\le t\right) - \Phi (t)\right| \le \frac{C}{\sqrt n}\,\frac{\mathbb{E}|X_1 - \mu|^3}{\sigma^3}.\]
Hence, our proof is complete.
The constant obtained from this method of proof is not the best possible, and other methods of proof give smaller values of \(C\). Unpublished calculations of Esseen (1956) yielded \(C\le2.9\), and those of D. L. Wallace (1958) yielded \(C\le2.05\).
In practice, the exact value of \(C\) does not matter much. The useful point is the order of decay: the distance between the cdfs is \(O\left(\frac{1}{\sqrt{n}}\right)\), that is, it decays at least as fast as \(\frac{1}{\sqrt{n}}\).
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York.