GE193 - Course Introduction

Data cleansing

Dr Robert Batzinger
Instructor Emeritus

2/12/23

1 Descriptive Statistics

1.1 Average

\[\bar x = \frac{\sum x_i}{n}\]

1.2 Standard deviation

\[\displaystyle s = \sqrt{\frac{\sum (x - \bar x)^2}{n-1}}=\sqrt{\frac{\sum x^2 - \left(\sum x\right)^2/n}{n-1}}\]
\[\sqrt{\frac{4085.959}{999}} \approx \sqrt{\frac{497946.5 - 493860.5}{999}} \approx 2.0223874\]
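A minimal Python sketch of the average and the sample standard deviation (the n - 1 form). It computes the standard deviation both from squared deviations and from the shortcut form, in which the squared sum of the observations is divided by n. The function names and the eight data values are illustrative assumptions, not course data.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(xs)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def sample_sd_shortcut(xs):
    """Same quantity from sum(x^2) - (sum(x))^2 / n, without computing the mean first."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data))                # 5.0
print(sample_sd(data))           # sqrt(32 / 7)
print(sample_sd_shortcut(data))  # agrees with sample_sd up to rounding
```

The shortcut form needs only one pass over the data, but it can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice in code.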

1.3 t-test

\[\displaystyle{t=\frac{\text{difference}}{\text{standard error}}}\]

1.4 One-sample t-test

\[{\displaystyle t={\frac {{\bar {x}}-\mu _{0}}{ \frac{s}{\sqrt {n}}}} = \frac{\Delta x}{\frac{s}{\sqrt{n}}}}, \qquad \Delta x = \bar x - \mu_0\]

\[df = n - 1\]
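The one-sample statistic can be sketched in a few lines of Python using only the standard library. The function name `one_sample_t`, the data, and the hypothesised mean of 4.0 are assumptions made for illustration.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Return (t, df) for a one-sample t-test of the mean of xs against mu0."""
    n = len(xs)
    s = statistics.stdev(xs)                  # sample SD (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    t = (statistics.mean(xs) - mu0) / standard_error
    return t, n - 1

# Hypothetical sample: mean 5.0, tested against mu0 = 4.0.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t_one, df_one = one_sample_t(sample, 4.0)
print(t_one, df_one)
```

To get a p-value, the resulting t is compared against a t distribution with n - 1 degrees of freedom, for example from a printed table.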

1.5 Two-sample t-test: equal sample sizes, equal variance

\[{\displaystyle t={\frac {{\bar {X}}_{1}-{\bar {X}}_{2}}{s_{p}{\sqrt {\frac {2}{n}}}}}}\]

\[{\displaystyle s_{p}={\sqrt {\frac {s_{X_{1}}^{2}+s_{X_{2}}^{2}}{2}}}}\] \[df= 2n - 2\]
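A short Python sketch of the equal-size case, assuming both groups contain the same number n of observations. The name `equal_n_t` and the two groups are hypothetical.

```python
import math
import statistics

def equal_n_t(a, b):
    """Return (t, df) for two samples of equal size, assuming equal variances."""
    if len(a) != len(b):
        raise ValueError("this form requires equal sample sizes")
    n = len(a)
    # Pooled SD: square root of the average of the two sample variances.
    sp = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    t = (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(2 / n))
    return t, 2 * n - 2

# Hypothetical groups of five observations each.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_eq, df_eq = equal_n_t(group_a, group_b)
print(t_eq, df_eq)
```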

1.6 Two-sample t-test: unequal sample sizes, equal variance

\[t = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 +(n_2-1)s_2^2}{n_1+n_2-2}}\]

\[df = n_1 + n_2 -2\]
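The pooled-variance form can be sketched the same way. Each sample variance is weighted by its own degrees of freedom, so the two groups may differ in size. The name `pooled_t` and the data are illustrative assumptions.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t, df) for a two-sample t-test with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Weight each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    sp = math.sqrt(pooled_var)
    t = (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical groups of different sizes.
first = [1.0, 2.0, 3.0, 4.0, 5.0]
second = [2.0, 4.0, 6.0]
t_pooled, df_pooled = pooled_t(first, second)
print(t_pooled, df_pooled)
```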

1.7 Welch's t-test

\[t = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

\[{\displaystyle \mathrm {d.f.} ={\frac {\left({\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}\right)^{2}}{{\frac {\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}}+{\frac {\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}}}} \]
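A Python sketch of the unequal-variance statistic and its approximate degrees of freedom. The result is generally not an integer. The name `welch_t` and the data are assumptions for illustration.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1   # s1^2 / n1
    v2 = statistics.variance(b) / n2   # s2^2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with different sizes and spreads.
xs1 = [1.0, 2.0, 3.0, 4.0, 5.0]
xs2 = [2.0, 4.0, 6.0]
t_welch, df_welch = welch_t(xs1, xs2)
print(t_welch, df_welch)
```

The computed degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, which makes a convenient sanity check.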

2 Chi-square test

\[\chi ^{2}=\sum _{{i=1}}^{{r}}\sum _{{j=1}}^{{c}}{(O_{{i,j}}-E_{{i,j}})^{2} \over E_{{i,j}}}\]

\[df = (r - 1)(c - 1)\]
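A minimal Python sketch of the statistic for an r × c table of observed counts. It assumes the usual test of independence, where each expected count is row total × column total / grand total. The slides do not define E, so that definition is an assumption here. Degrees of freedom are computed as (r - 1)(c - 1), the standard value for an r × c contingency table. The function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Return (chi2, df) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence (assumed definition of E).
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: every expected count is 15.
table = [[10, 20],
         [20, 10]]
chi2_value, chi2_df = chi_square(table)
print(chi2_value, chi2_df)
```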