Stats cheat sheet

Mingyang Zheng

2017-07-07

Standard Deviation

\(\begin{aligned}\\ S = \sqrt{\frac{\sum(X-\bar{X})^2}{n-1}} \end{aligned}\)

* SD is also used to represent Standard Deviation.
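As a quick check of the formula, the sketch below computes the sample standard deviation by hand and compares it with Python's `statistics.stdev`. The function name `sample_sd` and the data values are made up for illustration.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    squared_devs = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_devs / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sd_by_hand = sample_sd(data)
sd_library = statistics.stdev(data)  # standard library equivalent
print(sd_by_hand, sd_library)
```

Both values should agree up to floating-point rounding, since `statistics.stdev` also divides by \(n-1\).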


Standard Error

Standard Error: It is the actual or estimated standard deviation of the sampling distribution of the sample mean.

\(\begin{aligned}\\ SE = \frac{S}{\sqrt{n}} \end{aligned}\)


One Sample T-test

\(\begin{aligned} t &= \frac{\bar{X} - \mu}{SE_{X}} \\ &=\frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}} \end{aligned}\)

* \(\mu\) is the hypothesized population mean, and \(\bar{X}\) is the sample mean.
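A minimal sketch of the one-sample statistic, assuming the data are given as a plain list. The helper name `one_sample_t` and the sample values are hypothetical; the standard error is computed as \(S/\sqrt{n}\) along the way.

```python
import math

def one_sample_t(xs, mu):
    """t = (sample mean - mu) / (S / sqrt(n))."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    se = s / math.sqrt(n)  # standard error of the sample mean
    return (mean - mu) / se

sample = [1, 2, 3, 4, 5]  # hypothetical sample
t_stat = one_sample_t(sample, mu=2)
print(t_stat)
```

Here the sample mean is 3, \(S^2 = 2.5\), and \(SE = \sqrt{0.5}\), so the statistic works out to \(\sqrt{2}\). It would be compared against a t distribution with \(n-1 = 4\) degrees of freedom.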

Two Sample T-test

When variances are equal:

\(\begin{aligned}\\ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S^2(\frac{1}{n_{1}}+\frac{1}{n_{2}})}} \end{aligned}\)

where \(S^2\) is the pooled variance of the two samples:

\(\begin{aligned}\\ S^2 = \frac{\sum_{i=1}^{n_{1}}(X_{1i}-\bar{X}_{1})^2 + \sum_{j=1}^{n_{2}}(X_{2j} - \bar{X}_{2})^2}{n_{1}+n_{2} - 2} \end{aligned}\)
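The equal-variance statistic can be sketched as follows. The function name `pooled_t` and the two groups are invented for illustration; each sum of squares is taken around its own group mean.

```python
import math

def pooled_t(xs1, xs2):
    """Two-sample t statistic assuming equal variances."""
    n1, n2 = len(xs1), len(xs2)
    m1 = sum(xs1) / n1
    m2 = sum(xs2) / n2
    ss1 = sum((x - m1) ** 2 for x in xs1)  # squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in xs2)  # squared deviations, group 2
    s2 = (ss1 + ss2) / (n1 + n2 - 2)       # pooled variance
    t = (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, s2

group1 = [1, 2, 3, 4, 5]   # hypothetical data
group2 = [2, 4, 6, 8, 10]  # hypothetical data
t_stat, pooled_var = pooled_t(group1, group2)
print(t_stat, pooled_var)
```

For these groups the pooled variance is \((10 + 40)/8 = 6.25\), and the statistic would be compared against a t distribution with \(n_1 + n_2 - 2 = 8\) degrees of freedom.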


Confidence Interval

For a known population standard deviation \(\sigma\), where \(z\) is the standard normal critical value:

\(\begin{aligned}\\ (\bar{X} - z * \frac{\sigma}{\sqrt{n}}, \bar{X} + z * \frac{\sigma}{\sqrt{n}}) \end{aligned}\)

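For an unknown standard deviation, where \(s\) is the sample standard deviation and \(t\) is the critical value with \(n-1\) degrees of freedom: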

\(\begin{aligned}\\ (\bar{X} - t * \frac{s}{\sqrt{n}}, \bar{X} + t * \frac{s}{\sqrt{n}}) \end{aligned}\)
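Both intervals can be sketched with the standard library. The function names and input numbers are illustrative assumptions. Python's standard library has no t quantile function, so the t critical value is passed in by hand here (2.064 is the usual table value for 95% confidence with 24 degrees of freedom).

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, level=0.95):
    """Interval for a known population standard deviation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

def t_interval(mean, s, n, t_crit):
    """Interval for an unknown standard deviation.

    t_crit must be looked up for n - 1 degrees of freedom.
    """
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

z_low, z_high = z_interval(mean=10.0, sigma=2.0, n=25)
t_low, t_high = t_interval(mean=10.0, s=2.0, n=25, t_crit=2.064)
print((z_low, z_high), (t_low, t_high))
```

With the same inputs the t interval is slightly wider, which reflects the extra uncertainty from estimating the standard deviation.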


Simple Regression

\(\hat{y} = \beta_{1}{x} + \beta_{0}\)

\(y = \beta_{1}{x}+\beta_{0} + \epsilon\)

\(\begin{aligned} \hat{\beta}_{1} &=\frac{\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}\\ &= \frac{cov_{(X,Y)}}{var_{(X)}} \\ &= cor_{(X,Y)}\frac{sd(Y)}{sd(X)} \end{aligned}\)

* The variance \(var_{(X)}\) measures the spread of \(X\) around its mean.

\(\hat{\beta}_{0} = \bar{Y} - \hat{\beta}_{1}\bar{X}\)
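A small sketch of the least-squares estimates, assuming paired lists of equal length. The name `fit_simple_regression` and the data are hypothetical. The intercept is obtained from the fact that the fitted line passes through the point of sample means \((\bar{X}, \bar{Y})\).

```python
def fit_simple_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # line passes through the means
    return intercept, slope

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical responses
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)
```

For this data the cross-product sum is 6 and the sum of squares of \(X\) is 10, giving a slope of about 0.6 and an intercept of about 2.2.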


Sum of Squares

Regression sum of squares: Explained by the model

\(SS_{R} = \sum_{i = 1}^{n}({\hat{Y}_{i} - \bar{Y}})^2\)

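Sum of squared errors: Unexplained by the model

\(SS_{E} = \sum_{i = 1}^{n}({{Y}_{i} - \hat{Y}_{i}})^2\)

Total sum of squares: Total variation in \(Y\), equal to \(SS_{R} + SS_{E}\) for a least-squares fit with an intercept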

\(SS_{T} = \sum_{i = 1}^{n}({{Y}_{i} - \bar{Y}})^2\)

* In some texts, the abbreviations \(SS_{R}\) and \(SS_{E}\) have the opposite meaning: \(SS_{R}\) stands for the residual sum of squares (the sum of squared errors defined above) and \(SS_{E}\) stands for the explained sum of squares (another name for the regression sum of squares).


R²

\(R^2\): The proportion of variability in a data set that is accounted for by a statistical model. (e.g. Our model explained 15% of the variation…)

\(\begin{aligned} R^2 = \frac{SS_{R}}{SS_{T}} = \frac{\sum_{i =1}^{n}(\hat{Y}_{i} - \bar{Y})^2}{\sum_{i=1}^{n}(Y_{i} - \bar{Y})^2} \end{aligned}\)
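The three sums of squares and \(R^2\) can be computed together, as in the sketch below. The helper name `sums_of_squares` and the data are made up, and the line is fitted by ordinary least squares inside the function so that the block is self-contained.

```python
def sums_of_squares(xs, ys):
    """Return (SS_R, SS_E, SS_T) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    ss_r = sum((f - y_bar) ** 2 for f in fitted)          # explained
    ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    ss_t = sum((y - y_bar) ** 2 for y in ys)              # total
    return ss_r, ss_e, ss_t

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical responses
ss_r, ss_e, ss_t = sums_of_squares(x, y)
r_squared = ss_r / ss_t
print(ss_r, ss_e, ss_t, r_squared)
```

For this data the explained and unexplained parts are about 3.6 and 2.4, which add up to the total of 6, so the model accounts for about 60% of the variation.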


Covariance and Correlation

Covariance, \(cov_{(X, Y)}\): a number that measures the degree to which two variables vary together. Its sign gives the direction of the relationship, and its size depends on the units of both variables.

\(\begin{aligned} cov_{(X, Y)} = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})}{n-1} \end{aligned}\)

Correlation, \(cor_{(X,Y)}\): a standardized covariance. Dividing by both standard deviations removes the units and keeps the value between -1 and 1.

\(\begin{aligned} cor_{(X,Y)} = \frac{cov_{(X, Y)}}{sd(X)sd(Y)} \end{aligned}\)

In simple linear regression, \(cor^2_{(X,Y)} = R^2\).
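The two definitions translate directly into code. The sketch below uses invented data and hypothetical helper names, and also checks numerically that the squared correlation matches \(R^2\) for a simple least-squares line fitted to the same data.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance scaled by both sample standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # cov(X, X) is var(X)
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]  # hypothetical data
cov_xy = covariance(x, y)
cor_xy = correlation(x, y)
print(cov_xy, cor_xy, cor_xy ** 2)
```

For this data the covariance is 1.5 and the squared correlation is about 0.6, the same value that \(SS_{R}/SS_{T}\) gives for the fitted line.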


Prediction

The standard error for estimating the mean response \(\mu_{x}\) at a given value \(X\) is:

\(\begin{aligned}\\ SE(\hat\mu_{x}) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}} \end{aligned}\)

where
\(\begin{aligned}\\ \hat\sigma = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(Y_{i}-\hat{Y_{i}})^2} \end{aligned}\)

The standard error for predicting a new observation \(y\) at a given value \(X\) is:

\(\begin{aligned}\\ SE(\hat{y}_{x}) = \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}} \end{aligned}\)
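Both standard errors share the same ingredients, so they can be sketched in one function. The name `prediction_standard_errors`, the data, and the query point `x0` are assumptions for illustration. The only difference between the two results is the extra 1 under the square root, which accounts for the noise in a single new observation.

```python
import math

def prediction_standard_errors(xs, ys, x0):
    """Return (SE of the mean response, SE of a new observation) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))  # residual standard error
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = sigma_hat * math.sqrt(spread)
    se_new = sigma_hat * math.sqrt(1 + spread)
    return se_mean, se_new

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical responses
se_mean, se_new = prediction_standard_errors(x, y, x0=3)
print(se_mean, se_new)
```

At `x0 = 3`, which equals \(\bar{X}\) here, the distance term vanishes and both standard errors are at their smallest. They grow as `x0` moves away from the mean of the observed \(X\) values.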