Standard Deviation
\(\begin{aligned}\\ S = \sqrt{\frac{\sum(X-\bar{X})^2}{n-1}} \end{aligned}\)
* The standard deviation is also abbreviated SD.
Standard Error
Standard Error: the actual or estimated standard deviation of the sampling distribution of the sample mean.
\(\begin{aligned}\\ SE = \frac{S}{\sqrt{n}} \end{aligned}\)
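As a rough illustration of the two formulas above, the following sketch computes \(S\) with the \(n-1\) denominator and then \(SE = S/\sqrt{n}\). The function names and the sample values are made up for the example.

```python
import math

def sample_sd(xs):
    """Sample standard deviation S, using the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def standard_error(xs):
    """Standard error of the sample mean, SE = S / sqrt(n)."""
    return sample_sd(xs) / math.sqrt(len(xs))

# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print("S  =", sample_sd(data))
print("SE =", standard_error(data))
```

The standard library function `statistics.stdev` uses the same \(n-1\) denominator, so it can serve as a cross-check on `sample_sd`.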
One Sample T-test
\(\begin{aligned} t &= \frac{\bar{X} - \mu}{SE_{\bar{X}}} \\ &=\frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}} \end{aligned}\)
* \(\mu\) is the population mean, and \(\bar{X}\) is the sample mean.
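A minimal sketch of the one-sample statistic, assuming the data arrive as a plain list of numbers. The function name `one_sample_t` and the sample are hypothetical. The statistic is compared against a t distribution with \(n-1\) degrees of freedom, which the function returns alongside it.

```python
import math

def one_sample_t(xs, mu):
    """Return (t, degrees of freedom) for H0: population mean == mu."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
    se = s / math.sqrt(n)                                      # standard error
    return (mean - mu) / se, n - 1

# Hypothetical measurements tested against a hypothesised mean of 5.0.
t_stat, df = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], mu=5.0)
print("t =", t_stat, "df =", df)
```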
\(~\)
Two Sample T-test
When variances are equal:
\(\begin{aligned}\\ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S^2(\frac{1}{n_{1}}+\frac{1}{n_{2}})}} \end{aligned}\)
where the pooled variance \(S^2\) is:
\(\begin{aligned}\\ S^2 = \frac{\sum_{i=1}^{n_{1}}(X_{1i}-\bar{X}_1)^2 + \sum_{j=1}^{n_{2}}(X_{2j} - \bar{X}_2)^2}{n_{1}+n_{2} - 2} \end{aligned}\)
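The equal-variance statistic can be sketched as follows: each group's squared deviations are taken around its own mean, pooled over \(n_1 + n_2 - 2\) degrees of freedom, and plugged into the t formula. The name `pooled_t` and the two samples are invented for the example.

```python
import math

def pooled_t(x1, x2):
    """Two-sample t statistic assuming equal variances.

    Returns (t, pooled variance S^2, degrees of freedom).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)  # squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in x2)  # squared deviations, group 2
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df
    t = (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, s2, df

# Hypothetical groups of unequal size.
t_stat, s2, df = pooled_t([20.1, 19.8, 21.4, 20.7], [18.9, 19.5, 18.2, 19.9, 19.0])
print("t =", t_stat, "S^2 =", s2, "df =", df)
```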
Confidence Interval
For a known standard deviation:
\(\begin{aligned}\\ \left(\bar{X} - z \cdot \frac{\sigma}{\sqrt{n}},\ \bar{X} + z \cdot \frac{\sigma}{\sqrt{n}}\right) \end{aligned}\)
For an unknown standard deviation:
\(\begin{aligned}\\ \left(\bar{X} - t \cdot \frac{S}{\sqrt{n}},\ \bar{X} + t \cdot \frac{S}{\sqrt{n}}\right) \end{aligned}\)
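Both intervals can be sketched in a few lines. For the known-\(\sigma\) case the critical value \(z\) comes from `statistics.NormalDist` (Python 3.8+). The standard library has no t distribution, so this sketch assumes the t critical value is looked up in a table and passed in as `t_crit`. Function names and data are illustrative only.

```python
import math
from statistics import NormalDist

def ci_known_sigma(xs, sigma, level=0.95):
    """Interval for the mean when the population SD sigma is known."""
    n = len(xs)
    mean = sum(xs) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half = z * sigma / math.sqrt(n)
    return mean - half, mean + half

def ci_unknown_sigma(xs, t_crit):
    """Interval for the mean using the sample SD and a supplied t value.

    t_crit is assumed to be the two-sided critical value for n - 1
    degrees of freedom, taken from a t table.
    """
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half = t_crit * s / math.sqrt(n)
    return mean - half, mean + half

sample = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical data, n = 5
print(ci_known_sigma(sample, sigma=1.5))     # sigma assumed known
print(ci_unknown_sigma(sample, t_crit=2.776))  # tabled 95% value for 4 df
```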
Simple Regression
\(\hat{y} = \hat{\beta}_{1}{x} + \hat{\beta}_{0}\)
\(y = \beta_{1}{x}+\beta_{0} + \epsilon\)
\(~\)
\(\begin{aligned} \hat{\beta}_{1} &=\frac{\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}\\ &= \frac{cov_{(X,Y)}}{var_{(X)}} \\ &= cor_{(X,Y)}\frac{sd(Y)}{sd(X)} \end{aligned}\)
* The variance \(var_{(X)}\) measures the spread of the \(X\) values.
\(\hat{\beta}_{0} = \bar{Y} - \hat{\beta}_{1}\bar{X}\)
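A short sketch of the least-squares estimates: the slope is the sum of cross-products divided by the sum of squared \(X\) deviations, and the intercept follows from the fact that the fitted line passes through \((\bar{X}, \bar{Y})\). The name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # same as cov(X, Y) / var(X)
    intercept = y_bar - slope * x_bar  # line passes through the means
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("slope =", slope, "intercept =", intercept)
```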
Sum of Squares
Regression sum of squares: Explained by the model
\(SS_{R} = \sum_{i = 1}^{n}({\hat{Y}_{i} - \bar{Y}})^2\)
Sum of squared errors: Unexplained by the model
\(SS_{E} = \sum_{i = 1}^{n}({{Y}_{i} - \hat{Y}_{i}})^2\)
Total sum of squares
\(SS_{T} = \sum_{i = 1}^{n}({{Y}_{i} - \bar{Y}})^2\)
* In some texts, the abbreviations \(SS_{R}\) and \(SS_{E}\) have the opposite meaning: \(SS_{R}\) stands for the residual sum of squares (the sum of squared errors defined above) and \(SS_{E}\) stands for the explained sum of squares (another name for the regression sum of squares).
R2
\(R^2\): The proportion of variability in a data set that is accounted for by a statistical model. (e.g. Our model explained 15% of the variation…)
\(\begin{aligned} R^2 = \frac{SS_{R}}{SS_{T}} = \frac{\sum_{i =1}^{n}(\hat{Y}_{i} - \bar{Y})^2}{\sum_{i=1}^{n}(Y_{i} - \bar{Y})^2} \end{aligned}\)
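To make the decomposition concrete, the sketch below fits a least-squares line, computes the three sums of squares, and forms \(R^2 = SS_R / SS_T\). For a least-squares fit with an intercept, \(SS_R + SS_E = SS_T\), which the example prints as a check. The function name and data are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SS_R, SS_E, SS_T) for a simple least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]                        # fitted values
    ss_r = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained
    ss_e = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # unexplained
    ss_t = sum((y - y_bar) ** 2 for y in ys)                 # total
    return ss_r, ss_e, ss_t

# Hypothetical data.
ss_r, ss_e, ss_t = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("SS_R + SS_E =", ss_r + ss_e, "SS_T =", ss_t)
print("R^2 =", ss_r / ss_t)
```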
Covariance and Correlation
Covariance or \(cov_{(X, Y)}\): a number that reflects the degree to which two variables vary together.
\(\begin{aligned} cov_{(X, Y)} = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})}{n-1} \end{aligned}\)
Correlation or \(cor_{(X,Y)}\): a scaled version of covariance. It can be seen as a standardized covariance, so it always lies between -1 and 1.
\(\begin{aligned} cor_{(X,Y)} = \frac{cov_{(X, Y)}}{sd(X)sd(Y)} \end{aligned}\)
In simple linear regression, \(cor^2_{(X,Y)} = R^2\).
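The two definitions translate directly into code. This sketch uses the \(n-1\) denominator throughout and hand-rolled helper names (`covariance`, `correlation`, `sd`) that are assumptions of the example rather than a particular library's API.

```python
import math

def sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

def covariance(xs, ys):
    """Sample covariance of paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (sd(xs) * sd(ys))

# Hypothetical paired data.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print("cov =", covariance(xs, ys), "cor =", r, "cor^2 =", r ** 2)
```

Squaring the printed correlation gives the \(R^2\) that a simple linear regression of `ys` on `xs` would report.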
Prediction
The standard error for estimating the mean response \(\mu_{x}\) at \(X = x\) is: \(~\)
\(\begin{aligned}\\ SE(\hat\mu_{x}) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}} \end{aligned}\)
where
\(\begin{aligned}\\ \hat\sigma = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(Y_{i}-\hat{Y_{i}})^2} \end{aligned}\)
The standard error for predicting a new observation \(y\) at \(X = x\) is:
\(\begin{aligned}\\ SE(\hat{y}_{x}) = \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{X})^2}{\sum_{i=1}^{n}(X_{i}-\bar{X})^2}} \end{aligned}\)
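The two standard errors differ only by the extra 1 under the square root, which accounts for the noise in a single new observation. A minimal sketch, with an invented function name and made-up data, that returns both values at a chosen point `x0`:

```python
import math

def regression_standard_errors(xs, ys, x0):
    """Return (SE of the estimated mean, SE of a new prediction) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    ss_e = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(ss_e / (n - 2))
    # Term shared by both formulas: 1/n + (x0 - x_bar)^2 / Sxx.
    shared = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = sigma_hat * math.sqrt(shared)
    se_pred = sigma_hat * math.sqrt(1 + shared)
    return se_mean, se_pred

# Hypothetical data; x0 is an arbitrary point of interest.
se_mean, se_pred = regression_standard_errors(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=4.5)
print("SE(mean) =", se_mean, "SE(prediction) =", se_pred)
```

Both values grow as `x0` moves away from the mean of the `xs`, and the prediction error is always the larger of the two.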