M. Drew LaMar
February 4, 2019
Definition:
Type I error is rejecting a true null hypothesis. The probability of a Type I error is given by \[ \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ true}] = \alpha \]
Definition:
Type II error is failing to reject a false null hypothesis. The probability of a Type II error is given by \[ \mathrm{Pr[Do \ not \ reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] = \beta \]
Definition:
The power of a statistical test (denoted \( 1-\beta \)) is given by \[ \begin{align*} \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] & = 1-\beta \\ & = 1 - \mathrm{Pr[Type \ II \ error]} \end{align*} \]
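These three definitions can be checked by simulation. The following is a minimal Python sketch, assuming a two-sided one-sample z-test with known population standard deviation (a test chosen only for illustration). The function names `z_test_rejects` and `rejection_rate` and all parameter values are hypothetical. When \( H_{0} \) is true, the fraction of simulated samples that reject estimates \( \alpha \); when \( H_{0} \) is false, it estimates the power \( 1-\beta \).

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test (sigma assumed known): True if H0: mu = mu0 is rejected."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{1 - alpha/2}
    return abs(z) > z_crit

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, alpha=0.05, reps=10_000, seed=1):
    """Fraction of simulated samples (drawn from a normal population with mean true_mu)
    in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += z_test_rejects(sample, mu0, sigma, alpha)  # True counts as 1
    return rejections / reps

# H0 true (true mean equals mu0): rejection rate estimates Pr[Type I error] = alpha
type_i = rejection_rate(true_mu=0.0)
# H0 false (true mean is 0.5, a hypothetical value): rejection rate estimates power = 1 - beta
power = rejection_rate(true_mu=0.5)
beta = 1 - power  # estimated Pr[Type II error]
print(f"Type I error ~ {type_i:.3f}, power ~ {power:.3f}, Type II error ~ {beta:.3f}")
```

The simulated Type I error rate should land close to the chosen \( \alpha = 0.05 \), while the power depends on how far the true mean is from the hypothesized one.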
Statistical power example
https://qubeshub.org/tools/statpowerviz/
The power of a statistical test is a function of:
- Significance level \( \alpha \)
- Variability of data
- Sample size
- Effect size
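For the same illustrative test as a running example (a two-sided one-sample z-test with known standard deviation, an assumption rather than anything specified on this slide), power has a closed form, so each of the four factors can be varied one at a time. In this hypothetical sketch, `effect` is the difference between the true mean and the hypothesized mean, and `sigma` stands in for the variability of the data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_power(alpha, sigma, n, effect):
    """Power of a two-sided one-sample z-test of H0: mu = mu0 when the true mean
    is mu0 + effect and the population SD sigma is known."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma  # distance of true mean from mu0, in standard errors
    # Probability the test statistic falls in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

base = dict(alpha=0.05, sigma=1.0, n=20, effect=0.5)  # hypothetical baseline values
print("baseline        ", round(z_test_power(**base), 3))
print("larger alpha    ", round(z_test_power(**{**base, "alpha": 0.10}), 3))
print("more variability", round(z_test_power(**{**base, "sigma": 2.0}), 3))
print("larger sample   ", round(z_test_power(**{**base, "n": 80}), 3))
print("larger effect   ", round(z_test_power(**{**base, "effect": 1.0}), 3))
```

Power rises with a larger \( \alpha \), a larger sample, or a larger effect, and falls as the data become more variable. With `effect=0` (so \( H_{0} \) is true), the formula returns exactly \( \alpha \).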
Definition:
Regression is the method used to predict values of one numerical variable (response) from values of another (explanatory).
Note: Regression can be done on data from an observational or experimental study.
We will discuss 3 types:
Definition:
Linear regression draws a straight line through the data to predict the response variable from the explanatory variable.
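Definition:
For the population, the regression line is
\[ Y = \alpha + \beta X, \]
where \( \alpha \) (the intercept) and \( \beta \) (the slope) are population parameters.
Note: These \( \alpha \) and \( \beta \) are unrelated to the Type I and Type II error probabilities defined earlier.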
Definition:
For a sample, the regression line is
\[ Y = a + b X, \]
where \( a \) and \( b \) are estimates of \( \alpha \) and \( \beta \), respectively.
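The slide does not say how \( a \) and \( b \) are computed. A common choice is the method of least squares, which gives \( b = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2 \) and \( a = \bar{Y} - b\bar{X} \). A minimal Python sketch of that method follows; the function name `least_squares_line` and the data values are hypothetical, made up only for illustration.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (a, b), the least-squares intercept and slope of Y = a + bX."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # sum of squares of X
    b = sxy / sxx           # slope estimate of beta
    a = y_bar - b * x_bar   # intercept estimate of alpha
    return a, b

# Hypothetical sample (made-up numbers for illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"Y = {a:.3f} + {b:.3f} X")
# Use the fitted line to predict the response at a new value of the explanatory variable
print("predicted Y at X = 6:", a + b * 6)
```

On Python 3.10 and later, `statistics.linear_regression(xs, ys)` returns the same least-squares slope and intercept.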
Note: At each value of \( X \), there is a population of \( Y \)-values whose mean lies on the true regression line (this is the linear assumption).
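The linear assumption can be pictured by simulating a population. This hypothetical Python sketch fixes made-up values for \( \alpha \), \( \beta \), and the scatter, and additionally assumes the \( Y \)-values at each \( X \) are normally distributed (the slide only requires their mean to lie on the line). The average of many simulated \( Y \)-values at a given \( X \) should sit close to \( \alpha + \beta X \).

```python
import random
from statistics import fmean

# Hypothetical population parameters, assumed only for this illustration
ALPHA, BETA, SIGMA = 1.0, 2.0, 1.5

def draw_y(x, rng):
    """Draw one Y from the population at this X: mean on the true line, plus
    normally distributed scatter (normality is an extra assumption here)."""
    return ALPHA + BETA * x + rng.gauss(0.0, SIGMA)

rng = random.Random(42)
sample_means = {}
for x in (0.0, 2.5, 5.0):
    ys = [draw_y(x, rng) for _ in range(50_000)]  # many Y-values at this X
    sample_means[x] = fmean(ys)
    print(f"X = {x}: mean of simulated Y = {sample_means[x]:.3f}, "
          f"true line value = {ALPHA + BETA * x:.3f}")
```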
Linear regression is a model formulation.
Usually (but not always), it is reserved for situations where you assert evidence of causation (e.g., A causes B).
Correlation, in contrast, describes relationships (e.g., A and B are positively correlated).
Variables: For a correlation, our data consist of two numerical variables (continuous or discrete).
Definition:
The (linear) correlation coefficient \( \rho \) measures the strength and direction of the association between two numerical variables in a population.
The linear (Pearson) correlation coefficient measures the tendency of two numerical variables to co-vary in a linear way.
The symbol \( r \) denotes a sample estimate of \( \rho \).
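A minimal Python sketch of the sample estimate \( r \), using the standard Pearson formula \( r = \sum (X_i - \bar{X})(Y_i - \bar{Y}) \big/ \sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2} \) (the formula itself is not given on this slide; the function name `pearson_r` and the data are hypothetical). The third example illustrates why the definition says "linear": \( Y \) is completely determined by \( X \), yet \( r = 0 \) because the relationship is not a straight line.

```python
import math
from statistics import fmean

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r, an estimate of rho."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # perfectly linear, increasing
print(pearson_r(xs, [-3 * x for x in xs]))     # perfectly linear, decreasing
print(pearson_r(xs, [x ** 2 for x in xs]))     # perfect but non-linear relationship
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient.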