December 3, 2023
Definition:
Regression is the method used to predict values of one numerical variable (response) from values of another (explanatory).
Note: Regression can be done on data from an observational or experimental study.
We will discuss 3 types:
Definition:
Linear regression draws a straight line through the data to predict the response variable from the explanatory variable.
Slope determines rate of change of response with explanatory - humans lose 0.076 units of genetic diversity with every 10,000 km from East Africa.
Definition: For the
population , the regression line is \[Y = \alpha + \beta X,\] where \(\alpha\) (theintercept ) and \(\beta\) (theslope ) are population parameters.
Definition: For a
sample , the regression line is \[Y = a + b X,\] where \(a\) and \(b\) are estimates of \(\alpha\) and \(\beta\), respectively.
Note: At each value of \(X\), there is a population of \(Y\)-values whose mean lies on the true regression line (this is the linear assumption).
Technically, the linear regression equation is
\[\mu_{Y\, |\, X=X^{*}} = \alpha + \beta X^{*},\]
were \(\mu_{Y\, |\, X=X^{*}}\) is the mean of \(Y\) in the sub-population with \(X=X^{*}\) (called predicted values).
You are predicting the mean of Y given X.
Method of least squares
Definition: The
least-squares regression line is the line for which the sum of all thesquared deviations in \(Y\) is smallest.
The method of least-squares leads to the following estimates for intercept and slope:
\[ \begin{align} b & = \frac{\sum_{i}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sum_{i}(X_{i}-\bar{X})^2} \\ a & = \bar{Y}-b\bar{X} \end{align} \]
Note:
\[b = \frac{\mathrm{Covariance(X,Y)}}{s_{X}^2} = r\frac{s_{Y}}{s_{X}},\]
where \(r\) is the correlation coefficient!