Covariance

The covariance between two variables \(x\) and \(y\) is:

\[Cov(x,y) =\frac{1}{n-1} \sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})\]

Where:

  • \(x_i =\) the i-th data value of \(x\)
  • \(\bar{x} =\) the mean of \(x\)
  • \(y_i =\) the i-th data value of \(y\)
  • \(\bar{y} =\) the mean of \(y\)
  • \(n =\) the number of data values
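To check hand calculations, the definition above can be sketched in a few lines of Python. This is a minimal illustration, not a required method: the function name `covariance` and the sample points are made up for this example and are not the HW1 data.

```python
def covariance(xs, ys):
    """Sample covariance with the 1/(n-1) factor used in the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data values")
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    # Sum the products of paired deviations from the means.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical points (1,2), (2,4), (3,5), (4,9), chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
print(covariance(xs, ys))
```

For these points the deviation products sum to 11, so the covariance is 11/3. Swapping in your own points is a quick way to test a data set you build for problem 2.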

  1. Find the covariance between \(x\) and \(y\) for the points from HW1: \[(0,8), (1,5), (2,7), (3,4)\]

  2. Create data sets with at least three points with covariance:

    1. 1
    2. 0
    3. -2

Correlation

The correlation between two variables \(x\) and \(y\) is:

\[R = Cor(x,y)=\frac{Cov(x,y)}{s_x s_y}\]

Where \(s_x\) and \(s_y\) are the standard deviations of \(x\) and \(y\):

  • \(s_x =\sqrt{ \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}\)
  • \(s_y =\sqrt{ \frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^2}\)
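As a rough check on hand work, the correlation formula can be sketched in Python as follows. The helper names (`mean`, `covariance`, `std_dev`, `correlation`) and the data are assumptions made for this illustration; the standard deviation uses the squared deviations and the same \(n-1\) factor as the covariance.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance with the 1/(n-1) factor."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def std_dev(values):
    """Sample standard deviation: square root of the covariance of a variable with itself."""
    return math.sqrt(covariance(values, values))


def correlation(xs, ys):
    """R = Cov(x, y) / (s_x * s_y); undefined if either variable is constant."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
print(correlation(xs, ys))
```

Because \(R\) divides the covariance by both standard deviations, it has no units and always lies between -1 and 1; points that fall exactly on a line with positive slope give \(R = 1\).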

  1. Calculate the correlation of \(x\) and \(y\) for each data set from problems 1 and 2 of the Covariance section.

  2. Show \(\frac{s_y}{s_x}R\) is equal to the slope you found in part 2 of Written HW 1.

  3. Show \(\bar{y}-(\frac{s_y}{s_x}R)\bar{x}\) is equal to the y-intercept you found in part 2 of Written HW 1.

Generalizing

  1. Show that the slope of the least squares line can always be estimated by: \[\beta_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}=\frac{s_y}{s_x}R\]

  2. Show that, given \(\beta_1\), the y-intercept, \(\beta_0\), of the least squares line can be found by: \[\beta_0=\bar{y}-\beta_1 \bar{x}\]
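A numerical check is not a proof, but it can help confirm that the two expressions for \(\beta_1\) agree before you attempt the algebra. The sketch below computes the slope both ways on made-up data; the function names and the points are assumptions for illustration only.

```python
import math


def least_squares_line(xs, ys):
    """Return (beta_0, beta_1) using the sums-of-deviations form of the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


def slope_from_correlation(xs, ys):
    """Return (s_y / s_x) * R, built from the sample statistics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = cov / (s_x * s_y)
    return (s_y / s_x) * r


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
beta_0, beta_1 = least_squares_line(xs, ys)
print(beta_0, beta_1)
print(slope_from_correlation(xs, ys))
```

For these points the deviation sums are 11 and 5, so the slope is 2.2 and the intercept is 5 - 2.2(2.5) = -0.5; the second function should print the same slope up to rounding.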