1. Consider the data set given below

x <- c(0.18, -1.54, 0.42, 0.95) and weights given by w <- c(2, 1, 3, 1). Give the value of $\mu$ that minimizes the least squares equation $\sum_{i=1}^{n} w_i (x_i - \mu)^2$.

x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)
mu <- sum(x*w)/sum(w)
mu
## [1] 0.1471429
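
As a sanity check on the weighted-mean answer, the weighted sum of squares can also be minimized numerically and compared with the closed form. This is only a sketch: the helper name `wss`, the search interval, and the tolerances are assumptions made for illustration.

```r
x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)

# Weighted sum of squares for a candidate value mu (hypothetical helper)
wss <- function(mu) sum(w * (x - mu)^2)

# Numerical minimizer, searched over the range of the data
mu_num <- optimize(wss, interval = range(x), tol = 1e-8)$minimum

# Closed-form minimizer: the weighted mean
mu_closed <- sum(x * w) / sum(w)

# TRUE if both approaches agree up to the chosen tolerance
all.equal(mu_num, mu_closed, tolerance = 1e-5)
```

Base R's `weighted.mean(x, w)` computes the same closed-form value.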
  1. Consider the following data set

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42) y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05) Fit the regression through the origin and get the slope treating y as the outcome and x as the regressor. (Hint: do not center the data, since we want regression through the origin, not through the means of the data.) Note that coefficients(lm(y ~ x - 1)) is an equivalent way to drop the intercept.

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

coefficients(lm(y~x +0))
##         x 
## 0.8262517
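
For regression through the origin, the least squares slope has the closed form sum(x*y)/sum(x^2). The sketch below checks that this agrees with `lm`; the variable names `slope_manual` and `slope_lm` are introduced here for illustration only.

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Closed-form slope for a line forced through the origin
slope_manual <- sum(x * y) / sum(x^2)

# Same slope from lm with the intercept removed
slope_lm <- unname(coef(lm(y ~ x + 0)))

# TRUE if the two computations agree
all.equal(slope_manual, slope_lm)
```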
  1. Do data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and weight as the predictor. Give the slope coefficient.
data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
coefficients(lm( mpg~wt, data = mtcars))
## (Intercept)          wt 
##   37.285126   -5.344472
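
The slope reported by `lm` can be cross-checked against the identity slope = Cor(Y,X) * SD_Y / SD_X. A minimal sketch, with `slope_manual` and `slope_lm` as illustrative names:

```r
data("mtcars")

# Slope from the correlation and the two standard deviations
slope_manual <- cor(mtcars$mpg, mtcars$wt) * sd(mtcars$mpg) / sd(mtcars$wt)

# Slope as estimated by lm
slope_lm <- unname(coef(lm(mpg ~ wt, data = mtcars))["wt"])

# TRUE if the two computations agree
all.equal(slope_manual, slope_lm)
```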
  1. Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one half that of the outcome. The correlation between the two variables is .5. What value would the slope coefficient be for the regression model with Y as the outcome and X as the predictor?

sd_x = sd_y/2

Cor(y,x) = .5

Formula: Beta1 = Cor(y,x) * (sd_y / sd_x), where Beta1 is the slope we need to find. Since sd_x = sd_y/2, the ratio sd_y / sd_x equals 2.

Beta1 = .5*(1/(1/2))

Beta1 = 1
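
The slope asked for in this question follows from slope = Cor(Y,X) * SD_Y / SD_X. The sketch below builds artificial data that satisfy the stated conditions exactly and fits the model. The seed, the sample size, and the construction itself are assumptions chosen only for illustration.

```r
set.seed(1)
n <- 100

# Standardized predictor (sample mean 0, sample sd 1)
zx <- as.numeric(scale(rnorm(n)))

# Noise made exactly uncorrelated with zx in the sample, then standardized
ze <- as.numeric(scale(resid(lm(rnorm(n) ~ zx))))

x <- zx                                # sd(x) = 1
y <- 2 * (0.5 * zx + sqrt(0.75) * ze)  # sd(y) = 2 and cor(y, x) = 0.5

# Ratio of standard deviations, correlation, and fitted slope
c(sd_ratio = sd(x) / sd(y), correlation = cor(y, x), slope = unname(coef(lm(y ~ x))["x"]))
```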

  1. Students were given two hard tests and scores were normalized to have empirical mean 0 and variance 1. The correlation between the scores on the two tests was 0.4. What would be the expected score on Quiz 2 for a student who had a normalized score of 1.5 on Quiz 1?

Since the scores are normalized (mean 0, variance 1), the regression intercept is 0 and the slope equals the correlation, so the expected score on Quiz 2 = 1.5*.4 = .6
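
The arithmetic can be read off the general prediction formula. The symbols are assumptions introduced here: $\rho$ for the correlation, $s_x$ and $s_y$ for the standard deviations, and $x_0$ for the Quiz 1 score.

$$
\hat{y} = \bar{y} + \rho \, \frac{s_y}{s_x} \, (x_0 - \bar{x}) = 0 + 0.4 \cdot \frac{1}{1} \cdot (1.5 - 0) = 0.6
$$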

  1. Consider the data given by the following

x <- c(8.58, 10.46, 9.01, 9.64, 8.86) What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?

x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

x_mean <- mean(x)
x_sd <- sd(x)
xnorm <- (x - x_mean)/x_sd
xnorm[1]
## [1] -0.9718658
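
The same normalization is available through base R's `scale()`, which by default centers by the mean and divides by the sample standard deviation. A short sketch comparing the two routes (the names `xnorm_manual` and `xnorm_scale` are illustrative):

```r
x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

# Manual normalization: subtract the mean, divide by the standard deviation
xnorm_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; convert it back to a plain vector
xnorm_scale <- as.numeric(scale(x))

# TRUE if both routes agree
all.equal(xnorm_manual, xnorm_scale)

# The normalized data should have mean 0 and standard deviation 1
c(mean(xnorm_manual), sd(xnorm_manual))
```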
  1. Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42) y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

(lm(y~x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##       1.567       -1.713
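
The intercept can also be computed by hand, since the fitted line passes through the point of means: intercept = mean(y) - slope * mean(x). A minimal sketch, with `slope` and `intercept` as illustrative names:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Slope from the correlation and the standard deviations
slope <- cor(y, x) * sd(y) / sd(x)

# Intercept chosen so the line passes through (mean(x), mean(y))
intercept <- mean(y) - slope * mean(x)

# TRUE if the hand computation matches lm
all.equal(c(intercept, slope), unname(coef(lm(y ~ x))))
```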
  1. You know that both the predictor and response have mean 0. What can be said about the intercept when you fit a linear regression? Answer: The intercept must be identically 0. The fitted line passes through the point of means, so intercept = mean(Y) - slope * mean(X) = 0 - slope * 0 = 0.

  2. Consider the data given by x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42) What value minimizes the sum of the squared distances between these points and itself?

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)

sum(x)/length(x)
## [1] 0.573
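
Another way to see that the mean is the least squares minimizer is to fit an intercept-only regression, whose single coefficient is the least squares estimate of a constant. A small sketch (the name `mu_lm` is illustrative):

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)

# Intercept-only model: the fitted constant minimizes sum((x - mu)^2)
mu_lm <- unname(coef(lm(x ~ 1)))

# TRUE if the fitted constant equals the sample mean
all.equal(mu_lm, mean(x))
```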
  1. Let the slope having fit Y as the outcome and X as the predictor be denoted as Beta1. Let the slope from fitting X as the outcome and Y as the predictor be denoted as Gamma1. Suppose that you divide Beta1 by Gamma1; in other words consider Beta1/Gamma1. What is this ratio always equal to?

Beta1 = Cor(Y,X) * (SD_Y/SD_X) and Gamma1 = Cor(X,Y) * (SD_X/SD_Y)

Beta1 / Gamma1 = (Cor(Y,X) * (SD_Y/SD_X)) / (Cor(X,Y) * (SD_X/SD_Y))

Since Cor(Y,X) = Cor(X,Y), the correlations cancel:

Beta1 / Gamma1 = (SD_Y/SD_X) / (SD_X/SD_Y)

Beta1 / Gamma1 = SD_Y^2/SD_X^2

We know SD = sqrt(Var), so SD^2 = Var:

Beta1 / Gamma1 = Var(Y)/Var(X)
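
The ratio can be checked numerically on the data set used in the earlier questions. This is a sketch rather than a proof; `beta1`, `gamma1`, and `ratio` are illustrative names.

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Slope with y as the outcome and x as the predictor
beta1 <- coef(lm(y ~ x))["x"]

# Slope with x as the outcome and y as the predictor
gamma1 <- coef(lm(x ~ y))["y"]

# Ratio of the two slopes
ratio <- unname(beta1 / gamma1)

# TRUE if the ratio equals the ratio of the variances
all.equal(ratio, var(y) / var(x))
```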