1. Consider the data set given below
x<-c(0.22, -2.54, 0.52, 0.75, 25)
# and weights given by
w<-c(2, 1, 3, 1, 2)
# Give the value of mu that minimizes the weighted least squares equation sum(w * (x - mu)^2)

Answer:

weighted_mean <- sum(w * x) / sum(w)
weighted_mean
## [1] 5.578889
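As a cross-check (a sketch, not part of the original answer), an intercept-only linear model fitted with case weights `w` minimizes the same weighted sum of squares, so its single coefficient should equal the weighted mean. The names `wls_fit` and `mu_hat` are illustrative.

```r
# Data and weights from question 1
x <- c(0.22, -2.54, 0.52, 0.75, 25)
w <- c(2, 1, 3, 1, 2)

# Intercept-only model: lm() minimizes sum(w * (x - mu)^2) over mu
wls_fit <- lm(x ~ 1, weights = w)
mu_hat <- unname(coef(wls_fit))
mu_hat
```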
2. Consider the following data set
x<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)

# Fit the regression through the origin and get the slope, treating y as the outcome and x as the regressor. (Hint: do not center the data, since we want regression through the origin, not through the means of the data.)

Answer:

regression_model <- lm(y ~ 0 + x)
slope <- coef(regression_model)
slope
##        x 
## 1.151408
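The slope from `lm(y ~ 0 + x)` can also be computed directly. Minimizing `sum((y - b * x)^2)` with no intercept gives `b = sum(x * y) / sum(x^2)`, using the raw (uncentered) data. A minimal sketch; `slope_manual` is an illustrative name.

```r
x <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y <- c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)

# Least squares slope through the origin: no centering of x or y
slope_manual <- sum(x * y) / sum(x^2)
slope_manual
```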
3. Load the data with data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and drat (rear axle ratio) as the predictor. Give the slope coefficient.

Answer:

data("mtcars")
fit <- lm(mpg ~ drat, data = mtcars)
fit
## 
## Call:
## lm(formula = mpg ~ drat, data = mtcars)
## 
## Coefficients:
## (Intercept)         drat  
##      -7.525        7.678

The slope coefficient for drat is 7.678.
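For a simple linear regression the slope equals `cor(outcome, predictor) * sd(outcome) / sd(predictor)`, and the fitted line passes through the point of means. This sketch recomputes both coefficients by hand; `b0` and `b1` are illustrative names.

```r
# Slope: correlation scaled by the ratio of standard deviations
b1 <- cor(mtcars$mpg, mtcars$drat) * sd(mtcars$mpg) / sd(mtcars$drat)
# Intercept: the line passes through (mean(drat), mean(mpg))
b0 <- mean(mtcars$mpg) - b1 * mean(mtcars$drat)
c(intercept = b0, slope = b1)
```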
4. Refer to question 3. Test the hypothesis of no linear relationship between rear axle ratio and miles per gallon.

Answer: H0: the coefficient on drat is 0, which implies that there is no linear relationship. Ha: the coefficient on drat is nonzero, which implies that there is a linear relationship.

coef(summary(lm(mpg ~ drat, data = mtcars)))
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -7.524618   5.476663 -1.373942 0.1796390847
## drat         7.678233   1.506705  5.096042 0.0000177624

Hence, using alpha = 0.05, we reject the null hypothesis that the coefficient on drat is zero, since the p-value for drat (about 1.78e-05) is less than 0.05.
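The t value and p-value in the table can be reproduced by hand: the t statistic is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with n - 2 = 30 residual degrees of freedom. A sketch; `t_stat` and `p_val` are illustrative names.

```r
fit <- lm(mpg ~ drat, data = mtcars)
coefs <- coef(summary(fit))

# t statistic for H0: coefficient on drat = 0
t_stat <- coefs["drat", "Estimate"] / coefs["drat", "Std. Error"]
# Two-sided p-value with the model's residual degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))
c(t = t_stat, p = p_val)
```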

5. Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one third that of the outcome. The correlation between the two variables is 0.7. What would be the value of the slope coefficient for the regression model with Y as the outcome and X as the predictor?

Answer:

Slope <- 0.7 * 1 / (1/3)  # cor(Y, X) * sd(Y) / sd(X), taking sd(Y) = 1 and sd(X) = 1/3
Slope
## [1] 2.1
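To check the formula empirically, this sketch constructs hypothetical simulated data with correlation exactly 0.7 and a predictor whose standard deviation is exactly one third of the outcome's, then fits `lm()`. The construction (orthogonalizing noise against x) is an illustrative choice, not part of the original question.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
z <- rnorm(n)
# Residuals from regressing z on x have mean 0 and are orthogonal to x
e <- resid(lm(z ~ x))
x <- as.numeric(scale(x))  # mean 0, sd 1
e <- as.numeric(scale(e))  # mean 0, sd 1, uncorrelated with x

# Outcome with sd 1 and correlation exactly 0.7 with x
y <- 0.7 * x + sqrt(1 - 0.7^2) * e
# Predictor whose sd is one third of the outcome's sd
x3 <- x / 3

c(cor = cor(x3, y), sd_ratio = sd(x3) / sd(y),
  slope = unname(coef(lm(y ~ x3))[2]))
```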
6. You ask a collection of husbands and wives to guess how many jellybeans are in a jar. The correlation between husbands' and wives' guesses is 0.6. The standard deviation for the husbands is 14 beans, while the standard deviation for the wives is 10 beans. Assume that the data were centered so that 0 is the mean for each. The centered guess for a husband was 40 beans (above the mean). What would be your best estimate of the wife’s guess?

Answer:

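Slope <- 0.6 * (10 / 14)  # wife is the outcome: cor * sd(wives) / sd(husbands)
Slope
## [1] 0.4285714
WifesGuess <- Slope * 40  # data are centered, so the intercept is 0
WifesGuess
## [1] 17.14286

The best estimate of the wife's guess is about 17.14 beans above the mean.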
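The same prediction can be read in standard-deviation units, which makes regression to the mean visible: the husband's guess is 40/14 SDs above the mean, and the predicted wife's guess is only 0.6 times as many of her own SDs. A sketch with illustrative variable names.

```r
r <- 0.6
sd_husband <- 14
sd_wife <- 10
husband_centered <- 40

# Husband's guess in standard-deviation units
z_husband <- husband_centered / sd_husband
# Predicted wife's guess in SD units is shrunk toward 0 by the factor r
z_wife <- r * z_husband
# Convert back to beans
wife_pred <- z_wife * sd_wife
wife_pred
```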
7. Consider the data given by the following
x <- c(10.45, 9.45, 12.41, 14.46, 15.26)
# What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?

Answer:

mean_x <- mean(x)
sd_x <- sd(x)
normalized_x <- (x - mean_x) / sd_x
cat("Normalized data:", normalized_x, "\n")
## Normalized data: -0.7835272 -1.184103 0.001602305 0.8227837 1.143245

Thus, the value of the first measurement after x is normalized is -0.7835272.
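Base R's `scale()` performs the same normalization (subtract the mean, divide by the sample standard deviation with an n - 1 denominator). This sketch confirms the first value and that the result has mean 0 and standard deviation 1.

```r
x <- c(10.45, 9.45, 12.41, 14.46, 15.26)
# scale() returns a one-column matrix; convert it back to a vector
z <- as.numeric(scale(x))
z[1]
c(mean = mean(z), sd = sd(z))
```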

8. Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?
x<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)

y<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)

Answer:

fit <- lm(y ~ x)
fit
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      2.2472      -0.2627
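The intercept can also be obtained from the least squares normal equations, (X'X) beta = X'y, where the design matrix X has a column of 1s for the intercept and a column for x. A minimal sketch; `X` and `beta` are illustrative names.

```r
x <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y <- c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)

X <- cbind(1, x)  # design matrix: intercept column, then x
beta <- solve(crossprod(X), crossprod(X, y))  # solves (X'X) beta = X'y
beta[1, 1]  # intercept
```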
9. Consider the data given by
x <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)

What value of mu minimizes the sum of the squared distances between these points and mu?

Answer:

Mean<-mean(x)
Mean
## [1] 1.573
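A numerical check (a sketch): minimizing `sum((x - mu)^2)` with base R's `optimize()` over the range of the data should land on the sample mean, up to `optimize()`'s default tolerance. `sse` is an illustrative name.

```r
x <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)

# Sum of squared distances between the points and a candidate mu
sse <- function(mu) sum((x - mu)^2)
opt <- optimize(sse, interval = range(x))
opt$minimum
```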
10. Fit a linear regression model to the mtcars dataset with the variable drat as the predictor and the variable mpg as the outcome. Plot drat (horizontal axis) versus the residuals (vertical axis).

Answer:

library(ggplot2)
fit <- lm(mpg ~ drat, data = mtcars)
temp <- mtcars
temp$resid <- resid(fit)  # store the residuals next to drat
p <- ggplot(temp, aes(x = drat, y = resid)) +
  geom_hline(yintercept = 0, colour = "black") +  # zero reference line
  geom_point(alpha = 0.5, size = 5)
p
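One reason the residual plot shows no linear trend: with an intercept in the model, least squares residuals sum to zero and are orthogonal to the predictor, so their correlation with drat is zero up to rounding. A short sketch to confirm this.

```r
fit <- lm(mpg ~ drat, data = mtcars)
# Both quantities should be numerically zero
c(sum_resid = sum(resid(fit)),
  cor_with_drat = cor(resid(fit), mtcars$drat))
```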

11. Refer to question 10. Directly estimate the residual variance and compare this estimate with the output of lm.

Answer:

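fit <- lm(mpg ~ drat, data = mtcars)
sum(resid(fit))                         # residuals sum to (numerically) zero
## [1] 3.275158e-15
sum(resid(fit)^2) / (nrow(mtcars) - 2)  # direct estimate: RSS / (n - 2)
## [1] 20.11889
summary(fit)$sigma^2                    # lm's estimate: residual standard error squared
## [1] 20.11889

Both give 20.11889, so the direct estimate matches the output of lm.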
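The same estimate can be assembled from lm's own accessors: for an `lm` fit, `deviance()` returns the residual sum of squares and `df.residual()` returns n - 2 here. A sketch; `sigma2_hat` is an illustrative name.

```r
fit <- lm(mpg ~ drat, data = mtcars)
# Residual sum of squares divided by residual degrees of freedom
sigma2_hat <- deviance(fit) / df.residual(fit)
sigma2_hat
```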
12. Refer to question 10. Give the R squared for this model.

Answer:

fit <- lm(mpg ~ drat, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ drat, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0775 -2.6803 -0.2095  2.2976  9.0225 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -7.525      5.477  -1.374     0.18    
## drat           7.678      1.507   5.096 1.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.485 on 30 degrees of freedom
## Multiple R-squared:  0.464,  Adjusted R-squared:  0.4461 
## F-statistic: 25.97 on 1 and 30 DF,  p-value: 1.776e-05
summary(fit)$r.squared
## [1] 0.4639952

The R squared for this model is about 0.464.
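R squared is one minus the ratio of the residual sum of squares to the total sum of squares; in a simple linear regression it also equals the squared correlation between outcome and predictor. A sketch computing both.

```r
fit <- lm(mpg ~ drat, data = mtcars)
rss <- sum(resid(fit)^2)                        # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares
r2 <- 1 - rss / tss
c(r2 = r2, cor_sq = cor(mtcars$mpg, mtcars$drat)^2)
```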
13. Load the mtcars dataset. Fit a linear regression with miles per gallon as the outcome and drat as the predictor. Plot drat versus the residuals.

Answer:

library(ggplot2)
fit <- lm(mpg ~ drat, data = mtcars)
temp <- mtcars
temp$resid <- resid(fit)  # store the residuals next to drat
p <- ggplot(temp, aes(x = drat, y = resid)) +
  geom_hline(yintercept = 0, colour = "black") +  # zero reference line
  geom_point(alpha = 0.5, size = 5)
p
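For readers without ggplot2, here is a base-graphics sketch of the same plot. It computes the residuals explicitly as observed minus fitted values; the name `e` and the plotting options are illustrative choices.

```r
fit <- lm(mpg ~ drat, data = mtcars)
# Residuals are observed mpg minus fitted mpg
e <- mtcars$mpg - fitted(fit)
plot(mtcars$drat, e, xlab = "drat", ylab = "Residual", pch = 19)
abline(h = 0)  # zero reference line
```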

14. Refer to question 13. Directly estimate the residual variance and compare this estimate with the output of lm.

Answer:

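sum(resid(fit)^2) / (nrow(mtcars) - 2)  # direct estimate: RSS / (n - 2)
## [1] 20.11889
summary(fit)$sigma^2                    # lm's estimate: residual standard error squared
## [1] 20.11889

Both give 20.11889, so the direct estimate matches the output of lm.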
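The residual variance estimate also appears in the ANOVA table as the residual mean square (RSS divided by its degrees of freedom). A sketch reading it from `anova()`; `aov_tab` is an illustrative name.

```r
fit <- lm(mpg ~ drat, data = mtcars)
aov_tab <- anova(fit)
# Residual mean square = RSS / (n - 2)
aov_tab["Residuals", "Mean Sq"]
```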
15. Refer to question 13. Give the R squared for this model.

Answer:

fit <- lm(mpg ~ drat, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ drat, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0775 -2.6803 -0.2095  2.2976  9.0225 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -7.525      5.477  -1.374     0.18    
## drat           7.678      1.507   5.096 1.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.485 on 30 degrees of freedom
## Multiple R-squared:  0.464,  Adjusted R-squared:  0.4461 
## F-statistic: 25.97 on 1 and 30 DF,  p-value: 1.776e-05
summary(fit)$r.squared
## [1] 0.4639952

The R squared for this model is about 0.464.
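The summary output also reports an adjusted R squared (0.4461). This sketch reproduces it from the usual formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where p = 1 is the number of predictors; `n_pred` and `adj_r2` are illustrative names.

```r
fit <- lm(mpg ~ drat, data = mtcars)
n <- nrow(mtcars)
n_pred <- 1  # number of predictors (drat)
r2 <- summary(fit)$r.squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - n_pred - 1)
adj_r2
```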