1. Consider the data set given below
x<-c(0.22, -2.54, 0.52, 0.75, 25)
# and weights given by
w<-c(2, 1, 3, 1, 2)
# Give the value of mu that minimizes the least squares equation
mu <- sum(w * x) / sum(w)
print(mu)
## [1] 5.578889
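This is the weighted mean: setting the derivative of the least squares criterion \(\sum_i w_i (x_i - \mu)^2\) to zero gives

\[-2 \sum_i w_i (x_i - \mu) = 0 \quad \implies \quad \mu = \frac{\sum_i w_i x_i}{\sum_i w_i}\]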
2. Consider the following data set
x1<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)
y1<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)

# Fit the regression through the origin and get the slope, treating y as the outcome and x as the regressor. (Hint: do not center the data, since we want regression through the origin, not through the means of the data.)
lm(y1 ~ x1 - 1)
## 
## Call:
## lm(formula = y1 ~ x1 - 1)
## 
## Coefficients:
##    x1  
## 1.151
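As a quick check, the closed-form slope for regression through the origin is \(\sum x_i y_i / \sum x_i^2\):

sum(x1 * y1) / sum(x1^2)  # about 1.1514, matching the lm fit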
3. Do data(mtcars) from the datasets package and fit the regression model with mpg as the outcome and drat (Rear axle ratio) as the predictor. Give the slope coefficient.
data(mtcars)

three<-lm(mtcars$mpg ~ mtcars$drat)
summary(three)
## 
## Call:
## lm(formula = mtcars$mpg ~ mtcars$drat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0775 -2.6803 -0.2095  2.2976  9.0225 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -7.525      5.477  -1.374     0.18    
## mtcars$drat    7.678      1.507   5.096 1.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.485 on 30 degrees of freedom
## Multiple R-squared:  0.464,  Adjusted R-squared:  0.4461 
## F-statistic: 25.97 on 1 and 30 DF,  p-value: 1.776e-05
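The slope coefficient can be pulled out directly from the fit:

coef(three)[2]  # drat slope: about 7.678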
4. Refer to question 3. Test the hypothesis of no linear relationship between rear axle ratio and miles per gallon.
cor(mtcars$drat,mtcars$mpg)
## [1] 0.6811719

The correlation between rear axle ratio and miles per gallon is 0.68, indicating a positive linear relationship. The formal test is the t-test on the drat slope from the question 3 summary: t = 5.096 with p-value 1.78e-05, so we reject the null hypothesis of no linear relationship.
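Equivalently, cor.test runs the same t-test; for simple linear regression its t statistic and p-value match the slope test above:

cor.test(mtcars$drat, mtcars$mpg)  # t = 5.096, p-value = 1.78e-05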

5. Consider data with an outcome (Y) and a predictor (X). The standard deviation of the predictor is one third that of the outcome. The correlation between the two variables is 0.7. What value would the slope coefficient be for the regression model with Y as the outcome and X as the predictor?

    \[\hat \beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}\]

    where \(\hat \beta_1\) is the slope and \(Cor(Y,X)\) is the correlation between \(Y\) and \(X\).

    Given that \(Cor(Y,X) = 0.7\) and \(Sd(X) = \frac{1}{3} Sd(Y)\),

    \[\hat \beta_1 = 0.7 \frac{Sd(Y)}{\frac{1}{3} Sd(Y)} = 0.7 \times 3 = 2.1\]

    Therefore, the value of the slope coefficient is 2.1.
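As an illustrative check (not part of the quiz), simulating data with \(Sd(Y) = 3\,Sd(X)\) and correlation 0.7 should recover a slope near 2.1; the seed and sample size below are arbitrary:

set.seed(42)                                          # arbitrary seed for reproducibility
xs <- rnorm(10000)                                    # Sd(X) = 1
ys <- 2.1 * xs + rnorm(10000, sd = sqrt(9 - 2.1^2))   # Sd(Y) = 3, Cor(Y, X) = 0.7
coef(lm(ys ~ xs))[2]                                  # should be close to 2.1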

6. You ask a collection of husbands and wives to guess how many jellybeans are in a jar. The correlation is 0.6. The standard deviation for the husbands is 14 beans while the standard deviation for wives is 10 beans. Assume that the data were centered so that 0 is the mean for each. The centered guess for a husband was 40 beans (above the mean). What would be your best estimate of the wife’s guess?

    Here the wife’s guess is the outcome \(Y\) and the husband’s guess is the predictor \(X\), with \(Cor(Y,X) = 0.6\), \(Sd(X) = 14\) (husbands), and \(Sd(Y) = 10\) (wives). First we need the slope:

    \[\hat \beta_1 = 0.6 \left(\frac{10}{14}\right) \approx 0.4286\]

    Since the data were centered, the intercept is 0, so the regression equation is

    \[\hat Y = 0.4286\, X\]

    The centered guess for a husband was 40 beans, hence the best estimate of the wife’s guess is

    \[\hat Y = 0.4286 \times 40 \approx 17.14 \text{ beans}\]
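The same arithmetic in R:

0.6 * (10 / 14) * 40  # 17.14286 beans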

7. Consider the data given by the following

x2 <- c(10.45, 9.45, 12.41, 14.46, 15.26)
# What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?
xnorm <- scale(x2, center = TRUE, scale = TRUE)
norm_meas <-xnorm[1]
print(norm_meas)
## [1] -0.7835272

If x were normalized, the first measurement would be about -0.7835.
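The same value can be computed by hand, since scale() subtracts the mean and divides by the standard deviation:

(x2[1] - mean(x2)) / sd(x2)  # -0.7835272, identical to scale()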

8. Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?
x3<-c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)

y3<-c(2.39, 1.72, 2.55, 1.48, 2.19, 0.59, 2.23, 1.65, 2.49, 1.05)
lm(y3~x3)
## 
## Call:
## lm(formula = y3 ~ x3)
## 
## Coefficients:
## (Intercept)           x3  
##      2.2472      -0.2627
print(coef(lm(y3~x3))[1])
## (Intercept) 
##    2.247175
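Equivalently, the intercept is \(\bar y - \hat \beta_1 \bar x\):

mean(y3) - coef(lm(y3 ~ x3))[2] * mean(x3)  # 2.247175, matching the lm intercept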
9. Consider the data given by
x4 <- c(1.8, 1.47, 1.51, 1.73, 1.36, 1.58, 1.57, 1.85, 1.44, 1.42)

What value minimizes the sum of the squared distances between these points and itself?

print(mean(x4))
## [1] 1.573
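A numerical check: minimizing the sum of squared distances directly recovers the mean.

optimize(function(mu) sum((x4 - mu)^2), interval = range(x4))$minimum
# about 1.573, agreeing with mean(x4)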
10. Fit a linear regression model to the mtcars dataset with the variable drat as the predictor and the variable mpg as the outcome. Plot the drat (horizontal axis) versus the residuals (vertical axis).

    Looking back at question 3, I have already fit this model: `three` uses drat as the predictor and mpg as the outcome.

residuals(three)
##           1           2           3           4           5           6 
## -1.42048871 -1.42048871  0.76342292  5.27566203  2.03818575  4.43269646 
##           7           8           9          10          11          12 
## -2.82250821  3.59194014  0.22594664 -3.37405336 -4.77405336  0.35244435 
##          13          14          15          16          17          18 
##  1.25244435 -0.84755565 -4.57260308 -5.11007936 -2.57607286  8.59742943 
##          19          20          21          22          23          24 
##  0.07093171  9.02247686  0.61515782  1.83269646 -1.46181425 -7.81518916 
##          25          26          27          28          29          30 
##  3.07566203  3.49742943 -0.48995198  8.97768153 -9.07752314 -0.57058358 
##          31          32 
## -4.65632497 -2.63291755
plot(mtcars$drat, residuals(three), main = "Residuals vs drat", xlab = "drat", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)

11. Refer to question 10. Directly estimate the residual variance and compare this estimate to the output of lm.
n <- length(mtcars$mpg)
p <- length(coef(three))
residual_variance_direct <- sum(residuals(three)^2) / (n - p)
cat("Residual Variance (direct estimate):", residual_variance_direct, "\n")
## Residual Variance (direct estimate): 20.11889
cat("Residual Variance (lm output):", summary(three)$sigma^2, "\n")
## Residual Variance (lm output): 20.11889
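The same estimate is available through built-in accessors:

deviance(three) / df.residual(three)  # RSS / (n - p) = 20.11889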
12. Refer to question 10. Give the R squared for this model.
summary(three)$r.squared
## [1] 0.4639952
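This matches the definition \(R^2 = 1 - RSS/TSS\):

1 - sum(residuals(three)^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2)  # 0.4639952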
13. Load the mtcars dataset. Fit a linear regression with miles per gallon as the outcome and drat as the predictor. Plot horsepower versus the residuals.
plot(mtcars$hp, residuals(three), main = "Residuals vs Horsepower", xlab = "Horsepower", ylab = "Residuals")
abline(h = 0, col = "red", lty = 2)

14. Refer to question 13. Directly estimate the residual variance and compare this estimate to the output of lm.

    Since the fitted model is still mpg ~ drat (with 32 observations), the estimate is the same as in question 11.
n1 <- length(mtcars$hp)
p1 <- length(coef(three))
residual_variance_direct <- sum(residuals(three)^2) / (n1 - p1)
cat("Residual Variance (direct estimate):", residual_variance_direct, "\n")
## Residual Variance (direct estimate): 20.11889
cat("Residual Variance (lm output):", summary(three)$sigma^2, "\n")
## Residual Variance (lm output): 20.11889
15. Refer to question 13. Give the R squared for this model.
summary(three)$r.squared
## [1] 0.4639952