HW 2

#Part II

## 3.7 Exercises 1, 3, and 8

Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

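Each p-value in Table 3.4 corresponds to its own null hypothesis: that the given advertising medium has no effect on sales when spending on the other two media is held fixed.

\[ H_{0}: \beta_{j}=0, \quad j = 1, 2, 3 \]

Here \(\beta_{1}\), \(\beta_{2}\), and \(\beta_{3}\) are the coefficients of TV, radio, and newspaper, and each hypothesis is tested separately. The p-values for TV and radio are very small, so there is strong evidence that TV and radio advertising are each associated with sales. The p-value for newspaper is not significant, so there is no evidence that newspaper advertising affects sales once TV and radio spending are accounted for.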

Suppose we have a data set with five predictors, \(X_1\) = GPA, \(X_2\) = IQ, \(X_3\) = Level (1 for College and 0 for High School), \(X_4\) = Interaction between GPA and IQ, and \(X_5\) = Interaction between GPA and Level. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get \(\hat{\beta}_0 = 50\), \(\hat{\beta}_1 = 20\), \(\hat{\beta}_2 = 0.07\), \(\hat{\beta}_3 = 35\), \(\hat{\beta}_4 = 0.01\), \(\hat{\beta}_5 = -10\).

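Answer: For fixed values of IQ and GPA, the difference in predicted salary between a college graduate (Level = 1) and a high school graduate (Level = 0) is 35 - 10 × GPA, because Level enters the model both on its own and through its interaction with GPA. College graduates therefore earn more, on average, only when GPA is below 3.5. When GPA is above 3.5, high school graduates earn more. The correct statement is that high school graduates earn more, on average, than college graduates provided that the GPA is high enough.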
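To make the interaction concrete, here is a minimal sketch that evaluates the college-minus-high-school salary gap at a few GPA values, using the fitted coefficients quoted in the exercise. The helper name `salary_gap` and the GPA grid are illustrative assumptions, not part of the exercise.

```r
# Fitted coefficients quoted in the exercise (salary in thousands of dollars)
b3 <- 35   # Level (1 = College, 0 = High School)
b5 <- -10  # GPA x Level interaction

# Hypothetical helper: predicted college salary minus high school salary
# at a given GPA. Terms that do not involve Level cancel in the difference.
salary_gap <- function(gpa) b3 + b5 * gpa

gpa_grid <- c(2.0, 3.0, 3.5, 4.0)  # illustrative GPA values
gaps <- salary_gap(gpa_grid)
print(data.frame(gpa = gpa_grid, gap = gaps))

# GPA at which both groups have the same predicted salary
break_even <- -b3 / b5
print(break_even)
```

A positive gap means college graduates are predicted to earn more at that GPA; a negative gap means high school graduates are.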

This question involves the use of simple linear regression on the Auto data set.

Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output. For example:

library(ISLR2)

## Warning: package 'ISLR2' was built under R version 4.1.2

library(tidyverse)

## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'pillar'

## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'tibble'

## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'hms'

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.3     ✔ dplyr   1.0.7
## ✔ tidyr   1.1.3     ✔ stringr 1.4.0
## ✔ readr   2.0.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

dim(Auto)

## [1] 392   9

glimpse(Auto)

## Rows: 392
## Columns: 9
## $ mpg          <dbl> 18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 2…
## $ cylinders    <int> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 6, 6, 6, 4, …
## $ displacement <dbl> 307, 350, 318, 304, 302, 429, 454, 440, 455, 390, 383, 34…
## $ horsepower   <int> 130, 165, 150, 150, 140, 198, 220, 215, 225, 190, 170, 16…
## $ weight       <int> 3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 385…
## $ acceleration <dbl> 12.0, 11.5, 11.0, 12.0, 10.5, 10.0, 9.0, 8.5, 10.0, 8.5, …
## $ year         <int> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 7…
## $ origin       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, …
## $ name         <fct> chevrolet chevelle malibu, buick skylark 320, plymouth sa…

df1<-Auto
lg1<- lm(mpg~horsepower, data = df1)
summary(lg1)

## 
## Call:
## lm(formula = mpg ~ horsepower, data = df1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
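As a quick check on how the entries of the summary table relate to each other, the sketch below recomputes the t statistic for horsepower from its estimate and standard error, using the rounded values printed above. In simple linear regression the F statistic equals the square of the slope's t statistic, so both results should agree with the reported values up to rounding. The variable names are illustrative.

```r
# Values copied from the summary(lg1) output (rounded as printed)
estimate  <- -0.157845
std_error <- 0.006446

# t statistic = estimate / standard error
t_value <- estimate / std_error
print(round(t_value, 2))

# With a single predictor, the F statistic is the squared t statistic
f_value <- t_value^2
print(round(f_value, 1))
```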

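Is there a relationship between the predictor and the response?

Yes. The p-value for the horsepower coefficient (and for the F-statistic) is below 2e-16, so we reject the null hypothesis that horsepower has no association with mpg.

How strong is the relationship between the predictor and the response?

The R-squared is 0.6059, so horsepower explains about 61% of the variance in mpg. The residual standard error is 4.906 mpg.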

Is the relationship between the predictor and the response positive or negative?

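Negative. The horsepower coefficient is about -0.158, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg on average.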
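The strength and direction can also be summarized in a single number. As a small sketch, assuming only the rounded values printed in the summary output: in simple linear regression the R-squared equals the squared sample correlation between predictor and response, and the correlation carries the sign of the slope. The variable names are illustrative.

```r
# R-squared and slope estimate copied from the summary(lg1) output
r_squared <- 0.6059
slope     <- -0.157845

# Correlation = sign of the slope times the square root of R-squared
# (valid only for a regression with a single predictor)
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```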

What is the predicted mpg associated with a horsepower of 98? What are the associated 95 % confidence and prediction intervals?

coef(lg1)

## (Intercept)  horsepower 
##  39.9358610  -0.1578447

pred_mpg<-39.94 + (98*-0.158)
print('predicted mpg')

## [1] "predicted mpg"

pred_mpg

## [1] 24.456

df<-data.frame(horsepower=98) 
predict (lg1 , df, interval = "confidence")

##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
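The question also asks for a prediction interval, which is wider than the confidence interval because it includes the error of a single new observation. The sketch below reconstructs it from numbers already printed in this document (the fit, the upper confidence limit, the residual standard error, and the residual degrees of freedom), so it runs without the data. The variable names are illustrative. With the fitted model available, `predict(lg1, df, interval = "prediction")` should return the same interval directly.

```r
# Quantities copied from the output above
fit    <- 24.46708  # predicted mpg at horsepower = 98
ci_upr <- 24.96108  # upper 95% confidence limit
rse    <- 4.906     # residual standard error
df_res <- 390       # residual degrees of freedom

t_crit <- qt(0.975, df_res)        # two-sided 95% critical value
se_fit <- (ci_upr - fit) / t_crit  # standard error of the fitted mean

# A prediction interval adds the residual variance to the variance of the fit
se_pred <- sqrt(rse^2 + se_fit^2)
pi_lwr  <- fit - t_crit * se_pred
pi_upr  <- fit + t_crit * se_pred
print(c(fit = fit, lwr = pi_lwr, upr = pi_upr))
```

This gives a prediction interval of roughly 14.8 to 34.1 mpg, compared with about 24.0 to 25.0 mpg for the confidence interval.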

Plot the response and the predictor. Use the abline() function to display the least squares regression line.

plot(df1$horsepower, df1$mpg, xlab = "horsepower", ylab = "mpg")

abline(lg1)

Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.

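plot(predict(lg1), residuals(lg1))

plot(predict(lg1), rstudent(lg1))

The residuals plotted against the fitted values show a clear U-shaped pattern, which indicates that the relationship between mpg and horsepower is non-linear, so a straight-line fit is not fully adequate.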

#Part III

Suppose that the predictive model is given as \(\hat{y} = 1.25 - 0.02 x_1 + 1.7 x_4 + 0.5 x_7\)

Calculate ŷ when x1=1, x4=Male and x7=6 where Female is the reference for x4.

# Male: x4 = 1 because Female is the reference level
y_hat_1= 1.25-0.02*1+1.7*1+0.5*6
print(y_hat_1)

## [1] 5.93

Calculate ŷ when x1=0, x4=Female and x7=4 where Female is the reference for x4.

# Female: x4 = 0 because Female is the reference level
y_hat= 1.25-0.02*0+1.7*0+0.5*4
print(y_hat)

## [1] 3.25

When y=4.2, calculate the residuals for both ŷ's above.

true_y=4.2

resid_1=true_y-y_hat_1
resid_1

## [1] -1.73

resid_2=true_y-y_hat
resid_2

## [1] 0.95
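The same calculation can be wrapped in a small function that makes the dummy coding explicit. This is a minimal sketch, assuming Male is coded as 1 and the reference level Female as 0. The helper name `predict_y` and the other variable names are hypothetical.

```r
# Coefficients from the predictive model in Part III
b0 <- 1.25
b1 <- -0.02
b4 <- 1.7
b7 <- 0.5

# Hypothetical helper: x4 is passed as a label and converted to a 0/1 dummy,
# with Female as the reference level (0) and Male coded as 1.
predict_y <- function(x1, x4, x7) {
  x4_dummy <- as.numeric(x4 == "Male")
  b0 + b1 * x1 + b4 * x4_dummy + b7 * x7
}

y_hat_male   <- predict_y(1, "Male", 6)
y_hat_female <- predict_y(0, "Female", 4)

# Residual = observed value minus predicted value
true_y <- 4.2
residuals_both <- true_y - c(y_hat_male, y_hat_female)
print(c(y_hat_male, y_hat_female))
print(residuals_both)
```

Passing the level as a label rather than as a bare 0 or 1 makes it harder to swap the coding by accident.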


Arielle King

9/6/2022