HW1

library(MASS)
## Warning: package 'MASS' was built under R version 4.3.3
library(ISLR2)
## Warning: package 'ISLR2' was built under R version 4.3.3
## 
## Attaching package: 'ISLR2'
## The following object is masked from 'package:MASS':
## 
##     Boston
"Exercise 8"
attach(Auto)
model <- lm(mpg ~ horsepower, data = Auto)

summary(model)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
plot(Auto$horsepower, Auto$mpg)
abline(model, col = "red")

par(mfrow=c(2,2))
plot(model)

"Output interpretation:
i. There is a statistically significant relationship between horsepower and mpg: the F-statistic is 599.7 and the p-value for the horsepower coefficient is below 2e-16.
ii. The relationship is fairly strong: R-squared is 0.6059, so horsepower explains about 60.6% of the variance in mpg, and the residual standard error is 4.906 mpg.
iii. The relationship is negative: the horsepower coefficient is -0.157845, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg.
iv. The predicted mpg for a horsepower of 98 follows from the regression equation: 39.935861 - 0.157845 * 98 = 24.47 (approximately)."
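The prediction in part iv can also be obtained with `predict()`, which additionally returns interval estimates. This is a minimal sketch, assuming the ISLR2 package is installed; the names `fit`, `new_car`, `conf_int` and `pred_int` are introduced here for illustration and are not part of the original analysis.

```r
library(ISLR2)

# Refit the simple regression of mpg on horsepower.
fit <- lm(mpg ~ horsepower, data = Auto)

# Hypothetical one-row data frame holding the horsepower of interest.
new_car <- data.frame(horsepower = 98)

# Point prediction with a 95% confidence interval for the mean mpg
# of all cars with horsepower 98.
conf_int <- predict(fit, new_car, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new car with horsepower 98.
pred_int <- predict(fit, new_car, interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)
```

Both calls return the same fitted value; the prediction interval is wider because it also accounts for the residual variance of an individual observation.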
"Exercise 10"
attach(Carseats)
"a"
model <- lm(Sales ~ Price + Urban + US, data = Carseats)

summary(model)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
"b"
"Price: the coefficient is -0.054459, so a one-unit increase in Price is associated with a decrease of about 0.054 in Sales, holding the other variables constant. Sales is recorded in thousands of units in the Carseats data, so this is roughly 54 fewer units.
Urban: the UrbanYes coefficient is -0.021916 with a p-value of 0.936, so there is no evidence that Sales differ between urban and non-urban stores, holding the other variables constant.
US: the USYes coefficient is 1.200573, so stores in the US sell about 1.2 (roughly 1,200 units) more on average than stores outside the US, holding the other variables constant."
"c"
"Model in equation form: Sales = β0 + β1 × Price + β2 × UrbanYes + β3 × USYes + ϵ, where UrbanYes is 1 for an urban store and 0 otherwise, and USYes is 1 for a US store and 0 otherwise."
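The 0/1 coding of the two qualitative predictors can be inspected directly. This is a small sketch, assuming the ISLR2 package is installed; `design` is a name introduced here for illustration.

```r
library(ISLR2)

# Dummy coding R uses for each factor: the row labelled "Yes" maps to 1.
print(contrasts(Carseats$Urban))
print(contrasts(Carseats$US))

# Design matrix that lm() builds for this formula; the factor columns
# appear as UrbanYes and USYes, matching the coefficient names in summary().
design <- model.matrix(Sales ~ Price + Urban + US, data = Carseats)
print(head(design))
```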
"d"
"Each row of the coefficient table tests the null hypothesis H0: βj = 0 for that coefficient. At the 0.05 significance level, the null hypothesis is rejected for Price (p < 2e-16) and USYes (p = 4.86e-06), but not for UrbanYes (p = 0.936)."
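The same decision can be read off programmatically from the coefficient table. This is a minimal sketch, assuming the ISLR2 package is installed and a significance level of 0.05; `fit_full`, `p_values`, `alpha` and `reject` are illustrative names.

```r
library(ISLR2)

# Refit the full model from part (a).
fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)

# The coefficient table is a matrix; the last column holds the p-values.
p_values <- coef(summary(fit_full))[, "Pr(>|t|)"]

# Assumed significance level.
alpha <- 0.05
reject <- p_values < alpha

print(data.frame(p_value = signif(p_values, 3), reject_H0 = reject))
```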
"e"
smaller_model <- lm(Sales ~ Price + US, data = Carseats)
summary(smaller_model)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16
"f"
"Both models fit about equally well, and neither fits closely: R-squared is 0.2393 in each case, so roughly 24% of the variance in Sales is explained. The smaller model is marginally better once the dropped predictor is accounted for, with adjusted R-squared 0.2354 versus 0.2335 and residual standard error 2.469 versus 2.472."
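The fit metrics mentioned above can be collected side by side. This is a sketch under the assumption that the ISLR2 package is installed; `fit_full`, `fit_small` and `comparison` are names introduced here, and the partial F-test at the end is an additional check that the original analysis did not run.

```r
library(ISLR2)

fit_full  <- lm(Sales ~ Price + Urban + US, data = Carseats)
fit_small <- lm(Sales ~ Price + US, data = Carseats)

# One row per model: R-squared, adjusted R-squared, residual standard
# error, and the AIC/BIC information criteria (lower is better).
comparison <- data.frame(
  model         = c("full", "small"),
  r_squared     = c(summary(fit_full)$r.squared, summary(fit_small)$r.squared),
  adj_r_squared = c(summary(fit_full)$adj.r.squared, summary(fit_small)$adj.r.squared),
  rse           = c(sigma(fit_full), sigma(fit_small)),
  aic           = c(AIC(fit_full), AIC(fit_small)),
  bic           = c(BIC(fit_full), BIC(fit_small))
)
print(comparison)

# Partial F-test of the nested models: does adding Urban improve the fit?
print(anova(fit_small, fit_full))
```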
"g"
"To obtain 95% confidence intervals for the coefficients in the smaller model:"
confint(smaller_model)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632
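Each interval is the estimate plus or minus a t quantile times the standard error. As a check, the USYes interval can be rebuilt by hand from the `summary(smaller_model)` output; the numbers below are copied from that output, and the variable names are illustrative.

```r
# Values copied from the summary(smaller_model) coefficient table for USYes.
estimate  <- 1.19964
std_error <- 0.25846
df_resid  <- 397   # residual degrees of freedom reported by summary()

# Critical value of the t distribution for a two-sided 95% interval.
t_crit <- qt(0.975, df = df_resid)

# estimate -/+ t_crit * std_error
ci_us <- estimate + c(-1, 1) * t_crit * std_error
print(ci_us)
```

Up to rounding of the copied inputs, this should agree with the USYes row returned by `confint(smaller_model)`.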
"h"
"Outliers and high-leverage observations can be checked with the same diagnostic plots used for the Auto model in Exercise 8, here via plot(smaller_model): residuals vs. fitted values, the normal Q-Q plot of residuals, and residuals vs. leverage. Studentized residuals flag potential outliers and hat values flag high-leverage observations."
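A numerical version of these checks might look as follows. This is a sketch, assuming the ISLR2 package is installed; the cut-offs (absolute studentized residual above 3, leverage above twice the average) are common rules of thumb chosen here as assumptions, not thresholds taken from the original analysis.

```r
library(ISLR2)

fit_small <- lm(Sales ~ Price + US, data = Carseats)

# Studentized residuals: observations beyond +/- 3 are possible outliers
# (assumed rule-of-thumb cut-off).
stud_res <- rstudent(fit_small)
outliers <- which(abs(stud_res) > 3)

# Leverage (hat values): the average leverage is always (p + 1) / n,
# where p is the number of predictors and n the number of observations.
lev <- hatvalues(fit_small)
p <- length(coef(fit_small)) - 1
n <- nrow(Carseats)
avg_leverage <- (p + 1) / n

# Flag observations with more than twice the average leverage
# (assumed rule-of-thumb cut-off).
high_leverage <- which(lev > 2 * avg_leverage)

print(outliers)
print(high_leverage)
```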
"Exercise 14"
"a"
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)
"y = β0 + β1 × x1 + β2 × x2 + ϵ, with true coefficients β0 = 2, β1 = 2 and β2 = 0.3"
"b"
correlation <- cor(x1, x2)
print(correlation)
## [1] 0.8351212
plot(x1, x2, main = "Scatterplot of x1 vs. x2", xlab = "x1", ylab = "x2")

"c"
model <- lm(y ~ x1 + x2)

summary(model)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8311 -0.7273 -0.0537  0.6338  2.3359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
## x1            1.4396     0.7212   1.996   0.0487 *  
## x2            1.0097     1.1337   0.891   0.3754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared:  0.2088, Adjusted R-squared:  0.1925 
## F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05
"d"
model_x1 <- lm(y ~ x1)

summary(model_x1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.89495 -0.66874 -0.07785  0.59221  2.45560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
## x1            1.9759     0.3963   4.986 2.66e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared:  0.2024, Adjusted R-squared:  0.1942 
## F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06
"e"
model_x2 <- lm(y ~ x2)

summary(model_x2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.62687 -0.75156 -0.03598  0.72383  2.44890 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
## x2            2.8996     0.6330    4.58 1.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared:  0.1763, Adjusted R-squared:  0.1679 
## F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05
"f"
"The results from (c) to (e) do not contradict each other. x1 and x2 are highly correlated (correlation 0.835), so in the joint model their individual effects are hard to separate: the standard errors are inflated (0.7212 vs. 0.3963 for x1, 1.1337 vs. 0.6330 for x2), x2 is not significant (p = 0.3754) and x1 only barely (p = 0.0487). Fitted separately, each predictor is highly significant because it also picks up the effect of the other."
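The effect of collinearity on the standard errors can be quantified with a variance inflation factor (VIF). The sketch below computes it by hand in base R on the same simulated data; the VIF is an added diagnostic that the original analysis did not use, and `r2_x1_on_x2`, `vif`, `se_joint` and `se_alone` are illustrative names.

```r
# Recreate the simulated data from part (a).
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

# VIF for x1: 1 / (1 - R^2), where R^2 comes from regressing x1 on the
# other predictor. With two predictors the VIF is the same for x2.
r2_x1_on_x2 <- summary(lm(x1 ~ x2))$r.squared
vif <- 1 / (1 - r2_x1_on_x2)

# Standard error of the x1 coefficient with and without x2 in the model.
se_joint <- coef(summary(lm(y ~ x1 + x2)))["x1", "Std. Error"]
se_alone <- coef(summary(lm(y ~ x1)))["x1", "Std. Error"]

print(c(vif = vif, se_joint = se_joint, se_alone = se_alone))
```

A VIF well above 1 indicates that the variance of the coefficient estimate is inflated by the correlation between the predictors, which is consistent with the larger standard error in the joint model.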
