library(imager)
im<-load.image("table34.jpg")
plot(im)
Answer: Null hypothesis (TV): “This variable is not significant” (its coefficient is zero); conclusion (p-value): rejected, TV IS significant
Null hypothesis (radio): “This variable is not significant” (its coefficient is zero); conclusion (p-value): rejected, radio IS significant
Null hypothesis (newspaper): “This variable is not significant” (its coefficient is zero); conclusion (p-value): not rejected, newspaper is NOT significant
Answer: The KNN classifier is used to solve problems with a qualitative response (classification problems); KNN regression methods are used to solve problems with a quantitative response (regression problems).
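The distinction can be illustrated with a minimal base-R sketch. The helper `knn_predict()` and the toy data are hypothetical, written only for this illustration: with a numeric response the k nearest neighbours are averaged (regression), and with a categorical response they vote (classification).

```r
# knn_predict() is a hypothetical helper written for illustration only
knn_predict <- function(x_train, y_train, x0, k = 3) {
  nearest <- order(abs(x_train - x0))[seq_len(k)]   # indices of the k closest points
  neighbours <- y_train[nearest]
  if (is.numeric(neighbours)) {
    mean(neighbours)                                # regression: average the responses
  } else {
    names(which.max(table(neighbours)))             # classification: majority vote
  }
}

# Toy data (made up for this sketch)
x_train <- c(1, 2, 3, 6, 7, 8)
y_num   <- c(1.0, 1.5, 2.0, 5.0, 5.5, 6.0)                   # quantitative response
y_class <- c("low", "low", "low", "high", "high", "high")    # qualitative response

reg_pred   <- knn_predict(x_train, y_num,   x0 = 2.2, k = 3)
class_pred <- knn_predict(x_train, y_class, x0 = 6.5, k = 3)
print(reg_pred)
print(class_pred)
```

The same neighbourhood search drives both cases; only the way the neighbours' responses are combined changes.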
Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Gender (1 for Female and 0 for Male), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = -10.
y = 50 + 20(GPA) + 0.07(IQ) + 35(Gender) + 0.01(GPA)(IQ) -10(GPA)(Gender)
Answer: For fixed GPA and IQ, the female-minus-male difference in predicted salary is 35 - 10(GPA), so males earn more than females once GPA exceeds 3.5; the correct answer is therefore option 3.
Predict the salary of a female with IQ of 110 and a GPA of 4.0.
Answer: 50 + 20(4) + 0.07(110) + 35(1) + 0.01(4)(110) - 10(4)(1) = 137.1, i.e. a predicted starting salary of 137.1 thousand dollars.
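The arithmetic can be checked with a short sketch. The helper `predict_salary()` is hypothetical and simply encodes the fitted equation stated in the exercise; it also shows the female-minus-male gap at a given GPA.

```r
# Coefficients as stated in the exercise (salary in thousands of dollars)
b <- c(intercept = 50, gpa = 20, iq = 0.07, gender = 35,
       gpa_iq = 0.01, gpa_gender = -10)

# predict_salary() is a hypothetical helper, not part of any package
predict_salary <- function(gpa, iq, female) {
  unname(b["intercept"] + b["gpa"] * gpa + b["iq"] * iq + b["gender"] * female +
           b["gpa_iq"] * gpa * iq + b["gpa_gender"] * gpa * female)
}

pred <- predict_salary(gpa = 4.0, iq = 110, female = 1)

# Female-minus-male gap at the same GPA and IQ: 35 - 10 * GPA
gap <- predict_salary(4.0, 110, female = 1) - predict_salary(4.0, 110, female = 0)

print(pred)
print(gap)
```

A negative gap at GPA 4.0 is consistent with males earning more on average when GPA is high enough.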
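Answer: Even though the true relationship is linear, the cubic regression contains the linear model as a special case, so its training RSS cannot be higher. We would expect the cubic training RSS to be slightly lower than the linear one, because the extra terms also fit some of the noise.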
Answer (a) using test rather than training RSS.
Suppose that the true relationship between X and Y is not linear, but we don’t know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.
Answer: We would expect the training RSS of the cubic regression to be lower than that of the linear regression: the cubic model is more flexible, and because the true relationship is not linear, the fit can adapt more closely to the data.
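The training-RSS comparison can be explored with a small simulation. This is a sketch under assumed data-generating processes (the coefficients, the seed and the quadratic term are arbitrary choices, not taken from the exercise); it fits a linear and a cubic regression to the same data and reports the training RSS of each.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
n <- 100
x <- rnorm(n)
y_linear    <- 1 + 2 * x + rnorm(n)               # assumed linear truth
y_nonlinear <- 1 + 2 * x + 1.5 * x^2 + rnorm(n)   # assumed non-linear truth

# Training residual sum of squares of a fitted model
train_rss <- function(fit) sum(residuals(fit)^2)

rss <- rbind(
  linear_truth    = c(linear = train_rss(lm(y_linear ~ x)),
                      cubic  = train_rss(lm(y_linear ~ poly(x, 3)))),
  nonlinear_truth = c(linear = train_rss(lm(y_nonlinear ~ x)),
                      cubic  = train_rss(lm(y_nonlinear ~ poly(x, 3))))
)
print(rss)
```

Because the linear model is nested in the cubic one, the cubic column can never exceed the linear column on training data; the size of the gap depends on how non-linear the truth is.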
What is a_{i'}? Note: We interpret this result by saying that the fitted values from linear regression are linear combinations of the response values.
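A possible derivation, assuming the usual setting of this exercise (regression without an intercept, which is not restated in this document): the fitted value is the predictor times the least squares slope, and substituting the slope formula exposes the weights on the responses.

```latex
\hat{y}_i = x_i \hat{\beta},
\qquad
\hat{\beta} = \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{k=1}^{n} x_k^2}
\quad\Longrightarrow\quad
\hat{y}_i = \sum_{i'=1}^{n} \underbrace{\frac{x_i \, x_{i'}}{\sum_{k=1}^{n} x_k^2}}_{a_{i'}} \; y_{i'}
```

Under that assumption, a_{i'} = x_i x_{i'} / \sum_k x_k^2, which depends only on the predictor values.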
library(ISLR)
Auto_reg <- lm(mpg~horsepower,Auto)
summary(Auto_reg)
Call:
lm(formula = mpg ~ horsepower, data = Auto)
Residuals:
Min 1Q Median 3Q Max
-13.5710 -3.2592 -0.3435 2.7630 16.9240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.935861 0.717499 55.66 <2e-16 ***
horsepower -0.157845 0.006446 -24.49 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
a.1. Is there a relationship between the predictor and the response? Answer: Because the p-value is very small, the null hypothesis is rejected and we conclude that “horsepower” is related to “mpg”
a.2 How strong is the relationship between the predictor and the response? Answer: The strength is reflected in the R^2, here 0.6059: horsepower explains about 60.6% of the variance in mpg
a.3 Is the relationship between the predictor and the response positive or negative? Answer: The coefficient of “horsepower” is negative, so the relationship is negative as well
a.4 What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
predict(Auto_reg ,data.frame(horsepower=98),interval ="confidence")
fit lwr upr
1 24.46708 23.97308 24.96108
predict(Auto_reg ,data.frame(horsepower=98),interval ="prediction")
fit lwr upr
1 24.46708 14.8094 34.12476
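To see why the prediction interval is wider than the confidence interval, both can be rebuilt by hand from the same fit. This is a sketch using the standard simple-regression formulas; the variable names are illustrative, and it assumes the ISLR package is installed as above.

```r
library(ISLR)   # provides the Auto data set

auto_fit <- lm(mpg ~ horsepower, data = Auto)
x0   <- 98
n    <- nrow(Auto)
xbar <- mean(Auto$horsepower)
sxx  <- sum((Auto$horsepower - xbar)^2)

y0    <- sum(coef(auto_fit) * c(1, x0))   # fitted value at horsepower = 98
s     <- summary(auto_fit)$sigma          # residual standard error
tcrit <- qt(0.975, df = n - 2)            # two-sided 95% critical value

se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)      # uncertainty in the mean response
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)  # adds the irreducible error term

conf_int <- y0 + c(-1, 1) * tcrit * se_mean
pred_int <- y0 + c(-1, 1) * tcrit * se_pred
print(rbind(confidence = conf_int, prediction = pred_int))
```

The only difference between the two standard errors is the extra `1` under the square root, which accounts for the noise of a single new observation.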
Plot the response and the predictor. Use the abline() function to display the least squares regression line.
plot(Auto$horsepower,Auto$mpg,col="blue",pch=20)
abline(Auto_reg,col="red",lwd=3)
par(mfrow =c(2,2))
plot(Auto_reg)
plot(Auto)
cor(Auto[,-9])
mpg cylinders displacement horsepower weight acceleration year origin
mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442 0.4233285 0.5805410 0.5652088
cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273 -0.5046834 -0.3456474 -0.5689316
displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944 -0.5438005 -0.3698552 -0.6145351
horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377 -0.6891955 -0.4163615 -0.4551715
weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000 -0.4168392 -0.3091199 -0.5850054
acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392 1.0000000 0.2903161 0.2127458
year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199 0.2903161 1.0000000 0.1815277
origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054 0.2127458 0.1815277 1.0000000
AutoN <- Auto[,-9]
Auto_mreg <- lm(mpg~.,AutoN)
summary(Auto_mreg)
Call:
lm(formula = mpg ~ ., data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-9.5903 -2.1565 -0.1169 1.8690 13.0604
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.218435 4.644294 -3.707 0.00024 ***
cylinders -0.493376 0.323282 -1.526 0.12780
displacement 0.019896 0.007515 2.647 0.00844 **
horsepower -0.016951 0.013787 -1.230 0.21963
weight -0.006474 0.000652 -9.929 < 2e-16 ***
acceleration 0.080576 0.098845 0.815 0.41548
year 0.750773 0.050973 14.729 < 2e-16 ***
origin 1.426141 0.278136 5.127 4.67e-07 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.328 on 384 degrees of freedom
Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
c.1. Is there a relationship between the predictors and the response? Answer: Yes. The F-statistic of 252.4 (p-value < 2.2e-16) rejects the hypothesis that all coefficients are zero, and the model reaches an R^2 of 0.8215
c.2. Which predictors appear to have a statistically significant relationship to the response? Answer: “displacement”, “weight”, “year”, “origin”
c.3. What does the coefficient for the year variable suggest? Answer: Holding the other predictors fixed, each additional model year is associated with an increase of about 0.75 mpg
par(mfrow =c(2,2))
plot(Auto_mreg)
e.1 Without the non-significant variables of the previous model
A1 <- lm(mpg~.-cylinders -horsepower -acceleration ,AutoN)
summary(A1)
Call:
lm(formula = mpg ~ . - cylinders - horsepower - acceleration,
data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-9.8102 -2.1129 -0.0388 1.7725 13.2085
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.861e+01 4.028e+00 -4.620 5.25e-06 ***
displacement 5.588e-03 4.768e-03 1.172 0.242
weight -6.575e-03 5.571e-04 -11.802 < 2e-16 ***
year 7.714e-01 4.981e-02 15.486 < 2e-16 ***
origin 1.226e+00 2.670e-01 4.593 5.92e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.346 on 387 degrees of freedom
Multiple R-squared: 0.8181, Adjusted R-squared: 0.8162
F-statistic: 435.1 on 4 and 387 DF, p-value: < 2.2e-16
Answer: R^2 = 0.8181 (lower than the previous model's 0.8215)
e.2 Without the non-significant variables of the previous model
A2 <- lm(mpg~.-cylinders -horsepower -acceleration-displacement ,AutoN)
summary(A2)
Call:
lm(formula = mpg ~ . - cylinders - horsepower - acceleration -
displacement, data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-9.9440 -2.0948 -0.0389 1.7255 13.2722
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.805e+01 4.001e+00 -4.510 8.60e-06 ***
weight -5.994e-03 2.541e-04 -23.588 < 2e-16 ***
year 7.571e-01 4.832e-02 15.668 < 2e-16 ***
origin 1.150e+00 2.591e-01 4.439 1.18e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.348 on 388 degrees of freedom
Multiple R-squared: 0.8175, Adjusted R-squared: 0.816
F-statistic: 579.2 on 3 and 388 DF, p-value: < 2.2e-16
e.3
A3 <- lm(mpg~cylinders+horsepower+(cylinders*horsepower)+
weight+acceleration+(weight*acceleration)+
year+origin,AutoN)
summary(A3)
Call:
lm(formula = mpg ~ cylinders + horsepower + (cylinders * horsepower) +
weight + acceleration + (weight * acceleration) + year +
origin, data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-8.9936 -1.6294 -0.0371 1.3014 11.7317
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.807e+00 8.135e+00 0.468 0.64011
cylinders -3.962e+00 5.416e-01 -7.315 1.52e-12 ***
horsepower -2.934e-01 3.508e-02 -8.363 1.15e-15 ***
weight -2.147e-03 1.607e-03 -1.336 0.18240
acceleration 1.647e-01 2.912e-01 0.566 0.57193
year 7.479e-01 4.516e-02 16.559 < 2e-16 ***
origin 9.066e-01 2.333e-01 3.885 0.00012 ***
cylinders:horsepower 3.635e-02 4.718e-03 7.705 1.14e-13 ***
weight:acceleration -1.157e-04 9.664e-05 -1.197 0.23217
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.923 on 383 degrees of freedom
Multiple R-squared: 0.8626, Adjusted R-squared: 0.8597
F-statistic: 300.5 on 8 and 383 DF, p-value: < 2.2e-16
e.4
A4 <- lm(mpg~cylinders+horsepower+(cylinders*horsepower)+year+origin,AutoN)
summary(A4)
Call:
lm(formula = mpg ~ cylinders + horsepower + (cylinders * horsepower) +
year + origin, data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-9.2635 -1.9176 -0.4087 1.7760 12.3113
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.313808 4.691320 2.625 0.00901 **
cylinders -5.961005 0.429236 -13.887 < 2e-16 ***
horsepower -0.380413 0.027737 -13.715 < 2e-16 ***
year 0.691017 0.049132 14.065 < 2e-16 ***
origin 1.412447 0.252303 5.598 4.12e-08 ***
cylinders:horsepower 0.045883 0.003818 12.016 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.251 on 386 degrees of freedom
Multiple R-squared: 0.8288, Adjusted R-squared: 0.8266
F-statistic: 373.7 on 5 and 386 DF, p-value: < 2.2e-16
e.5
A5 <- lm(mpg~cylinders+horsepower+(cylinders*horsepower)+
weight+acceleration+(weight*acceleration)+year,AutoN)
summary(A5)
Call:
lm(formula = mpg ~ cylinders + horsepower + (cylinders * horsepower) +
weight + acceleration + (weight * acceleration) + year, data = AutoN)
Residuals:
Min 1Q Median 3Q Max
-8.2693 -1.7845 -0.0232 1.4741 12.4551
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.847e+00 8.266e+00 0.707 0.480
cylinders -4.194e+00 5.481e-01 -7.651 1.62e-13 ***
horsepower -2.943e-01 3.571e-02 -8.239 2.77e-15 ***
weight -2.285e-03 1.636e-03 -1.397 0.163
acceleration 2.268e-01 2.961e-01 0.766 0.444
year 7.548e-01 4.595e-02 16.426 < 2e-16 ***
cylinders:horsepower 3.723e-02 4.799e-03 7.758 7.88e-14 ***
weight:acceleration -1.350e-04 9.827e-05 -1.374 0.170
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.977 on 384 degrees of freedom
Multiple R-squared: 0.8572, Adjusted R-squared: 0.8546
F-statistic: 329.2 on 7 and 384 DF, p-value: < 2.2e-16
Answer: Of the models fitted, the one that fits best (highest R^2, 0.8626) is model A3 (item e.3)
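The comparison across models can be tabulated rather than read off each summary. The sketch below refits the same models (writing `a * b`, which R expands to `a + b + a:b`, so the formulas are equivalent to those above) and collects R^2, adjusted R^2 and the residual standard error. The list names are illustrative, and the ISLR package is assumed to be installed.

```r
library(ISLR)
AutoN <- Auto[, -9]   # drop the name column, as above

models <- list(
  full = lm(mpg ~ ., AutoN),
  A1   = lm(mpg ~ . - cylinders - horsepower - acceleration, AutoN),
  A2   = lm(mpg ~ . - cylinders - horsepower - acceleration - displacement, AutoN),
  A3   = lm(mpg ~ cylinders * horsepower + weight * acceleration + year + origin, AutoN),
  A4   = lm(mpg ~ cylinders * horsepower + year + origin, AutoN),
  A5   = lm(mpg ~ cylinders * horsepower + weight * acceleration + year, AutoN)
)

# One row per model: R^2, adjusted R^2, residual standard error
fit_stats <- t(sapply(models, function(m) {
  s <- summary(m)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, rse = s$sigma)
}))
print(round(fit_stats[order(-fit_stats[, "adj_r2"]), ], 4))
```

Sorting by adjusted R^2 rather than plain R^2 penalises models that gain fit only by adding terms.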
f. Try a few different transformations of the variables, such as log(X), sqrt(X), X^2. Comment on your findings.
par(mfrow = c(2, 2))
plot(log(Auto$horsepower), Auto$mpg)
plot(sqrt(Auto$horsepower), Auto$mpg)
plot((Auto$horsepower)^2, Auto$mpg)
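Beyond the scatterplots, one way to comment on the transformations is to fit a simple regression for each and compare the R^2 values. This is only a sketch: it covers horsepower alone, the list names are illustrative, and the ISLR package is assumed to be installed.

```r
library(ISLR)

# mpg regressed on horsepower under each transformation
fits <- list(
  raw    = lm(mpg ~ horsepower, Auto),
  log    = lm(mpg ~ log(horsepower), Auto),
  sqrt   = lm(mpg ~ sqrt(horsepower), Auto),
  square = lm(mpg ~ I(horsepower^2), Auto)
)

r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r2, 4))
```

A transformation with a higher R^2 than the raw fit suggests that it linearises the relationship seen in the corresponding plot.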
Cars <- Carseats
names(Cars)
[1] "Sales" "CompPrice" "Income" "Advertising" "Population" "Price" "ShelveLoc"
[8] "Age" "Education" "Urban" "US"
cars_mreg<-lm(Sales~Price+Urban+US,Cars)
summary(cars_mreg)
Call:
lm(formula = Sales ~ Price + Urban + US, data = Cars)
Residuals:
Min 1Q Median 3Q Max
-6.9206 -1.6220 -0.0564 1.5786 7.0581
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
Price -0.054459 0.005242 -10.389 < 2e-16 ***
UrbanYes -0.021916 0.271650 -0.081 0.936
USYes 1.200573 0.259042 4.635 4.86e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.472 on 396 degrees of freedom
Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
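b. Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative! Answer: Sales is measured in thousands of units. Price: holding the other predictors fixed, each one-unit increase in Price is associated with a drop of about 0.054 in Sales (roughly 54 units). UrbanYes: stores in urban locations sell about 0.022 (roughly 22 units) less than rural stores on average, but this difference is not statistically significant (p = 0.936). USYes: stores in the US sell about 1.2 (roughly 1,200 units) more than stores outside the US on average.
Write out the model in equation form, being careful to handle the qualitative variables properly. Answer: Sales = 13.04 - 0.054(Price) - 0.022(Urban) + 1.20(US), where Urban = 1 for an urban store and 0 otherwise, and US = 1 for a US store and 0 otherwise.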
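The role of the dummy variables in the equation can be made concrete with a small sketch. The helper `sales_hat()` is hypothetical; it plugs the coefficients printed in the summary above into the equation, and the price of 100 is an arbitrary example value.

```r
# Coefficients copied from the summary output above
b <- c(intercept = 13.043469, price = -0.054459,
       urban_yes = -0.021916, us_yes = 1.200573)

# sales_hat() is a hypothetical helper; urban and us are TRUE/FALSE flags
sales_hat <- function(price, urban, us) {
  unname(b["intercept"] + b["price"] * price +
           b["urban_yes"] * as.numeric(urban) + b["us_yes"] * as.numeric(us))
}

urban_us    <- sales_hat(price = 100, urban = TRUE,  us = TRUE)   # urban US store
rural_nonus <- sales_hat(price = 100, urban = FALSE, us = FALSE)  # rural non-US store
print(c(urban_us = urban_us, rural_nonus = rural_nonus))
```

Each qualitative predictor shifts the intercept by its coefficient when the flag is TRUE and contributes nothing otherwise.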
For which of the predictors can you reject the null hypothesis H0 : βj = 0? Answer: β1 (Price): the null hypothesis is rejected, so Price is significant. β2 (Urban): the null hypothesis cannot be rejected (p = 0.936), so there is no evidence that Urban is related to Sales. β3 (US): the null hypothesis is rejected, so US is significant.
On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
c1<-lm(Sales~Price+US,Cars)
summary(c1)
Call:
lm(formula = Sales ~ Price + US, data = Cars)
Residuals:
Min 1Q Median 3Q Max
-6.9269 -1.6286 -0.0574 1.5766 7.0515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
Price -0.05448 0.00523 -10.416 < 2e-16 ***
USYes 1.19964 0.25846 4.641 4.71e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 2.469 on 397 degrees of freedom
Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
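How well do the models in (a) and (e) fit the data? Answer: Models (a) and (e) have nearly identical residual standard errors (2.472 and 2.469) and the same R^2 (0.2393), so they fit the data about equally well, and in both cases poorly: only about 24% of the variance in Sales is explained.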
Using the model from (e), obtain 95% confidence intervals for the coefficient(s).
confint(c1,level=0.95)
2.5 % 97.5 %
(Intercept) 11.79032020 14.27126531
Price -0.06475984 -0.04419543
USYes 0.69151957 1.70776632
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
reg1<-lm(y~x+0)
summary(reg1)
Call:
lm(formula = y ~ x + 0)
Residuals:
Min 1Q Median 3Q Max
-1.9154 -0.6472 -0.1771 0.5056 2.3109
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 1.9939 0.1065 18.73 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.9586 on 99 degrees of freedom
Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
reg2<-lm(x~y+0)
summary(reg2)
Call:
lm(formula = x ~ y + 0)
Residuals:
Min 1Q Median 3Q Max
-0.8699 -0.2368 0.1030 0.2858 0.8938
Coefficients:
Estimate Std. Error t value Pr(>|t|)
y 0.39111 0.02089 18.73 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.4246 on 99 degrees of freedom
Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
What is the relationship between the results obtained in (a) and (b)? Answer: Both regressions describe the same linear relationship with the roles of x and y swapped, which is why the t-statistic (18.73), the R^2 and the F-statistic are identical
In R, show that when regression is performed with an intercept, the t-statistic for H0 : β1 = 0 is the same for the regression of y onto x as it is for the regression of x onto y.
reg1<-lm(y~x+0)
summary(reg1)$coefficients
Estimate Std. Error t value Pr(>|t|)
x 1.993876 0.1064767 18.72593 2.642197e-34
reg2<-lm(x~y+0)
summary(reg2)$coefficients
Estimate Std. Error t value Pr(>|t|)
y 0.3911145 0.02088625 18.72593 2.642197e-34
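The two calls above use `+ 0` and therefore fit the models without an intercept. A variant that includes the intercept, as the question asks, could look like the following sketch; it regenerates the same simulated data and compares the two slope t-statistics directly (the names `t_yx` and `t_xy` are illustrative).

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Slope t-statistics from the fits WITH an intercept
t_yx <- summary(lm(y ~ x))$coefficients["x", "t value"]   # y onto x
t_xy <- summary(lm(x ~ y))$coefficients["y", "t value"]   # x onto y

print(c(t_yx = t_yx, t_xy = t_xy))
print(all.equal(t_yx, t_xy))
```

With an intercept, both t-statistics reduce to the same function of the sample correlation and n, which is why they agree.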
Recall that the coefficient estimate β̂ for the linear regression of Y onto X without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X? Answer: When the sum of the squared x values equals the sum of the squared y values
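The condition follows from writing the two no-intercept estimates side by side. The subscripts on β̂ are notation introduced here for illustration:

```latex
\hat{\beta}_{Y \text{ onto } X} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\hat{\beta}_{X \text{ onto } Y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2}
```

The numerators are identical, so (as long as \sum_i x_i y_i \neq 0) the two estimates coincide exactly when \sum_i x_i^2 = \sum_i y_i^2.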
Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X.
x<-1:100
y<-101:200
rx<-lm(y~x+0)
summary(rx)
Call:
lm(formula = y ~ x + 0)
Residuals:
Min 1Q Median 3Q Max
-49.25 -12.31 24.63 61.57 98.51
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 2.49254 0.08574 29.07 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 49.88 on 99 degrees of freedom
Multiple R-squared: 0.8951, Adjusted R-squared: 0.8941
F-statistic: 845 on 1 and 99 DF, p-value: < 2.2e-16
ry<-lm(x~y+0)
summary(ry)
Call:
lm(formula = x ~ y + 0)
Residuals:
Min 1Q Median 3Q Max
-35.272 -19.410 -3.548 12.313 28.175
Coefficients:
Estimate Std. Error t value Pr(>|t|)
y 0.35912 0.01235 29.07 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 18.93 on 99 degrees of freedom
Multiple R-squared: 0.8951, Adjusted R-squared: 0.8941
F-statistic: 845 on 1 and 99 DF, p-value: < 2.2e-16
x<-1:100
y<-100:1
rx<-lm(y~x+0)
summary(rx)
Call:
lm(formula = y ~ x + 0)
Residuals:
Min 1Q Median 3Q Max
-49.75 -12.44 24.87 62.18 99.49
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 0.5075 0.0866 5.86 6.09e-08 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 50.37 on 99 degrees of freedom
Multiple R-squared: 0.2575, Adjusted R-squared: 0.25
F-statistic: 34.34 on 1 and 99 DF, p-value: 6.094e-08
ry<-lm(x~y+0)
summary(ry)
Call:
lm(formula = x ~ y + 0)
Residuals:
Min 1Q Median 3Q Max
-49.75 -12.44 24.87 62.18 99.49
Coefficients:
Estimate Std. Error t value Pr(>|t|)
y 0.5075 0.0866 5.86 6.09e-08 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 50.37 on 99 degrees of freedom
Multiple R-squared: 0.2575, Adjusted R-squared: 0.25
F-statistic: 34.34 on 1 and 99 DF, p-value: 6.094e-08
set.seed(1)
x<- rnorm(n=100)
set.seed(2)
eps<- rnorm(n=100,sd=sqrt(0.25))
y=-1+0.5*x+eps
length(y)
[1] 100
Answer: the values of β0 and β1 are -1 and 0.5 respectively
plot(x,y)
fit <- lm(y~x)
summary(fit)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.22689 -0.40393 -0.04575 0.41574 1.14118
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.00454 0.05804 -17.308 < 2e-16 ***
x 0.40072 0.06446 6.216 1.25e-08 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.5761 on 98 degrees of freedom
Multiple R-squared: 0.2828, Adjusted R-squared: 0.2755
F-statistic: 38.64 on 1 and 98 DF, p-value: 1.247e-08
plot(x,y)
abline(fit,col="red",lwd=3)
fit1 <- lm(y ~ x + I(x^2))
summary(fit1)
Call:
lm(formula = y ~ x + I(x^2))
Residuals:
Min 1Q Median 3Q Max
-1.30604 -0.38957 -0.06695 0.40921 1.13539
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.92420 0.06967 -13.265 < 2e-16 ***
x 0.41623 0.06394 6.509 3.33e-09 ***
I(x^2) -0.10121 0.05020 -2.016 0.0465 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.5673 on 97 degrees of freedom
Multiple R-squared: 0.3116, Adjusted R-squared: 0.2974
F-statistic: 21.96 on 2 and 97 DF, p-value: 1.362e-08
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)
The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?
Answer: The model is y = 2 + 2(x1) + 0.3(x2) + ε, so β0 = 2, β1 = 2, β2 = 0.3
cor(x1,x2)
[1] 0.8351212
plot(x1,x2)
reg_y <- lm(y~x1+x2)
summary(reg_y)
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-2.8311 -0.7273 -0.0537 0.6338 2.3359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
x1 1.4396 0.7212 1.996 0.0487 *
x2 1.0097 1.1337 0.891 0.3754
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.056 on 97 degrees of freedom
Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
reg_yx1 <- lm(y~x1)
summary(reg_yx1)
Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.89495 -0.66874 -0.07785 0.59221 2.45560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.1124 0.2307 9.155 8.27e-15 ***
x1 1.9759 0.3963 4.986 2.66e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.055 on 98 degrees of freedom
Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942
F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06
reg_yx2 <- lm(y~x2)
summary(reg_yx2)
Call:
lm(formula = y ~ x2)
Residuals:
Min 1Q Median 3Q Max
-2.62687 -0.75156 -0.03598 0.72383 2.44890
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3899 0.1949 12.26 < 2e-16 ***
x2 2.8996 0.6330 4.58 1.37e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.072 on 98 degrees of freedom
Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679
F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05
x1=c(x1 , 0.1)
x2=c(x2 , 0.8)
y=c(y,6)
Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on the each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.
reg_y <- lm(y~x1+x2)
summary(reg_y)
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-2.73348 -0.69318 -0.05263 0.66385 2.30619
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2267 0.2314 9.624 7.91e-16 ***
x1 0.5394 0.5922 0.911 0.36458
x2 2.5146 0.8977 2.801 0.00614 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.075 on 98 degrees of freedom
Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029
F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
reg_yx1 <- lm(y~x1)
summary(reg_yx1)
Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.8897 -0.6556 -0.0909 0.5682 3.5665
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2569 0.2390 9.445 1.78e-15 ***
x1 1.7657 0.4124 4.282 4.29e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.111 on 99 degrees of freedom
Multiple R-squared: 0.1562, Adjusted R-squared: 0.1477
F-statistic: 18.33 on 1 and 99 DF, p-value: 4.295e-05
reg_yx2 <- lm(y~x2)
summary(reg_yx2)
Call:
lm(formula = y ~ x2)
Residuals:
Min 1Q Median 3Q Max
-2.64729 -0.71021 -0.06899 0.72699 2.38074
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3451 0.1912 12.264 < 2e-16 ***
x2 3.1190 0.6040 5.164 1.25e-06 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.074 on 99 degrees of freedom
Multiple R-squared: 0.2122, Adjusted R-squared: 0.2042
F-statistic: 26.66 on 1 and 99 DF, p-value: 1.253e-06
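Whether the added observation is an outlier, a high-leverage point, or both can be examined numerically. The sketch below rebuilds the data, refits the three models, and reports for observation 101 its leverage, the average leverage (p + 1)/n, and its studentized residual. The thresholds one applies are rules of thumb and an assumption here: leverage well above the average, and an absolute studentized residual above roughly 3.

```r
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

# Append the new observation; it becomes row 101
x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y  <- c(y, 6)

fits <- list(both = lm(y ~ x1 + x2), x1_only = lm(y ~ x1), x2_only = lm(y ~ x2))

# Diagnostics for observation 101 in each model
diag_101 <- t(sapply(fits, function(m) {
  h <- hatvalues(m)
  c(leverage      = unname(h[101]),
    mean_leverage = mean(h),                    # equals (p + 1) / n
    studentized   = unname(rstudent(m)[101]))
}))
print(round(diag_101, 3))
```

Comparing the `leverage` column with `mean_leverage` addresses the high-leverage question, while the `studentized` column addresses the outlier question, separately for each model.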