Show that the ridge regression estimate is the mean (and mode) of the posterior distribution, under a Gaussian prior β ∼ N(0, τ²I) and Gaussian sampling model y ∼ N(Xβ, σ²I). Find the relationship between the regularization parameter λ in the ridge formula and the variances σ² and τ².
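A sketch of the derivation, using only standard Gaussian algebra: up to an additive constant, the log posterior is

log p(β | y) = −(1/(2σ²)) ‖y − Xβ‖² − (1/(2τ²)) ‖β‖².

Maximizing this is equivalent to minimizing ‖y − Xβ‖² + (σ²/τ²) ‖β‖², which is exactly the ridge criterion with λ = σ²/τ². Because the posterior is itself Gaussian, its mode coincides with its mean, and both equal the ridge estimate β̂ = (XᵀX + λI)⁻¹Xᵀy. A noisy sampling model (large σ²) or a tight prior (small τ²) therefore means more shrinkage.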
This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.
For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.
rm(list=ls())
setwd("/Users/Mughundhan/UIC/UIC Academics/FALL 2017/BIZ ANALYTICS STATS/Assignment 2")
library(MASS)
data(Boston)
colnames(Boston)
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
# summary(lm(crim ~ zn, data = Boston))
# summary(lm(crim ~ indus, data = Boston))
summary(lm(crim ~ chas, data = Boston))
##
## Call:
## lm(formula = crim ~ chas, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.738 -3.661 -3.435 0.018 85.232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7444 0.3961 9.453 <2e-16 ***
## chas -1.8928 1.5061 -1.257 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.597 on 504 degrees of freedom
## Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
## F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
# summary(lm(crim ~ nox, data = Boston))
# summary(lm(crim ~ rm, data = Boston))
# summary(lm(crim ~ age, data = Boston))
# summary(lm(crim ~ dis, data = Boston))
# summary(lm(crim ~ rad, data = Boston))
# summary(lm(crim ~ tax, data = Boston))
# summary(lm(crim ~ ptratio, data = Boston))
# summary(lm(crim ~ black, data = Boston))
# summary(lm(crim ~ lstat, data = Boston))
# summary(lm(crim ~ medv, data = Boston))
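Rather than printing thirteen summaries, the slope and p-value of every simple regression can be collected in one pass; this is a sketch, and preds and simple_fits are names introduced here for illustration:

preds <- setdiff(colnames(Boston), "crim")
simple_fits <- t(sapply(preds, function(p) {  # slope and p-value of crim ~ p
  coef(summary(lm(as.formula(paste("crim ~", p)), data = Boston)))[2, c("Estimate", "Pr(>|t|)")]
}))
simple_fits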
par(mfrow=c(2,2)) # 2 x 2 grid of plots, filled row by row
# par(mar = rep(2, 4)) #Setting Margins
plot(crim ~ ., data = Boston)
In these simple regressions, every predictor except chas has a p-value well below 0.05, so each of the other twelve predictors shows a statistically significant association with crim on its own; chas (p = 0.209 in the fit above) is the lone exception.
Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0?
summary(lm(crim ~ . - crim, data = Boston))
##
## Call:
## lm(formula = crim ~ . - crim, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.924 -2.120 -0.353 1.019 75.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.033228 7.234903 2.354 0.018949 *
## zn 0.044855 0.018734 2.394 0.017025 *
## indus -0.063855 0.083407 -0.766 0.444294
## chas -0.749134 1.180147 -0.635 0.525867
## nox -10.313535 5.275536 -1.955 0.051152 .
## rm 0.430131 0.612830 0.702 0.483089
## age 0.001452 0.017925 0.081 0.935488
## dis -0.987176 0.281817 -3.503 0.000502 ***
## rad 0.588209 0.088049 6.680 6.46e-11 ***
## tax -0.003780 0.005156 -0.733 0.463793
## ptratio -0.271081 0.186450 -1.454 0.146611
## black -0.007538 0.003673 -2.052 0.040702 *
## lstat 0.126211 0.075725 1.667 0.096208 .
## medv -0.198887 0.060516 -3.287 0.001087 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.439 on 492 degrees of freedom
## Multiple R-squared: 0.454, Adjusted R-squared: 0.4396
## F-statistic: 31.47 on 13 and 492 DF, p-value: < 2.2e-16
We reject the null hypothesis H0 : βj = 0 when the p-value is below 0.05 (the 5% significance level). By that criterion, the null hypothesis can be rejected for zn, dis, rad, black, and medv.
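The same list can be pulled out programmatically; mlr_coefs below is just an illustrative name:

mlr_coefs <- coef(summary(lm(crim ~ . - crim, data = Boston)))[-1, ]  # drop the intercept row
rownames(mlr_coefs)[mlr_coefs[, "Pr(>|t|)"] < 0.05]  # predictors significant at the 5% level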
How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.
univcof <- lm(crim ~ zn, data = Boston)$coefficients[2]
univcof <- append(univcof, lm(crim ~ indus, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ chas, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ nox, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ rm, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ age, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ dis, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ rad, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ tax, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ ptratio, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ black, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ lstat, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ medv, data = Boston)$coefficients[2])
fooBoston <- lm(crim ~ . - crim, data = Boston)
fooBoston$coefficients[2:14]
## zn indus chas nox rm
## 0.044855215 -0.063854824 -0.749133611 -10.313534912 0.430130506
## age dis rad tax ptratio
## 0.001451643 -0.987175726 0.588208591 -0.003780016 -0.271080558
## black lstat medv
## -0.007537505 0.126211376 -0.198886821
plot(univcof, fooBoston$coefficients[2:14], main = "Simple Linear Regression vs. Multiple Linear Regression",
xlab = "Simple Linear Regression", ylab = "Multiple Linear Regression", col="blue")
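To see which predictor drives the comparison, the points can be labeled using the objects already in hand (a sketch):

text(univcof, fooBoston$coefficients[2:14],
     labels = names(fooBoston$coefficients)[2:14], pos = 4, cex = 0.7)

Most points fall reasonably close together, but nox is the clear outlier: its simple-regression slope is large and positive (about 31), while its multiple-regression coefficient is -10.31, a sign flip caused by nox being correlated with the other predictors.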
Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form Y = β0 + β1X + β2X² + β3X³ + ε.
summary(lm(crim ~ poly(zn, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(zn, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.821 -4.614 -1.294 0.473 84.130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3722 9.709 < 2e-16 ***
## poly(zn, 3)1 -38.7498 8.3722 -4.628 4.7e-06 ***
## poly(zn, 3)2 23.9398 8.3722 2.859 0.00442 **
## poly(zn, 3)3 -10.0719 8.3722 -1.203 0.22954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.372 on 502 degrees of freedom
## Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
## F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
summary(lm(crim ~ poly(indus, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(indus, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.278 -2.514 0.054 0.764 79.713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.330 10.950 < 2e-16 ***
## poly(indus, 3)1 78.591 7.423 10.587 < 2e-16 ***
## poly(indus, 3)2 -24.395 7.423 -3.286 0.00109 **
## poly(indus, 3)3 -54.130 7.423 -7.292 1.2e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.423 on 502 degrees of freedom
## Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
## F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(nox, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(nox, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.110 -2.068 -0.255 0.739 78.302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3216 11.237 < 2e-16 ***
## poly(nox, 3)1 81.3720 7.2336 11.249 < 2e-16 ***
## poly(nox, 3)2 -28.8286 7.2336 -3.985 7.74e-05 ***
## poly(nox, 3)3 -60.3619 7.2336 -8.345 6.96e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.234 on 502 degrees of freedom
## Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
## F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(rm, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(rm, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.485 -3.468 -2.221 -0.015 87.219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3703 9.758 < 2e-16 ***
## poly(rm, 3)1 -42.3794 8.3297 -5.088 5.13e-07 ***
## poly(rm, 3)2 26.5768 8.3297 3.191 0.00151 **
## poly(rm, 3)3 -5.5103 8.3297 -0.662 0.50858
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.33 on 502 degrees of freedom
## Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
## F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
summary(lm(crim ~ poly(age, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(age, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.762 -2.673 -0.516 0.019 82.842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3485 10.368 < 2e-16 ***
## poly(age, 3)1 68.1820 7.8397 8.697 < 2e-16 ***
## poly(age, 3)2 37.4845 7.8397 4.781 2.29e-06 ***
## poly(age, 3)3 21.3532 7.8397 2.724 0.00668 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
## F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(dis, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.757 -2.588 0.031 1.267 76.378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3259 11.087 < 2e-16 ***
## poly(dis, 3)1 -73.3886 7.3315 -10.010 < 2e-16 ***
## poly(dis, 3)2 56.3730 7.3315 7.689 7.87e-14 ***
## poly(dis, 3)3 -42.6219 7.3315 -5.814 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.331 on 502 degrees of freedom
## Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
## F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(rad, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(rad, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.381 -0.412 -0.269 0.179 76.217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.2971 12.164 < 2e-16 ***
## poly(rad, 3)1 120.9074 6.6824 18.093 < 2e-16 ***
## poly(rad, 3)2 17.4923 6.6824 2.618 0.00912 **
## poly(rad, 3)3 4.6985 6.6824 0.703 0.48231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.682 on 502 degrees of freedom
## Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
## F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(tax, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(tax, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.273 -1.389 0.046 0.536 76.950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3047 11.860 < 2e-16 ***
## poly(tax, 3)1 112.6458 6.8537 16.436 < 2e-16 ***
## poly(tax, 3)2 32.0873 6.8537 4.682 3.67e-06 ***
## poly(tax, 3)3 -7.9968 6.8537 -1.167 0.244
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.854 on 502 degrees of freedom
## Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
## F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(ptratio, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(ptratio, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.833 -4.146 -1.655 1.408 82.697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.361 10.008 < 2e-16 ***
## poly(ptratio, 3)1 56.045 8.122 6.901 1.57e-11 ***
## poly(ptratio, 3)2 24.775 8.122 3.050 0.00241 **
## poly(ptratio, 3)3 -22.280 8.122 -2.743 0.00630 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.122 on 502 degrees of freedom
## Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
## F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
summary(lm(crim ~ poly(black, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(black, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.096 -2.343 -2.128 -1.439 86.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3536 10.218 <2e-16 ***
## poly(black, 3)1 -74.4312 7.9546 -9.357 <2e-16 ***
## poly(black, 3)2 5.9264 7.9546 0.745 0.457
## poly(black, 3)3 -4.8346 7.9546 -0.608 0.544
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.955 on 502 degrees of freedom
## Multiple R-squared: 0.1498, Adjusted R-squared: 0.1448
## F-statistic: 29.49 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(lstat, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(lstat, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.234 -2.151 -0.486 0.066 83.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3392 10.654 <2e-16 ***
## poly(lstat, 3)1 88.0697 7.6294 11.543 <2e-16 ***
## poly(lstat, 3)2 15.8882 7.6294 2.082 0.0378 *
## poly(lstat, 3)3 -11.5740 7.6294 -1.517 0.1299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.629 on 502 degrees of freedom
## Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
## F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
summary(lm(crim ~ poly(medv, 3), data = Boston))
##
## Call:
## lm(formula = crim ~ poly(medv, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.427 -1.976 -0.437 0.439 73.655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.292 12.374 < 2e-16 ***
## poly(medv, 3)1 -75.058 6.569 -11.426 < 2e-16 ***
## poly(medv, 3)2 88.086 6.569 13.409 < 2e-16 ***
## poly(medv, 3)3 -48.033 6.569 -7.312 1.05e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.569 on 502 degrees of freedom
## Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
## F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16
Yes. The cubic term is significant at the 5% level for indus, nox, age, dis, ptratio, and medv, so these predictors show clear evidence of a non-linear association with crim. For zn, rm, rad, tax, and lstat only the quadratic term adds evidence of curvature, while black shows no significant non-linear terms at all; chas is a binary dummy, so a polynomial fit is not meaningful for it.
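The same screen can be automated; cubic_p is an illustrative name, and chas is skipped because poly() needs more than two distinct values:

preds <- setdiff(colnames(Boston), c("crim", "chas"))
cubic_p <- sapply(preds, function(p) {
  fit <- lm(as.formula(paste0("crim ~ poly(", p, ", 3)")), data = Boston)
  coef(summary(fit))[4, "Pr(>|t|)"]  # p-value of the cubic term
})
sort(cubic_p)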
Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce estimated coefficients β̂0 = −6, β̂1 = 0.05, β̂2 = 1.
Estimate the probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 gets an A in the class.
hours <- 40
UG_GPA <- 3.5
prob_A <- function(x1,x2){ z <- exp(-6 + 0.05*x1 + 1*x2); return( round(z/(1+z),4))}
prob_A(hours,UG_GPA)
## [1] 0.3775
How many hours would the student in part (a) need to study to have a 50 % chance of getting an A in the class?
hours <- seq(45,55,1)
probs <- mapply(hours, 3.5, FUN=prob_A)
probs_tab <- cbind(probs, hours)
probs_tab
## probs hours
## [1,] 0.4378 45
## [2,] 0.4502 46
## [3,] 0.4626 47
## [4,] 0.4750 48
## [5,] 0.4875 49
## [6,] 0.5000 50
## [7,] 0.5125 51
## [8,] 0.5250 52
## [9,] 0.5374 53
## [10,] 0.5498 54
## [11,] 0.5622 55
probs_tab[probs==0.50, ]
## probs hours
## 0.5 50.0
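The grid search simply confirms the algebra: the log-odds equal zero when −6 + 0.05·x1 + 1·3.5 = 0, i.e. x1 = 2.5/0.05. A one-line check:

(6 - 3.5) / 0.05  # hours of study needed for a 50% chance of an A
## [1] 50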
This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter's lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
Produce some numerical and graphical summaries of the Weekly data. Do there appear to be any patterns?
library(ISLR)
summary(Weekly)
## Year Lag1 Lag2 Lag3
## Min. :1990 Min. :-18.1950 Min. :-18.1950 Min. :-18.1950
## 1st Qu.:1995 1st Qu.: -1.1540 1st Qu.: -1.1540 1st Qu.: -1.1580
## Median :2000 Median : 0.2410 Median : 0.2410 Median : 0.2410
## Mean :2000 Mean : 0.1506 Mean : 0.1511 Mean : 0.1472
## 3rd Qu.:2005 3rd Qu.: 1.4050 3rd Qu.: 1.4090 3rd Qu.: 1.4090
## Max. :2010 Max. : 12.0260 Max. : 12.0260 Max. : 12.0260
## Lag4 Lag5 Volume
## Min. :-18.1950 Min. :-18.1950 Min. :0.08747
## 1st Qu.: -1.1580 1st Qu.: -1.1660 1st Qu.:0.33202
## Median : 0.2380 Median : 0.2340 Median :1.00268
## Mean : 0.1458 Mean : 0.1399 Mean :1.57462
## 3rd Qu.: 1.4090 3rd Qu.: 1.4050 3rd Qu.:2.05373
## Max. : 12.0260 Max. : 12.0260 Max. :9.32821
## Today Direction
## Min. :-18.1950 Down:484
## 1st Qu.: -1.1540 Up :605
## Median : 0.2410
## Mean : 0.1499
## 3rd Qu.: 1.4050
## Max. : 12.0260
pairs(Weekly)
cor(Weekly[,-ncol(Weekly)])
## Year Lag1 Lag2 Lag3 Lag4
## Year 1.00000000 -0.032289274 -0.03339001 -0.03000649 -0.031127923
## Lag1 -0.03228927 1.000000000 -0.07485305 0.05863568 -0.071273876
## Lag2 -0.03339001 -0.074853051 1.00000000 -0.07572091 0.058381535
## Lag3 -0.03000649 0.058635682 -0.07572091 1.00000000 -0.075395865
## Lag4 -0.03112792 -0.071273876 0.05838153 -0.07539587 1.000000000
## Lag5 -0.03051910 -0.008183096 -0.07249948 0.06065717 -0.075675027
## Volume 0.84194162 -0.064951313 -0.08551314 -0.06928771 -0.061074617
## Today -0.03245989 -0.075031842 0.05916672 -0.07124364 -0.007825873
## Lag5 Volume Today
## Year -0.030519101 0.84194162 -0.032459894
## Lag1 -0.008183096 -0.06495131 -0.075031842
## Lag2 -0.072499482 -0.08551314 0.059166717
## Lag3 0.060657175 -0.06928771 -0.071243639
## Lag4 -0.075675027 -0.06107462 -0.007825873
## Lag5 1.000000000 -0.05851741 0.011012698
## Volume -0.058517414 1.00000000 -0.033077783
## Today 0.011012698 -0.03307778 1.000000000
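The only strong relationship in the matrix is between Year and Volume (correlation 0.84): average weekly trading volume grows over the sample period, while none of the lag variables is meaningfully correlated with Today. A quick plot of that one pattern (a sketch):

plot(Weekly$Year, Weekly$Volume, xlab = "Year", ylab = "Volume")  # volume rises over time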
Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?
log_fit = glm(Direction ~ Lag1+Lag2+Lag3+Lag4+Lag5+Volume, family=binomial, data=Weekly)
summary(log_fit)
##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 +
## Volume, family = binomial, data = Weekly)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6949 -1.2565 0.9913 1.0849 1.4579
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.26686 0.08593 3.106 0.0019 **
## Lag1 -0.04127 0.02641 -1.563 0.1181
## Lag2 0.05844 0.02686 2.175 0.0296 *
## Lag3 -0.01606 0.02666 -0.602 0.5469
## Lag4 -0.02779 0.02646 -1.050 0.2937
## Lag5 -0.01447 0.02638 -0.549 0.5833
## Volume -0.02274 0.03690 -0.616 0.5377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1496.2 on 1088 degrees of freedom
## Residual deviance: 1486.4 on 1082 degrees of freedom
## AIC: 1500.4
##
## Number of Fisher Scoring iterations: 4
Only Lag2 is statistically significant at the 5% level (p = 0.0296); apart from the intercept, every other predictor has a p-value above 0.1.
Compute the confusion matrix and overall fraction of correct predictions. Explain what the confusion matrix is telling you about the types of mistakes made by logistic regression.
glm_probs=predict(log_fit,Weekly,type="response")
glm_pred=rep("Down",nrow(Weekly))
glm_pred[glm_probs > 0.50]="Up"
table(glm_pred,Weekly$Direction)
##
## glm_pred Down Up
## Down 54 48
## Up 430 557
#mean(glm_pred==Weekly$Direction)
correct_pred <- round((54+557)/(54+557+48+430), 4)
Right_Up <- round(557/(48+557), 4)
Right_Down <- round(54/(54+430), 4)
error_rate <- 1-correct_pred
cbind(correct_pred, Right_Up, Right_Down, error_rate)
## correct_pred Right_Up Right_Down error_rate
## [1,] 0.5611 0.9207 0.1116 0.4389
Overall, 56.11% of the weeks are classified correctly. The confusion matrix shows that the model almost always predicts Up: it is right on 92.07% of the true Up weeks but identifies only 11.16% of the true Down weeks (54 of 484), so nearly all of its mistakes are Down weeks labeled Up.
Now fit the logistic regression model using a training data period from 1990 to 2008, with Lag2 as the only predictor. Compute the confusion matrix and the overall fraction of correct predictions for the held out data (that is, the data from 2009 and 2010).
train=Weekly$Year <= 2008
Weekly_test=Weekly[!train,]
log_fit = glm(Direction ~ Lag2, family=binomial, data=Weekly, subset=train)
contrasts(Weekly$Direction)
## Up
## Down 0
## Up 1
summary(log_fit)
##
## Call:
## glm(formula = Direction ~ Lag2, family = binomial, data = Weekly,
## subset = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.536 -1.264 1.021 1.091 1.368
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.20326 0.06428 3.162 0.00157 **
## Lag2 0.05810 0.02870 2.024 0.04298 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1354.7 on 984 degrees of freedom
## Residual deviance: 1350.5 on 983 degrees of freedom
## AIC: 1354.5
##
## Number of Fisher Scoring iterations: 4
glm_probs=predict(log_fit, Weekly_test, type="response")
glm_pred=rep("Down",nrow(Weekly_test))
glm_pred[glm_probs > 0.50]="Up"
table(glm_pred, Weekly_test$Direction)
##
## glm_pred Down Up
## Down 9 5
## Up 34 56
correct_pred <- round((9+56)/104, 4)
Right_Up <- round(56/(56+5), 4)
Right_Down <- round(9/(9+34), 4)
error_rate <- 1-correct_pred
log_details <- cbind(correct_pred, Right_Up, Right_Down, error_rate)
log_details
## correct_pred Right_Up Right_Down error_rate
## [1,] 0.625 0.918 0.2093 0.375
Restricting to Lag2 and evaluating on 2009-2010 raises the overall accuracy to 62.5%, though the classifier still over-predicts Up: it catches 91.8% of Up weeks but only 20.93% of Down weeks.
Repeat (d) using LDA.
lda_fit = lda(Direction ~ Lag2, data=Weekly, subset=train)
lda_fit
## Call:
## lda(Direction ~ Lag2, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag2
## Down -0.03568254
## Up 0.26036581
##
## Coefficients of linear discriminants:
## LD1
## Lag2 0.4414162
lda_class = predict(lda_fit, Weekly_test)$class
table(lda_class, Weekly_test$Direction)
##
## lda_class Down Up
## Down 9 5
## Up 34 56
correct_pred <- round((9+56)/104, 4)
Right_Up <- round(56/(56+5), 4)
Right_Down <- round(9/(9+34), 4)
error_rate <- 1-correct_pred
lda_details <- cbind(correct_pred, Right_Up, Right_Down, error_rate)
lda_details
## correct_pred Right_Up Right_Down error_rate
## [1,] 0.625 0.918 0.2093 0.375
Repeat (d) using QDA.
qda_fit = qda(Direction ~ Lag2, data=Weekly, subset=train)
qda_fit
## Call:
## qda(Direction ~ Lag2, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag2
## Down -0.03568254
## Up 0.26036581
qda_class = predict(qda_fit, Weekly_test)$class
table(qda_class, Weekly_test$Direction)
##
## qda_class Down Up
## Down 0 0
## Up 43 61
correct_pred <- round((0+61)/104, 4)
Right_Up <- round(61/61, 4)
Right_Down <- round(0/43, 4)
error_rate <- 1-correct_pred
qda_details <- cbind(correct_pred, Right_Up, Right_Down, error_rate)
qda_details
## correct_pred Right_Up Right_Down error_rate
## [1,] 0.5865 1 0 0.4135
QDA predicts Up for every test week, so its 58.65% accuracy is simply the base rate of Up weeks (61 of 104); it provides no discrimination at all.
Repeat (d) using KNN with K = 1.
library(class)
train_X=Weekly[train,"Lag2",drop=F]
test_X=Weekly[!train,"Lag2",drop=F]
train_Direction=Weekly[train,"Direction",drop=T]
test_Direction=Weekly[!train,"Direction",drop=T]
set.seed(1)
knn_pred=knn(train_X,test_X,train_Direction,k=1)
table(knn_pred,test_Direction)
## test_Direction
## knn_pred Down Up
## Down 21 30
## Up 22 31
correct_pred <- round((21+31)/104, 4)
Right_Up <- round(31/61, 4)
Right_Down <- round(21/43, 4)
error_rate <- 1-correct_pred
knn_details <- cbind(correct_pred, Right_Up, Right_Down, error_rate)
knn_details
## correct_pred Right_Up Right_Down error_rate
## [1,] 0.5 0.5082 0.4884 0.5
mean(knn_pred==test_Direction)
## [1] 0.5
Which of these methods appears to provide the best results on this data?
log_details * 100
## correct_pred Right_Up Right_Down error_rate
## [1,] 62.5 91.8 20.93 37.5
lda_details * 100
## correct_pred Right_Up Right_Down error_rate
## [1,] 62.5 91.8 20.93 37.5
qda_details * 100
## correct_pred Right_Up Right_Down error_rate
## [1,] 58.65 100 0 41.35
knn_details * 100
## correct_pred Right_Up Right_Down error_rate
## [1,] 50 50.82 48.84 50
Comparing the four methods on the held-out data: logistic regression and LDA tie for the best performance, with 62.5% accuracy and identical confusion matrices; QDA is next at 58.65%, but only because it predicts Up for every week; and KNN with K = 1 sits at 50%, no better than chance. Logistic regression and LDA therefore appear to provide the best results on this data.
Experiment with different combinations of predictors, including possible transformations and interactions, for each of the methods. Report the variables, method, and associated confusion matrix that appears to provide the best results on the held out data. Note that you should also experiment with values for K in the KNN classifier.
set.seed(1)
knn_pred=knn(train_X, test_X,train_Direction,k=4)
table(knn_pred,test_Direction)
## test_Direction
## knn_pred Down Up
## Down 20 17
## Up 23 44
mean(knn_pred==test_Direction)
## [1] 0.6153846
knn_pred=knn(train_X, test_X,train_Direction,k=10)
table(knn_pred,test_Direction)
## test_Direction
## knn_pred Down Up
## Down 20 21
## Up 23 40
mean(knn_pred==test_Direction)
## [1] 0.5769231
knn_pred=knn(train_X, test_X,train_Direction,k=100)
table(knn_pred,test_Direction)
## test_Direction
## knn_pred Down Up
## Down 10 12
## Up 33 49
mean(knn_pred==test_Direction)
## [1] 0.5673077
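The three K values above can be collapsed into one sweep; this sketch reuses train_X, test_X, and train_Direction from the K = 1 part:

set.seed(1)
for (k in c(1, 4, 10, 25, 50, 100)) {  # try a range of neighborhood sizes
  pred <- knn(train_X, test_X, train_Direction, k = k)
  cat("K =", k, "accuracy =", round(mean(pred == test_Direction), 4), "\n")
}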
qda_fit = qda(Direction ~ Lag1, data=Weekly, subset=train)
qda_fit
## Call:
## qda(Direction ~ Lag1, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1
## Down 0.289444444
## Up -0.009213235
qda_class=predict(qda_fit,Weekly_test)$class
table(qda_class,Weekly_test$Direction)
##
## qda_class Down Up
## Down 0 0
## Up 43 61
mean(qda_class==Weekly_test$Direction)
## [1] 0.5865385
qda_fit = qda(Direction ~ Lag1 + Lag2, data=Weekly, subset=train)
qda_fit
## Call:
## qda(Direction ~ Lag1 + Lag2, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2
## Down 0.289444444 -0.03568254
## Up -0.009213235 0.26036581
qda_class=predict(qda_fit,Weekly_test)$class
table(qda_class,Weekly_test$Direction)
##
## qda_class Down Up
## Down 7 10
## Up 36 51
mean(qda_class==Weekly_test$Direction)
## [1] 0.5576923
qda_fit = qda(Direction ~ Lag1 + Lag2 + Lag3, data=Weekly, subset=train)
qda_fit
## Call:
## qda(Direction ~ Lag1 + Lag2 + Lag3, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2 Lag3
## Down 0.289444444 -0.03568254 0.17080045
## Up -0.009213235 0.26036581 0.08404044
qda_class=predict(qda_fit,Weekly_test)$class
table(qda_class,Weekly_test$Direction)
##
## qda_class Down Up
## Down 6 10
## Up 37 51
mean(qda_class==Weekly_test$Direction)
## [1] 0.5480769
qda_fit = qda(Direction ~ Lag1 + Lag2 + Lag3 + Lag4, data=Weekly, subset=train)
qda_fit
## Call:
## qda(Direction ~ Lag1 + Lag2 + Lag3 + Lag4, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2 Lag3 Lag4
## Down 0.289444444 -0.03568254 0.17080045 0.15925624
## Up -0.009213235 0.26036581 0.08404044 0.09220956
qda_class=predict(qda_fit,Weekly_test)$class
table(qda_class,Weekly_test$Direction)
##
## qda_class Down Up
## Down 9 16
## Up 34 45
mean(qda_class==Weekly_test$Direction)
## [1] 0.5192308
lda_fit = lda(Direction ~ Lag1 + Lag2, data=Weekly, subset=train)
lda_fit
## Call:
## lda(Direction ~ Lag1 + Lag2, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2
## Down 0.289444444 -0.03568254
## Up -0.009213235 0.26036581
##
## Coefficients of linear discriminants:
## LD1
## Lag1 -0.3013148
## Lag2 0.2982579
lda_class = predict(lda_fit, Weekly_test)$class
table(lda_class, Weekly_test$Direction)
##
## lda_class Down Up
## Down 7 8
## Up 36 53
mean(lda_class==Weekly_test$Direction)
## [1] 0.5769231
lda_fit = lda(Direction ~ Lag1 + Lag2 + Volume, data=Weekly, subset=train)
lda_fit
## Call:
## lda(Direction ~ Lag1 + Lag2 + Volume, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2 Volume
## Down 0.289444444 -0.03568254 1.266966
## Up -0.009213235 0.26036581 1.156529
##
## Coefficients of linear discriminants:
## LD1
## Lag1 -0.2979204
## Lag2 0.2366224
## Volume -0.3545069
lda_class = predict(lda_fit, Weekly_test)$class
table(lda_class, Weekly_test$Direction)
##
## lda_class Down Up
## Down 27 33
## Up 16 28
mean(lda_class==Weekly_test$Direction)
## [1] 0.5288462
lda_fit = lda(Direction ~ Lag1 + Lag2 + Volume + Lag3, data=Weekly, subset=train)
lda_fit
## Call:
## lda(Direction ~ Lag1 + Lag2 + Volume + Lag3, data = Weekly, subset = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4477157 0.5522843
##
## Group means:
## Lag1 Lag2 Volume Lag3
## Down 0.289444444 -0.03568254 1.266966 0.17080045
## Up -0.009213235 0.26036581 1.156529 0.08404044
##
## Coefficients of linear discriminants:
## LD1
## Lag1 -0.29112134
## Lag2 0.22615149
## Volume -0.36829715
## Lag3 -0.07080671
lda_class = predict(lda_fit, Weekly_test)$class
table(lda_class, Weekly_test$Direction)
##
## lda_class Down Up
## Down 30 37
## Up 13 24
mean(lda_class==Weekly_test$Direction)
## [1] 0.5192308
train=Weekly$Year <= 2008
Weekly_test=Weekly[!train,]
logit_fit = glm(Direction ~ Lag1+Lag2, family=binomial, data=Weekly, subset=train)
glm_probs = predict(logit_fit, Weekly_test, type="response")
glm_pred = rep("Down", nrow(Weekly_test))
glm_pred[glm_probs > 0.50]="Up"
table(glm_pred, Weekly_test$Direction)
##
## glm_pred Down Up
## Down 7 8
## Up 36 53
mean(glm_pred==Weekly_test$Direction)
## [1] 0.5769231
logit_fit = glm(Direction ~ Lag1+Lag2+Volume, family=binomial, data=Weekly, subset=train)
glm_probs = predict(logit_fit, Weekly_test, type="response")
glm_pred = rep("Down", nrow(Weekly_test))
glm_pred[glm_probs > 0.50]="Up"
table(glm_pred, Weekly_test$Direction)
##
## glm_pred Down Up
## Down 27 33
## Up 16 28
mean(glm_pred==Weekly_test$Direction)
## [1] 0.5288462
logit_fit = glm(Direction ~ Lag1+Lag2+Volume+Lag3, family=binomial, data=Weekly, subset=train)
glm_probs = predict(logit_fit, Weekly_test, type="response")
glm_pred = rep("Down", nrow(Weekly_test))
glm_pred[glm_probs > 0.50]="Up"
table(glm_pred, Weekly_test$Direction)
##
## glm_pred Down Up
## Down 30 37
## Up 13 24
mean(glm_pred==Weekly_test$Direction)
## [1] 0.5192308
Across all of these experiments, nothing improves on the Lag2-only logistic regression / LDA fit from part (d): its 62.5% held-out accuracy (9 + 56 correct out of 104) remains the best result. Adding Volume or additional lags degrades every method, and the best KNN setting, K = 4, reaches only 61.5%.
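The exercise also suggests trying interactions, which none of the fits above include. A minimal sketch along the same lines (its held-out accuracy is not shown here, so no claim is made about how it performs):

logit_fit <- glm(Direction ~ Lag1 * Lag2, family = binomial, data = Weekly, subset = train)  # main effects plus Lag1:Lag2
glm_probs <- predict(logit_fit, Weekly_test, type = "response")
mean(ifelse(glm_probs > 0.5, "Up", "Down") == Weekly_test$Direction)  # fraction correct on 2009-2010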