DI Lab Homework 7

Part (a): Numerical and Graphical Summaries

library(ISLR2)
summary(Weekly)

##       Year           Lag1               Lag2               Lag3         
##  Min.   :1990   Min.   :-18.1950   Min.   :-18.1950   Min.   :-18.1950  
##  1st Qu.:1995   1st Qu.: -1.1540   1st Qu.: -1.1540   1st Qu.: -1.1580  
##  Median :2000   Median :  0.2410   Median :  0.2410   Median :  0.2410  
##  Mean   :2000   Mean   :  0.1506   Mean   :  0.1511   Mean   :  0.1472  
##  3rd Qu.:2005   3rd Qu.:  1.4050   3rd Qu.:  1.4090   3rd Qu.:  1.4090  
##  Max.   :2010   Max.   : 12.0260   Max.   : 12.0260   Max.   : 12.0260  
##       Lag4               Lag5              Volume            Today         
##  Min.   :-18.1950   Min.   :-18.1950   Min.   :0.08747   Min.   :-18.1950  
##  1st Qu.: -1.1580   1st Qu.: -1.1660   1st Qu.:0.33202   1st Qu.: -1.1540  
##  Median :  0.2380   Median :  0.2340   Median :1.00268   Median :  0.2410  
##  Mean   :  0.1458   Mean   :  0.1399   Mean   :1.57462   Mean   :  0.1499  
##  3rd Qu.:  1.4090   3rd Qu.:  1.4050   3rd Qu.:2.05373   3rd Qu.:  1.4050  
##  Max.   : 12.0260   Max.   : 12.0260   Max.   :9.32821   Max.   : 12.0260  
##  Direction 
##  Down:484  
##  Up  :605  
##            
##            
##            
##

cor(Weekly[, -9])

##               Year         Lag1        Lag2        Lag3         Lag4
## Year    1.00000000 -0.032289274 -0.03339001 -0.03000649 -0.031127923
## Lag1   -0.03228927  1.000000000 -0.07485305  0.05863568 -0.071273876
## Lag2   -0.03339001 -0.074853051  1.00000000 -0.07572091  0.058381535
## Lag3   -0.03000649  0.058635682 -0.07572091  1.00000000 -0.075395865
## Lag4   -0.03112792 -0.071273876  0.05838153 -0.07539587  1.000000000
## Lag5   -0.03051910 -0.008183096 -0.07249948  0.06065717 -0.075675027
## Volume  0.84194162 -0.064951313 -0.08551314 -0.06928771 -0.061074617
## Today  -0.03245989 -0.075031842  0.05916672 -0.07124364 -0.007825873
##                Lag5      Volume        Today
## Year   -0.030519101  0.84194162 -0.032459894
## Lag1   -0.008183096 -0.06495131 -0.075031842
## Lag2   -0.072499482 -0.08551314  0.059166717
## Lag3    0.060657175 -0.06928771 -0.071243639
## Lag4   -0.075675027 -0.06107462 -0.007825873
## Lag5    1.000000000 -0.05851741  0.011012698
## Volume -0.058517414  1.00000000 -0.033077783
## Today   0.011012698 -0.03307778  1.000000000

pairs(Weekly)

The pattern: Increasing Volume over time (Year). The correlation between Year and Volume is 0.8419 which is highly positive.

Part (b): Logistic Regression with Direction

glm.fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, 
               data = Weekly, 
               family = binomial)
summary(glm.fit)

## 
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
##     Volume, family = binomial, data = Weekly)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.26686    0.08593   3.106   0.0019 **
## Lag1        -0.04127    0.02641  -1.563   0.1181   
## Lag2         0.05844    0.02686   2.175   0.0296 * 
## Lag3        -0.01606    0.02666  -0.602   0.5469   
## Lag4        -0.02779    0.02646  -1.050   0.2937   
## Lag5        -0.01447    0.02638  -0.549   0.5833   
## Volume      -0.02274    0.03690  -0.616   0.5377   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1496.2  on 1088  degrees of freedom
## Residual deviance: 1486.4  on 1082  degrees of freedom
## AIC: 1500.4
## 
## Number of Fisher Scoring iterations: 4

It appears to have Lag2 to be statistically significant. The p-val of Lag2 is 0.0296 and is lower than 0.05

Part (c): Confusion Matrix and Overall Fraction

glm.probs <- predict(glm.fit, type = "response")
glm.pred <- rep("Down", nrow(Weekly))
glm.pred[glm.probs > 0.5] <- "Up"
table(glm.pred, Weekly$Direction)

##         
## glm.pred Down  Up
##     Down   54  48
##     Up    430 557

mean(glm.pred == Weekly$Direction)

## [1] 0.5610652

When the market actually goes DOWN: Out of the 484 weeks where the market went down, the model incorrectly predicted it would go “Up” 430 times. This is a Type I Error if we consider “Up” our positive class. It has a massive error rate of roughly 89% (430/484) on down weeks.

When the market actually goes UP: Out of the 605 weeks where the market went up, the model only incorrectly predicted “Down” 48 times. This is a Type II Error. It only has an error rate of about 8% (48/605) on up weeks.

Part (d): Training data period from 1990 to 2008

train <- (Weekly$Year < 2009)
Weekly.test <- Weekly[!train, ]
Direction.test <- Weekly$Direction[!train]
glm.fit2 <- glm(Direction ~ Lag2, 
                data = Weekly, 
                family = binomial, 
                subset = train)
glm.probs2 <- predict(glm.fit2, Weekly.test, type = "response")
glm.pred2 <- rep("Down", length(glm.probs2))
glm.pred2[glm.probs2 > 0.5] <- "Up"
table(glm.pred2, Direction.test)

##          Direction.test
## glm.pred2 Down Up
##      Down    9  5
##      Up     34 56

mean(glm.pred2 == Direction.test)

## [1] 0.625

Part (e): LDA

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:ISLR2':
## 
##     Boston

lda.fit <- lda(Direction ~ Lag2, 
               data = Weekly, 
               subset = train)
lda.pred <- predict(lda.fit, Weekly.test)
lda.class <- lda.pred$class
table(lda.class, Direction.test)

##          Direction.test
## lda.class Down Up
##      Down    9  5
##      Up     34 56

mean(lda.class == Direction.test)

## [1] 0.625

Part (f): QDA

qda.fit <- qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.class <- predict(qda.fit, Weekly.test)$class
table(qda.class, Direction.test)

##          Direction.test
## qda.class Down Up
##      Down    0  0
##      Up     43 61

mean(qda.class == Direction.test)

## [1] 0.5865385

Part (g): KNN with K = 1

library(class)
train.X <- as.matrix(Weekly$Lag2[train])
test.X <- as.matrix(Weekly$Lag2[!train])
train.Direction <- Weekly$Direction[train]
set.seed(1)
knn.pred <- knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Direction.test)

##         Direction.test
## knn.pred Down Up
##     Down   21 30
##     Up     22 31

mean(knn.pred == Direction.test)

## [1] 0.5

Part (h): Naive Bayes

library(e1071)

## Warning: package 'e1071' was built under R version 4.4.3

nb.fit <- naiveBayes(Direction ~ Lag2, data = Weekly, subset = train)
nb.class <- predict(nb.fit, Weekly.test)
table(nb.class, Direction.test)

##         Direction.test
## nb.class Down Up
##     Down    0  0
##     Up     43 61

mean(nb.class == Direction.test)

## [1] 0.5865385

Part (i): The best method
The best method might be Logistic Regression and LDA with 62.5% of accuracy.

DI Lab Homework 7

Duc Vinh Hoang

2026-04-15