Assignment 2
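library(dplyr)
library(ISLR)  # assumed source of Weekly (ISLR2 also ships it)
# Keep the response, the five lag variables, and Volume
Mdl_dat <- Weekly |> dplyr::select(Direction, starts_with("Lag"), Volume)
Mdl_dat$Direction <- factor(Mdl_dat$Direction)
# Logistic regression of Direction on all remaining predictors
log_mdl <- glm(Direction ~ ., family = binomial(), data = Mdl_dat)
log_mdl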
##
## Call: glm(formula = Direction ~ ., family = binomial(), data = Mdl_dat)
##
## Coefficients:
## (Intercept) Lag1 Lag2 Lag3 Lag4 Lag5
## 0.26686 -0.04127 0.05844 -0.01606 -0.02779 -0.01447
## Volume
## -0.02274
##
## Degrees of Freedom: 1088 Total (i.e. Null); 1082 Residual
## Null Deviance: 1496
## Residual Deviance: 1486 AIC: 1500
## Actual
## Predicted Down Up
## Down 54 48
## Up 430 557
## Actual
## Predicted Down Up
## Down 4.96 4.41
## Up 39.49 51.15
## [1] 56.11
Interpretation : We make the right guess about 56% of the time (accuracy = 56.11%).
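The chunk that produced the two tables and the accuracy figure is not shown in the output. A minimal sketch of the arithmetic, rebuilt from the printed counts only; the names `conf_mat`, `conf_pct` and `accuracy` are hypothetical, and the commented-out `table()` call is an assumption about how the counts were obtained from `log_mdl`:

```r
# Assumed origin of the counts (not run here, needs the fitted model):
#   pred <- ifelse(predict(log_mdl, type = "response") > 0.5, "Up", "Down")
#   table(Predicted = pred, Actual = Mdl_dat$Direction)

# Confusion matrix rebuilt from the counts printed above
# (rows = predicted class, columns = actual class; filled column by column).
conf_mat <- matrix(
  c(54, 430, 48, 557),
  nrow = 2,
  dimnames = list(Predicted = c("Down", "Up"), Actual = c("Down", "Up"))
)

# Each cell as a percentage of all weeks
conf_pct <- round(100 * conf_mat / sum(conf_mat), 2)

# Accuracy: share of weeks on the diagonal (correct predictions), in percent
accuracy <- round(100 * sum(diag(conf_mat)) / sum(conf_mat), 2)

print(conf_pct)
print(accuracy)
```

With these counts, 611 of 1089 weeks are classified correctly, which is where the 56.11 comes from.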
To judge whether the model beats chance, we compare its precision and recall with the relative frequency of Up weeks in our sample. Precision measures how often the model is right when it predicts Up, and recall measures how many of the actual Up weeks it catches.
precision > tbl$freq[2]
## [1] TRUE
recall > tbl$freq[2]
## [1] TRUE
Therefore : both precision and recall exceed the share of Up weeks in the sample, so the model identifies positives better than chance and captures actual positives better than random guessing.
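The comparison can be checked by hand from the first confusion matrix. This is a sketch under the assumption that `tbl$freq[2]` holds the relative frequency of Up weeks; `base_rate` and `conf_mat` are hypothetical names, and the counts are the ones printed earlier:

```r
# First confusion matrix (rows = predicted, columns = actual)
conf_mat <- matrix(
  c(54, 430, 48, 557),
  nrow = 2,
  dimnames = list(Predicted = c("Down", "Up"), Actual = c("Down", "Up"))
)

tp <- conf_mat["Up", "Up"]                      # weeks correctly predicted Up

precision <- tp / sum(conf_mat["Up", ])         # 557 / 987: right when we say Up
recall    <- tp / sum(conf_mat[, "Up"])         # 557 / 605: Up weeks we catch
base_rate <- sum(conf_mat[, "Up"]) / sum(conf_mat)  # 605 / 1089: share of Up weeks

print(round(c(precision = precision, recall = recall, base_rate = base_rate), 4))
print(precision > base_rate)
print(recall > base_rate)
```

A rule that always predicts Up would have precision equal to the base rate and recall equal to 1, so the precision margin here (roughly 0.564 against 0.556) is small.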
The thresholds at which our predictions are better than random guessing :
## Warning: Removed 22 rows containing missing values or values outside the scale range
## (`geom_line()`).
The model captures most "Up" weeks at low thresholds (high recall), but recall drops sharply as the threshold increases, showing reduced sensitivity. Precision stays stable, then rises, indicating more confident predictions, before dropping at the highest thresholds, where the few remaining "Up" predictions include overconfident errors.
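The threshold sweep behind a plot like this can be sketched in a few lines of base R. The helper `pr_at()` and the toy vectors `prob` and `actual` are hypothetical and are not taken from Weekly; they only illustrate the mechanics:

```r
# Precision and recall of "Up" predictions at one cut-off.
# prob:   predicted probability of "Up"
# actual: true labels, "Up" or "Down"
pr_at <- function(prob, actual, cutoff) {
  pred_up <- prob > cutoff
  tp <- sum(pred_up & actual == "Up")
  c(cutoff    = cutoff,
    precision = tp / sum(pred_up),        # NaN when nothing is predicted "Up"
    recall    = tp / sum(actual == "Up"))
}

# Hypothetical toy data for illustration only
prob   <- c(0.45, 0.52, 0.55, 0.58, 0.61, 0.66)
actual <- c("Down", "Up", "Down", "Up", "Up", "Down")

# Evaluate a grid of cut-offs and collect one row per cut-off
cutoffs  <- seq(0.4, 0.7, by = 0.1)
pr_curve <- as.data.frame(t(sapply(cutoffs, function(k) pr_at(prob, actual, k))))
print(pr_curve)
```

At a cut-off above every predicted probability, no week is predicted "Up" and precision is 0/0, i.e. `NaN`. Plotting functions drop such rows, which may be what the `geom_line()` warning about 22 removed rows refers to.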
##
## Call: glm(formula = Direction ~ Lag2, family = binomial(), data = Up_to_2008)
##
## Coefficients:
## (Intercept) Lag2
## 0.2033 0.0581
##
## Degrees of Freedom: 984 Total (i.e. Null); 983 Residual
## Null Deviance: 1355
## Residual Deviance: 1351 AIC: 1355
Confusion Matrix (Actual: 0 = Down, 1 = Up) :
## Actual
## Predicted 0 1
## Down 23 20
## Up 418 524
## Actual
## Predicted 0 1
## Down 2.34 2.03
## Up 42.44 53.20
Separate Metrics (accuracy in percent; precision, recall and F1 as proportions)
## [1] "Accuracy"
## [1] 55.54
## [1] "precision"
## [1] 0.5562526
## [1] "recall"
## [1] 0.9632446
## [1] "F1"
## [1] 0.7052429
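For reference, the four metrics can be recomputed directly from the counts in the Lag2 confusion matrix. This is a sketch with hypothetical variable names, not the code that produced the output above:

```r
# Lag2 model confusion matrix (rows = predicted, columns = actual)
conf_mat <- matrix(
  c(23, 418, 20, 524),
  nrow = 2,
  dimnames = list(Predicted = c("Down", "Up"), Actual = c("Down", "Up"))
)

tp <- conf_mat["Up", "Up"]                       # correctly predicted Up weeks

accuracy  <- 100 * sum(diag(conf_mat)) / sum(conf_mat)  # percent correct
precision <- tp / sum(conf_mat["Up", ])                 # 524 / 942
recall    <- tp / sum(conf_mat[, "Up"])                 # 524 / 544
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, F1 = f1), 4))
```

The results agree with the printed metrics to about four significant digits. The small differences in the later digits would be consistent with the printed values having been computed from the rounded percentage table rather than from the raw counts.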
Did it work?
## [1] "Does the model uniformly beat random guessing in terms of these performance metrics?"
precision > tbl$freq[2]
## [1] FALSE
recall > tbl$freq[2]
## [1] TRUE