This analysis aims to forecast the daily direction of the FTSE100 index (Up/Down) using five years of historical data and a set of technical indicators.
Load required packages
list.packages <- c("quantmod", "corrplot", "corrgram", "TTR", "caret", "caTools", "ROCR", "ROSE", "pROC", "MASS", "glmnet")
invisible(lapply(list.packages, require, character.only = TRUE))
Download FTSE100 data using the quantmod package and clean missing values
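The download step itself is not reproduced above; a minimal sketch of what it likely looks like, assuming the Yahoo Finance ticker "^FTSE" and a five-year window (the exact call that produced the output below is not shown):
# Sketch (assumed): pull roughly five years of daily FTSE100 data from Yahoo Finance
getSymbols("^FTSE", src = "yahoo", from = Sys.Date() - 5 * 365, to = Sys.Date())
FTSE <- na.omit(FTSE)  # drop any rows with missing quotes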
## [1] "FTSE"
## [1] 6
Extract the open, close, high, low and volume price series
FTSE <- data.frame(
date = index(FTSE),
open = as.numeric(Op(FTSE)),
close = as.numeric(Cl(FTSE)),
high = as.numeric(Hi(FTSE)),
low = as.numeric(Lo(FTSE)),
volume = as.numeric(Vo(FTSE)))
head(FTSE)
## date open close high low volume
## 1 2020-04-24 5826.6 5752.2 5827.2 5728.8 808821600
## 2 2020-04-27 5752.2 5846.8 5855.6 5752.2 792246400
## 3 2020-04-28 5846.8 5958.5 5977.8 5835.9 1196850700
## 4 2020-04-29 5958.5 6115.3 6129.6 5950.8 1288097400
## 5 2020-04-30 6115.3 5901.2 6151.6 5897.9 1933348900
## 6 2020-05-01 5901.2 5763.1 5901.2 5746.1 781328900
Calculate log returns
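The return calculation itself is not reproduced; a minimal sketch consistent with the output below, assuming log differences of the closing price:
# Daily log return of the close; the first observation has no previous day, hence NA
FTSE$return <- c(NA, diff(log(FTSE$close)))
head(FTSE)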
## date open close high low volume return
## 1 2020-04-24 5826.6 5752.2 5827.2 5728.8 808821600 NA
## 2 2020-04-27 5752.2 5846.8 5855.6 5752.2 792246400 0.01631205
## 3 2020-04-28 5846.8 5958.5 5977.8 5835.9 1196850700 0.01892430
## 4 2020-04-29 5958.5 6115.3 6129.6 5950.8 1288097400 0.02597502
## 5 2020-04-30 6115.3 5901.2 6151.6 5897.9 1933348900 -0.03563804
## 6 2020-05-01 5901.2 5763.1 5901.2 5746.1 781328900 -0.02368021
Calculating lagged returns and other indicators
Calculate lagged returns (lags 1 to 5) and other technical indicators as predictors: the 10-day rolling standard deviation and rolling mean of the closing price, the 10-day and 20-day simple moving averages, and the 14-day and 21-day relative strength index (RSI).
Lagged returns
FTSE$lag1 <- Lag(FTSE$return, k = 1)
FTSE$lag2 <- Lag(FTSE$return, k = 2)
FTSE$lag3 <- Lag(FTSE$return, k = 3)
FTSE$lag4 <- Lag(FTSE$return, k = 4)
FTSE$lag5 <- Lag(FTSE$return, k = 5)
Rolling standard deviation and mean of closing prices
FTSE$roll_sd <- rollapply(FTSE$close, width = 10, FUN = sd, fill = NA, align = "right")
FTSE$roll_mean <- rollapply(FTSE$close, width = 10, FUN = mean, fill = NA, align = "right")
10-day and 20-day simple moving averages
14-day and 21-day relative strength index
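The moving-average and RSI chunks are not shown; a sketch using the TTR package, assuming all indicators are computed on the closing price:
# 10-day and 20-day simple moving averages of the close (TTR::SMA)
FTSE$ma10 <- SMA(FTSE$close, n = 10)
FTSE$ma20 <- SMA(FTSE$close, n = 20)
# 14-day and 21-day relative strength index of the close (TTR::RSI)
FTSE$rsi14 <- RSI(FTSE$close, n = 14)
FTSE$rsi21 <- RSI(FTSE$close, n = 21)
head(FTSE)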
## date open close high low volume return Lag.1
## 1 2020-04-24 5826.6 5752.2 5827.2 5728.8 808821600 NA NA
## 2 2020-04-27 5752.2 5846.8 5855.6 5752.2 792246400 0.01631205 NA
## 3 2020-04-28 5846.8 5958.5 5977.8 5835.9 1196850700 0.01892430 0.01631205
## 4 2020-04-29 5958.5 6115.3 6129.6 5950.8 1288097400 0.02597502 0.01892430
## 5 2020-04-30 6115.3 5901.2 6151.6 5897.9 1933348900 -0.03563804 0.02597502
## 6 2020-05-01 5901.2 5763.1 5901.2 5746.1 781328900 -0.02368021 -0.03563804
## Lag.2 Lag.3 Lag.4 Lag.5 roll_sd roll_mean ma10 ma20 rsi14
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 0.01631205 NA NA NA NA NA NA NA NA
## 5 0.01892430 0.01631205 NA NA NA NA NA NA NA
## 6 0.02597502 0.01892430 0.01631205 NA NA NA NA NA NA
## rsi21
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
Remove NA values
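A sketch of the clean-up step, assuming the leading NA rows introduced by the lags and rolling indicators are simply dropped:
# Drop rows with NAs from the lags, rolling statistics, moving averages and RSI
FTSE <- na.omit(FTSE)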
Creating the binary direction target variable
Create a direction variable based on the returns of the FTSE100. The direction variable indicates whether the index went up or down on a given day. This is the dependent variable we want to predict using the lagged returns and other indicators. “Up” is assigned to positive returns and “Down” to negative returns.
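A minimal sketch of the target construction, assuming days with a non-positive return are labelled “Down”:
# Binary target: "Up" for a positive daily return, "Down" otherwise
FTSE$direction <- ifelse(FTSE$return > 0, "Up", "Down")
head(FTSE)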
## date open close high low volume return Lag.1
## 22 2020-05-27 6067.8 6144.3 6168.0 6067.8 1559401400 0.012528722 0.012353923
## 23 2020-05-28 6144.3 6218.8 6234.0 6144.3 1210102000 0.012052140 0.012528722
## 24 2020-05-29 6218.8 6076.6 6218.8 6060.2 2246313300 -0.023131586 0.012052140
## 25 2020-06-01 6076.6 6166.4 6179.3 6076.6 742112200 0.014669838 -0.023131586
## 26 2020-06-02 6166.4 6220.1 6238.3 6165.1 1192259900 0.008670816 0.014669838
## 27 2020-06-03 6220.1 6382.4 6394.7 6220.1 1232836600 0.025758187 0.008670816
## Lag.2 Lag.3 Lag.4 Lag.5 roll_sd roll_mean
## 22 -0.003664045 -0.008591055 0.010771145 -0.007700755 126.49935 5978.41
## 23 0.012353923 -0.003664045 -0.008591055 0.010771145 143.90602 6009.88
## 24 0.012528722 0.012353923 -0.003664045 -0.008591055 109.32880 6043.39
## 25 0.012052140 0.012528722 0.012353923 -0.003664045 74.48401 6080.05
## 26 -0.023131586 0.012052140 0.012528722 0.012353923 85.38445 6097.20
## 27 0.014669838 -0.023131586 0.012052140 0.012528722 117.12928 6135.22
## ma10 ma20 rsi14 rsi21 direction
## 22 5978.41 5942.485 61.98998 60.52730 Up
## 23 6009.88 5955.500 64.36032 62.11849 Up
## 24 6043.39 5953.565 57.04761 57.47507 Down
## 25 6080.05 5966.825 60.12855 59.48332 Up
## 26 6097.20 5989.675 61.88904 60.65014 Up
## 27 6135.22 6021.105 66.67786 63.94523 Up
Create a subset of the data containing the required indicators
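The subsetting code is not shown; a sketch, assuming the raw OHLC price and return columns are dropped and only the predictors and target are kept:
# Keep date, the predictors and the target; drop open/close/high/low and return
FTSE_sub <- FTSE[, c("date", "volume", "lag1", "lag2", "lag3", "lag4", "lag5",
                     "roll_sd", "roll_mean", "ma10", "ma20", "rsi14", "rsi21", "direction")]
head(FTSE_sub)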
## date volume Lag.1 Lag.2 Lag.3 Lag.4
## 22 2020-05-27 1559401400 0.012353923 -0.003664045 -0.008591055 0.010771145
## 23 2020-05-28 1210102000 0.012528722 0.012353923 -0.003664045 -0.008591055
## 24 2020-05-29 2246313300 0.012052140 0.012528722 0.012353923 -0.003664045
## 25 2020-06-01 742112200 -0.023131586 0.012052140 0.012528722 0.012353923
## 26 2020-06-02 1192259900 0.014669838 -0.023131586 0.012052140 0.012528722
## 27 2020-06-03 1232836600 0.008670816 0.014669838 -0.023131586 0.012052140
## Lag.5 roll_sd roll_mean ma10 ma20 rsi14 rsi21
## 22 -0.007700755 126.49935 5978.41 5978.41 5942.485 61.98998 60.52730
## 23 0.010771145 143.90602 6009.88 6009.88 5955.500 64.36032 62.11849
## 24 -0.008591055 109.32880 6043.39 6043.39 5953.565 57.04761 57.47507
## 25 -0.003664045 74.48401 6080.05 6080.05 5966.825 60.12855 59.48332
## 26 0.012353923 85.38445 6097.20 6097.20 5989.675 61.88904 60.65014
## 27 0.012528722 117.12928 6135.22 6135.22 6021.105 66.67786 63.94523
## direction
## 22 Up
## 23 Up
## 24 Down
## 25 Up
## 26 Up
## 27 Up
Check structure of the data
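Presumably a simple str() call produced the output below:
str(FTSE_sub)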
## 'data.frame': 1240 obs. of 14 variables:
## $ date : Date, format: "2020-05-27" "2020-05-28" ...
## $ volume : num 1.56e+09 1.21e+09 2.25e+09 7.42e+08 1.19e+09 ...
## $ lag1 : num [1:1240, 1] 0.0124 0.0125 0.0121 -0.0231 0.0147 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "Lag.1"
## $ lag2 : num [1:1240, 1] -0.00366 0.01235 0.01253 0.01205 -0.02313 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "Lag.2"
## $ lag3 : num [1:1240, 1] -0.00859 -0.00366 0.01235 0.01253 0.01205 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "Lag.3"
## $ lag4 : num [1:1240, 1] 0.01077 -0.00859 -0.00366 0.01235 0.01253 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "Lag.4"
## $ lag5 : num [1:1240, 1] -0.0077 0.01077 -0.00859 -0.00366 0.01235 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "Lag.5"
## $ roll_sd : num 126.5 143.9 109.3 74.5 85.4 ...
## $ roll_mean: num 5978 6010 6043 6080 6097 ...
## $ ma10 : num 5978 6010 6043 6080 6097 ...
## $ ma20 : num 5942 5955 5954 5967 5990 ...
## $ rsi14 : num 62 64.4 57 60.1 61.9 ...
## $ rsi21 : num 60.5 62.1 57.5 59.5 60.7 ...
## $ direction: chr "Up" "Up" "Down" "Up" ...
Convert the character direction variable into a factor
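A sketch, assuming “Down” is set as the reference level so that the fitted probabilities from glm() refer to “Up”:
# glm() expects a factor (or 0/1) response; "Down" as reference level => model predicts P(Up)
FTSE_sub$direction <- factor(FTSE_sub$direction, levels = c("Down", "Up"))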
Checking for Multicollinearity
The relationship between the predictors helps in understanding how they behave with respect to each other and can guide model specification. Correlation analysis was performed to measure the strength of association between the predictors. Highly correlated predictors can lead to multicollinearity, which reduces the reliability of the coefficient estimates. In one of the models, only one variable from each group of highly correlated predictors was included.
cor.data <- cor(FTSE_sub[, 2:12])
corrplot(cor.data, method = "color", type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45, addCoef.col = "black", diag = FALSE)
Split the data into training (before 2025) and testing (2025) sets
The data was split into training and test sets. Data before 2025 was used for training, while 2025 data was used for testing. Dependent and independent variables were then created from the training and test data.
Create separate dependent and independent variables from the split data
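The splitting code is not reproduced; a sketch under the assumption that the cut-off is 1 January 2025 and that the object names match those used in the models below:
# Time-based split: everything before 2025 for training, 2025 onwards for testing
train <- FTSE_sub[FTSE_sub$date < as.Date("2025-01-01"), ]
test  <- FTSE_sub[FTSE_sub$date >= as.Date("2025-01-01"), ]
# Independent variables (drop date and the target) and the dependent variable
train_x <- train[, !(names(train) %in% c("date", "direction"))]
train_y <- train$direction
test_x  <- test[, !(names(test) %in% c("date", "direction"))]
test_y  <- test$direction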
Three models were explored. The first model included predictors selected based on the correlation analysis. The second model incorporated all predictors, while the third used stepwise regression to identify statistically relevant predictors. The statistical significance of each predictor was assessed from its p-value at the 5% significance level: predictors with p-values below 0.05 were considered statistically significant, while those above 0.05 were not.
set.seed(100)
#LRM with predictors selected from correlation plot
lrm.corr <- glm(train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume + roll_sd + ma10 + rsi14, data = train_x, maxit = 1000, family = "binomial")
#summary of the estimation
summary(lrm.corr)
##
## Call:
## glm(formula = train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume +
## roll_sd + ma10 + rsi14, family = "binomial", data = train_x,
## maxit = 1000)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.598e+00 9.587e-01 -5.839 5.24e-09 ***
## lag1 -6.760e+01 8.730e+00 -7.743 9.68e-15 ***
## lag2 -5.397e+01 8.433e+00 -6.400 1.56e-10 ***
## lag3 -5.774e+01 8.330e+00 -6.932 4.16e-12 ***
## lag4 -2.223e+01 7.875e+00 -2.823 0.00475 **
## lag5 -3.101e+01 7.814e+00 -3.969 7.23e-05 ***
## volume -4.742e-10 2.322e-10 -2.042 0.04111 *
## roll_sd 1.031e-02 2.024e-03 5.094 3.51e-07 ***
## ma10 -8.395e-05 1.076e-04 -0.780 0.43533
## rsi14 1.158e-01 9.495e-03 12.196 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1604.1 on 1160 degrees of freedom
## Residual deviance: 1390.0 on 1151 degrees of freedom
## AIC: 1410
##
## Number of Fisher Scoring iterations: 4
#calculate fitted probabilities
prob_lrm.corr <- predict(lrm.corr,newdata=test_x,type='response')
pred_lrm.corr <- ifelse(prob_lrm.corr > 0.5,"Up","Down")
#Calculate the accuracy of the model
error_lrm.corr <- mean(pred_lrm.corr != test_y)
accuracy_lrm.corr <- (1-error_lrm.corr)
accuracy_lrm.corr
## [1] 0.6582278
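The confusion matrix below appears to come from tabulating the actual direction against the thresholded probabilities; a sketch of the assumed call:
# Rows: actual direction; columns: predicted probability of "Up" above 0.5 (TRUE) or not (FALSE)
table(test_y, prob_lrm.corr > 0.5)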
##
## test_y FALSE TRUE
## Down 17 17
## Up 10 35
From the first model, all the independent variables are significant, with the exception of the 10-day moving average. The accuracy of the model is 65.82%, which is considered acceptable for predicting stock price movements. The confusion matrix breaks this down: the 35 true positives are instances where the model correctly predicted that the market went up, and the 17 true negatives are instances where it correctly predicted that the market went down. The 17 false positives and 10 false negatives are instances the model incorrectly predicted as “Up” when they were actually “Down”, and as “Down” when they were actually “Up”, respectively. In summary, the model predicted 52 (17 + 35) instances correctly and 27 (17 + 10) instances incorrectly, giving an accuracy of 65.82%.
#LRM with full predictors
lrm.full <- glm(train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume + roll_sd + roll_mean + ma10 + ma20 + rsi14 + rsi21, data = train_x, maxit = 1000, family = "binomial")
#summary of the estimation
summary(lrm.full)
##
## Call:
## glm(formula = train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume +
## roll_sd + roll_mean + ma10 + ma20 + rsi14 + rsi21, family = "binomial",
## data = train_x, maxit = 1000)
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.079e+00 1.379e+00 -5.860 4.62e-09 ***
## lag1 -1.783e+02 1.387e+01 -12.851 < 2e-16 ***
## lag2 -1.450e+02 1.244e+01 -11.657 < 2e-16 ***
## lag3 -1.287e+02 1.179e+01 -10.908 < 2e-16 ***
## lag4 -6.444e+01 1.012e+01 -6.369 1.90e-10 ***
## lag5 -6.077e+01 9.466e+00 -6.419 1.37e-10 ***
## volume -2.907e-11 2.756e-10 -0.106 0.91598
## roll_sd 7.848e-03 2.491e-03 3.150 0.00163 **
## roll_mean -2.695e-02 2.156e-03 -12.500 < 2e-16 ***
## ma10 NA NA NA NA
## ma20 2.676e-02 2.144e-03 12.484 < 2e-16 ***
## rsi14 8.013e-01 5.771e-02 13.886 < 2e-16 ***
## rsi21 -6.214e-01 6.016e-02 -10.330 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1604.1 on 1160 degrees of freedom
## Residual deviance: 1051.7 on 1149 degrees of freedom
## AIC: 1075.7
##
## Number of Fisher Scoring iterations: 5
#calculate fitted probabilities
prob_lrm.full <- predict(lrm.full,newdata=test_x,type='response')
pred_lrm.full <- ifelse(prob_lrm.full > 0.5,"Up","Down")
#Calculate the accuracy of the model
error_lrm.full <- mean(pred_lrm.full != test_y)
accuracy_lrm.full <- (1-error_lrm.full)
accuracy_lrm.full
## [1] 0.7848101
##
## test_y FALSE TRUE
## Down 25 9
## Up 8 37
From the second model, all the predictors have p-values below 0.05 and are hence significant, except for volume (p ≈ 0.92) and ma10, whose coefficient is NA. The NA indicates that ma10 is perfectly collinear with another predictor, the 10-day rolling mean, as the correlation plot also suggests (both are 10-day averages of the closing price). The accuracy of the model is 78.48%, which is considered acceptable for predicting stock price movements. The confusion matrix breaks this down: the 37 true positives are instances where the model correctly predicted that the market went up, and the 25 true negatives are instances where it correctly predicted that the market went down. The 9 false positives and 8 false negatives are instances the model incorrectly predicted as “Up” when they were actually “Down”, and as “Down” when they were actually “Up”, respectively. In summary, the model predicted 62 (37 + 25) instances correctly and 17 (9 + 8) instances incorrectly, giving an accuracy of 78.48%.
Stepwise Regression
#Stepwise regression to Identify important variables
model_step <- step(lrm.full, direction = "both")
## Start: AIC=1075.74
## train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume + roll_sd +
## roll_mean + ma10 + ma20 + rsi14 + rsi21
##
##
## Step: AIC=1075.74
## train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + volume + roll_sd +
## roll_mean + ma20 + rsi14 + rsi21
##
## Df Deviance AIC
## - volume 1 1051.8 1073.8
## <none> 1051.7 1075.7
## - roll_sd 1 1061.7 1083.7
## - lag5 1 1095.2 1117.2
## - lag4 1 1095.8 1117.8
## - rsi21 1 1185.0 1207.0
## - lag3 1 1205.0 1227.0
## - lag2 1 1234.0 1256.0
## - ma20 1 1273.3 1295.3
## - roll_mean 1 1273.7 1295.7
## - lag1 1 1297.9 1319.9
## - rsi14 1 1350.7 1372.7
##
## Step: AIC=1073.75
## train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + roll_sd + roll_mean +
## ma20 + rsi14 + rsi21
##
## Df Deviance AIC
## <none> 1051.8 1073.8
## + volume 1 1051.7 1075.7
## - roll_sd 1 1062.1 1082.1
## - lag5 1 1095.2 1115.2
## - lag4 1 1095.8 1115.8
## - rsi21 1 1186.8 1206.8
## - lag3 1 1205.2 1225.2
## - lag2 1 1234.0 1254.0
## - ma20 1 1274.0 1294.0
## - roll_mean 1 1274.6 1294.6
## - lag1 1 1297.9 1317.9
## - rsi14 1 1354.8 1374.8
##
## Call:
## glm(formula = train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + roll_sd +
## roll_mean + ma20 + rsi14 + rsi21, family = "binomial", data = train_x,
## maxit = 1000)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.080e+00 1.379e+00 -5.861 4.59e-09 ***
## lag1 -1.783e+02 1.387e+01 -12.856 < 2e-16 ***
## lag2 -1.450e+02 1.244e+01 -11.657 < 2e-16 ***
## lag3 -1.287e+02 1.179e+01 -10.913 < 2e-16 ***
## lag4 -6.446e+01 1.012e+01 -6.369 1.91e-10 ***
## lag5 -6.076e+01 9.468e+00 -6.418 1.38e-10 ***
## roll_sd 7.788e-03 2.425e-03 3.211 0.00132 **
## roll_mean -2.696e-02 2.154e-03 -12.516 < 2e-16 ***
## ma20 2.677e-02 2.142e-03 12.496 < 2e-16 ***
## rsi14 8.018e-01 5.752e-02 13.940 < 2e-16 ***
## rsi21 -6.220e-01 5.989e-02 -10.385 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1604.1 on 1160 degrees of freedom
## Residual deviance: 1051.7 on 1150 degrees of freedom
## AIC: 1073.7
##
## Number of Fisher Scoring iterations: 5
Model 3
#LRM with predictors from stepwise regression
lrm.step <- glm(train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + roll_sd + roll_mean + ma20 + rsi14 + rsi21, data = train_x, maxit = 1000, family = "binomial")
#summary of the estimation
summary(lrm.step)
##
## Call:
## glm(formula = train_y ~ lag1 + lag2 + lag3 + lag4 + lag5 + roll_sd +
## roll_mean + ma20 + rsi14 + rsi21, family = "binomial", data = train_x,
## maxit = 1000)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.080e+00 1.379e+00 -5.861 4.59e-09 ***
## lag1 -1.783e+02 1.387e+01 -12.856 < 2e-16 ***
## lag2 -1.450e+02 1.244e+01 -11.657 < 2e-16 ***
## lag3 -1.287e+02 1.179e+01 -10.913 < 2e-16 ***
## lag4 -6.446e+01 1.012e+01 -6.369 1.91e-10 ***
## lag5 -6.076e+01 9.468e+00 -6.418 1.38e-10 ***
## roll_sd 7.788e-03 2.425e-03 3.211 0.00132 **
## roll_mean -2.696e-02 2.154e-03 -12.516 < 2e-16 ***
## ma20 2.677e-02 2.142e-03 12.496 < 2e-16 ***
## rsi14 8.018e-01 5.752e-02 13.940 < 2e-16 ***
## rsi21 -6.220e-01 5.989e-02 -10.385 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1604.1 on 1160 degrees of freedom
## Residual deviance: 1051.7 on 1150 degrees of freedom
## AIC: 1073.7
##
## Number of Fisher Scoring iterations: 5
#calculate fitted probabilities
prob_lrm.step <- predict(lrm.step,newdata=test_x,type='response')
pred_lrm.step <- ifelse(prob_lrm.step > 0.5,"Up","Down")
#Calculate the accuracy of the model
error_lrm.step <- mean(pred_lrm.step != test_y)
accuracy_lrm.step <- (1-error_lrm.step)
accuracy_lrm.step
## [1] 0.7848101
##
## test_y FALSE TRUE
## Down 25 9
## Up 8 37
In the third model, stepwise regression was used to determine the relevant predictors: lag1 to lag5, the rolling standard deviation, the rolling mean, the 20-day moving average, and the 14-day and 21-day RSI. The accuracy of the model is 78.48%, which is considered acceptable for predicting stock price movements. The confusion matrix is identical to that of the second model: 37 true positives (correctly predicted “Up”), 25 true negatives (correctly predicted “Down”), 9 false positives (“Up” predicted when the market actually went down) and 8 false negatives (“Down” predicted when the market actually went up). In summary, the model predicted 62 (37 + 25) instances correctly and 17 (9 + 8) instances incorrectly, giving an accuracy of 78.48%.
#Accuracy comparison
accuracy_comparison <- data.frame(
Model = c("lrm.corr", "lrm.full", "lrm.step"),
Accuracy = c(accuracy_lrm.corr, accuracy_lrm.full, accuracy_lrm.step)
)
print(accuracy_comparison)
## Model Accuracy
## 1 lrm.corr 0.6582278
## 2 lrm.full 0.7848101
## 3 lrm.step 0.7848101
#ROC Curve for all the models
par(cex.main = 2, cex.axis = 1.5, cex.lab = 1.5)
roc.curve(test_y, prob_lrm.corr,
main = "ROC curve of the 3 models", col = 1, lwd = 2, lty = 1)## Area under the curve (AUC): 0.686
## Area under the curve (AUC): 0.867
## Area under the curve (AUC): 0.866
legend("bottomright", c("LR with Corr. Predictors", "LR with full Predictors", "LR with stepwise regression Predictors"),
col=1:3, lty=1:3, lwd=2)
From the accuracy table of the three models, the second and third models had the highest accuracy at 78.48%, indicating that both correctly predicted 78.48% of the test data. From the ROC plot, the area under the curve (AUC) of the first model is 0.69, while the second and third models both have an AUC of about 0.87. This indicates that the second and third models are better able to distinguish between upward and downward movements, as shown by their higher AUC.