New Stocks Lags Counts Groups

This script shows how to grab thousands of available stocks, then select one stock from the set available, and count the number of days in a set time period, if available, that the stock increases and separately decreases based on the stock value a set number of days (lag value) earlier. It then uses that lag value to predict whether the stock will increase by 2% the next day and separately in 30 days. The accuracy could be improved by tuning, but with these default parameters the accuracy was able to predict between 80-95% accuracy the next day, and 70-91% in 30 days whether the stock would increase by two percent based on the time interval specified. To specify the time period, search for this mark ‘%^&%’ in this file and set the dates, lag, lead (amount of days in the future to predict based on the lag set price earlier), and the stock from the list. Kaggle has a large csv file originally made for this script, but the smaller files are at github. Change the read (search for mark ’%^&*’ in of the csv after downloading from Kaggle allStocksGathered1.csv to the file name. Or get the smaller files (yahooStockBasket.csv and StockSwingTradeBotCom.csv) at the github address and run the first part of this script (about 20 minutes more or less).

library(lubridate)
library(dplyr)
library(quantmod)
library(PerformanceAnalytics)
library(tidyr)
library(DT)
library(stringr)
library(e1071)
library(caret)
library(randomForest)
library(MASS)
library(gbm)

setDefaults(getSymbols, src='yahoo')

Enter a date and the number of days to go back for the stock value as the lag value.

date="1999-01-01"
lag = 5
today = today()

lagN = paste('lag',lag,sep='')

newStocks <- read.csv('StockSwingTradeBotCom.csv', header=TRUE, na.strings=c('',' ','NA'),
                      sep=',')

head(newStocks)

##   stockSymbol            stockName dailyClose volatility  avgVolume
## 1           A Agilent Technolog...      65.76     34.988  2,515,221
## 2          AA           Alcoa Inc.       5.83    121.188  7,431,897
## 3         AAC   AAC Holdings, Inc.       0.48     99.106    279,165
## 4        AACG ATA Creativity Gl...       0.74     45.638     25,584
## 5         AAL American Airlines...      10.29     97.428 24,309,238
## 6        AAMC Altisource Asset ...      11.12     91.945      7,226

newStocks <- newStocks[,-c(4:5)]
head(newStocks)

##   stockSymbol            stockName dailyClose
## 1           A Agilent Technolog...      65.76
## 2          AA           Alcoa Inc.       5.83
## 3         AAC   AAC Holdings, Inc.       0.48
## 4        AACG ATA Creativity Gl...       0.74
## 5         AAL American Airlines...      10.29
## 6        AAMC Altisource Asset ...      11.12

tail(newStocks)

##      stockSymbol            stockName dailyClose
## 5525        ZUMZ          Zumiez Inc.      19.28
## 5526         ZUO          Zuora, Inc.       8.40
## 5527         ZVO           Zovio Inc.       1.46
## 5528        ZYME       Zymeworks Inc.      26.31
## 5529        ZYNE Zynerba Pharmaceu...       3.10
## 5530        ZYXI         Zynex, Inc.       10.81

Lets look at the stock symbols or tickers to pull up information on.

dim(newStocks)

## [1] 5530    3

newStocks$stockSymbol[1:25]

##  [1] A    AA   AAC  AACG AAL  AAMC AAME AAN  AAOI AAON AAP  AAPL AAT  AAU  AAWW
## [16] AAXN AB   ABB  ABBV ABC  ABCB ABDC ABEO ABEV ABG 
## 5530 Levels: A AA AAC AACG AAL AAMC AAME AAN AAOI AAON AAP AAPL AAT AAU ... ZYXI

newStocks$stockSymbol[5505:5530]

##  [1] ZGNX  ZGYHU ZION  ZIONL ZIONN ZIONO ZIONP ZIOP  ZIXI  ZKIN  ZLAB  ZM   
## [13] ZN    ZNGA  ZNH   ZOM   ZS    ZSAN  ZTO   ZTS   ZUMZ  ZUO   ZVO   ZYME 
## [25] ZYNE  ZYXI 
## 5530 Levels: A AA AAC AACG AAL AAMC AAME AAN AAOI AAON AAP AAPL AAT AAU ... ZYXI

These are the stocks that we had built the previous lags, group counts, and other stats for of 65 total.

stockNames <- read.csv('yahooStockBasket.csv', header=TRUE, sep=',', 
                       na.strings=c('',' ','NA'))

Get a list of those names of stocks that have the period in the names, recalling that SCE.PB was changed in this 65 stock list, stockNames, to SCE-PB in a previous script.

a1 <- grep('[.]', stockNames$stock)
a2 <- grep('[.]', newStocks$stockSymbol)

stockNames[a1,2]

## factor(0)
## 65 Levels: AAL AAP ADDYY AMC AMZN ARWR ASCCY C COST CSSEP CVX DLTR EPD ... YELP

There aren’t any periods in the stock names of the 65 stock list. List those stock names that are separated with a period in the 5530 stock list.

newStocks[a2,1]

##  [1] AGM.A  AKO.A  AKO.B  BF.A   BF.B   BH.A   BIO.B  BRK.A  BRK.B  BWL.A 
## [11] CBS.A  CCH.U  CCX.U  CIG.C  CRD.A  CRD.B  CWEN.A EBR.B  GEF.B  GIG.U 
## [21] GRP.U  GTN.A  HEI.A  HVT.A  JIH.U  JIH.W  JW.A   JW.B   LEN.B  LGF.A 
## [31] LGF.B  LHC.U  MKC.V  MOG.A  MOG.B  OAC.U  OIBR.C PBR.A  PIC.U  RDS.A 
## [41] RDS.B  RMG.U  SBE.U  STZ.B  TAP.A  TPGH.U WSO.B 
## 5530 Levels: A AA AAC AACG AAL AAMC AAME AAN AAOI AAON AAP AAPL AAT AAU ... ZYXI

See if those stocks we hand picked and analyzed previously are in this larger set.

stocksBothSets <- merge(stockNames, newStocks, by.x='stock', by.y='stockSymbol')

This combines all the 65 stocks and fills in as NAs for the missing stocks not in the 5530 stock set.

stocksNotBothSets <- merge(stockNames, stocksBothSets, by.x='stock',by.y='stock',
                           all.x=TRUE)

The stocks not in the 65 stock set and the 5530 other stocks.

stocksNotBoth <- stocksNotBothSets[is.na(stocksNotBothSets$stockInfo.y),]
stocksNotBoth$stock

## [1] ADDYY  ASCCY  GNBT   NSANY  S      SCE-PB
## 65 Levels: AAL AAP ADDYY AMC AMZN ARWR ASCCY C COST CSSEP CVX DLTR EPD ... YELP

Lets add these stocks to the table, but replace the ‘-’ with ‘.’ in SCE-PB to the newStocks table.

large <- newStocks[,1:2]
small <- stockNames[,c(2,1)]

colnames(large) <- c('stock','company')
colnames(small) <- c('stock','company')

all <- rbind(large,small)
all$stock <- gsub('-','.',all$stock)

allStockTicks <- as.character(all$stock)
length(allStockTicks)

## [1] 5595

There are duplicates in the stocks that need to be removed. This set is the original 5530 plus the six stocks that were not in the set originally but were in the 65 stock sets analyzed already.

allStockTicks1 <- allStockTicks[!duplicated(allStockTicks)]
length(allStockTicks1)

## [1] 5536

These are the stock tickers that aren’t in the list of 5595 stock tickers. They produced errors when ran in a previous script for getSymbols(src=‘yahoo’).

notFound <- as.data.frame(c("AAC","ACIW","AFGB","AGBAR","AGM.A","AKO.A","AKO.B",                 "ALGRR","ANDAR","AREX","AVDR","BF.A","BF.B","BH.A","BIO.B",
              "BRK.A","BRK.B","BROGR","BWL.A","BXRXV","CBS.A","CCH.U","CCX.U",
              "CIG.C","CJ","CMFNL","CRD.A","CRD.B","CTRP","CUR","CVRS",
              "CWEN.A","DEST","DF","DFBHU","DFPHU","DHCNL","DISHR","DISHV",
              "DNJR","EBR.B","ECOLW","FELP","FRAC","GEF.B","GHDX","GIG.U",
              "GLACR","GLACU","GOODO","GOODP","GRP.U","GTN.A","HAIR",
              "HCP","HEI.A","HES","HESM","HPJ","HVT.A","ISRL","JIH.U","JIH.W",
              "JMPB","JW.A","JW.B","KOOL","LEN.B","LGF.A","LHC.U","LTXB",
              "MKC.V","MOG.A","MOG.B","NEWM","NRCG","NVTR","NYNY",
              "OAC.U","OIBR.C","OTIV","PBR.A","PIC.U","RDS.A","RDS.B",
              "RMG.U","RTEC","RVEN","SBE.U","SDR","SDT","SHOS","SSFN",
              "STNL","STNLU","STZ.B","SWJ","SYMC","TAP.A","TBLTU",
              "TMCX","TMCXU","TPGH","TPGH.U","TTS","UBNK","WSO.B","SCE.PB"))

colnames(notFound) <- 'stock'
notFound$stock <- as.character(paste(notFound$stock))

This combines the 5536 stocks and creates a group to subset those stocks that aren’t found in the getSymbols(src=‘yahoo’) search results.

allStockTicks2 <- as.data.frame(allStockTicks1)
colnames(allStockTicks2) <- 'stock'
allStockTicks2$stock <- as.factor(allStockTicks2$stock)

notFound$avail <- 'notAvail'
allStockTicks2$avail <- 'Avail'

availStocks <- merge(allStockTicks2, notFound, by.x='stock',by.y='stock',
                     all.x=TRUE)

availStocks1 <- availStocks[with(availStocks, order(avail.y,avail.x)),]
  
available <- subset(availStocks, is.na(availStocks$avail.y) &
                      availStocks$avail.x=='Avail')

notavail <- subset(availStocks, availStocks$avail.y=='notAvail')

We are interested in the available stocks, of which there are 5,428 stock tickers that are available on yahoo.

available1 <- as.character(available$stock)

tickers1 <- available1[c(1:700)]
tickers2 <- available1[c(701:1300)]
tickers3 <- available1[c(1301:2000)]
tickers4 <- available1[c(2001:2700)]
tickers5 <- available1[c(2701:3300)]
tickers6<- available1[c(3301:4000)]
tickers7 <- available1[c(4001:4700)]
tickers8 <- available1[c(4701:5428)]

All_portfolioPrices1 <- NULL

for (ticker in tickers1){
  All_portfolioPrices1 <- cbind(All_portfolioPrices1, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices2 <- NULL

for (ticker in tickers2){
  All_portfolioPrices2 <- cbind(All_portfolioPrices2, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices3 <- NULL

for (ticker in tickers3){
  All_portfolioPrices3 <- cbind(All_portfolioPrices3, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices4 <- NULL

for (ticker in tickers4){
  All_portfolioPrices4 <- cbind(All_portfolioPrices4, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

names <- as.data.frame(as.character(index(All_portfolioPrices1)))
colnames(names) <- 'Date'

names2 <- as.data.frame(as.character(index(All_portfolioPrices2)))
colnames(names2) <- 'Date'

p1 <- as.data.frame(All_portfolioPrices1)
p2 <- as.data.frame(All_portfolioPrices2)
p3 <- as.data.frame(All_portfolioPrices3)
p4 <- as.data.frame(All_portfolioPrices4)

val1 <- cbind(names,p1)
val2 <- cbind(names2,p2)
val3 <- cbind(names,p3)
val4 <- cbind(names,p4)

colnames(val1) <- gsub('.Close','',colnames(val1))
colnames(val2) <- gsub('.Close','',colnames(val2))
colnames(val3) <- gsub('.Close','',colnames(val3))
colnames(val4) <- gsub('.Close','',colnames(val4))
     
close1 <- gather(val1,'stockName','stockValue',2:701)
close2 <- gather(val2,'stockName','stockValue',2:601)
close3 <- gather(val3,'stockName','stockValue',2:701)
close4 <- gather(val4,'stockName','stockValue',2:701)

All_portfolioPrices5 <- NULL


for (ticker in tickers5){
  All_portfolioPrices5 <- cbind(All_portfolioPrices5, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices6 <- NULL

for (ticker in tickers6){
  All_portfolioPrices6 <- cbind(All_portfolioPrices6, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices7 <- NULL

for (ticker in tickers7){
  All_portfolioPrices7 <- cbind(All_portfolioPrices7, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

All_portfolioPrices8 <- NULL

for (ticker in tickers8){
  All_portfolioPrices8 <- cbind(All_portfolioPrices8, getSymbols(ticker,
  from = date, 
  periodicity='daily', auto.assign=FALSE)[,4])
}

p5 <- as.data.frame(All_portfolioPrices5)
p6 <- as.data.frame(All_portfolioPrices6)
p7 <- as.data.frame(All_portfolioPrices7)
p8 <- as.data.frame(All_portfolioPrices8)

names1 <- as.data.frame(as.character(index(All_portfolioPrices5)))
colnames(names1) <- 'Date'

names2 <- as.data.frame(as.character(index(All_portfolioPrices6)))
colnames(names2) <- 'Date'

val5 <- cbind(names1,p5)
val6 <- cbind(names2,p6)
val7 <- cbind(names2,p7)
val8 <- cbind(names2,p8)

colnames(val5) <- gsub('.Close','',colnames(val5))
colnames(val6) <- gsub('.Close','',colnames(val6))
colnames(val7) <- gsub('.Close','',colnames(val7))
colnames(val8) <- gsub('.Close','',colnames(val8))

close5 <- gather(val5,'stockName','stockValue',2:601)
close6 <- gather(val6,'stockName','stockValue',2:701)
close7 <- gather(val7,'stockName','stockValue',2:701)
close8 <- gather(val8,'stockName','stockValue',2:729)

allStocksGathered <- rbind(close1,close2,close3,close4,
                                close5,close6,close7,close8)

Remove the NAs.

allStocksGathered1 <-
  allStocksGathered[complete.cases(allStocksGathered),]

Save this file to csv.

write.csv(allStocksGathered1,
          paste(paste(paste(paste('allStocksGathered1', date,sep='_'),
                      today,sep='_'),lag,sep='_'), 'csv', sep='.'),
          row.names=FALSE)

%%%%%%%%%%%%%%%

Retrieve the large (approximately 442 mb file size) from Kaggle called allStocksGathered1.csv for the large data processed in the last part of this script and to get the individual stock stats and counts of increasing and decreasing days.This script is in github as newStocksLagsCountsGroups.Rmd that made the csv large data file just mentioned.

%^&*

allStocksGathered1 <- read.csv('allStocksGathered1_1999-01-01_2020-03-25_5.csv',sep=',', header=TRUE,
                               na.strings=c('',' ','NA'))

Lets create the lag and count/group by counts fields to analyze by each stock, using subsets of each stock.

DF <- allStocksGathered1
DF$Date <- as.Date(DF$Date)

%^&%

What stock ticker are you interested in? And what lag are you interested in? You can keep the lag from the top of this script or these defaults or change them here. You need to enter the stock name in the chunk for Rmarkdown below. Also, pick a start date and end date of the time you want to select for running the counts and groups of counts.

stock_1 <- toupper("yelp")
lag <- 3
startDate <- '2012-12-31'
endDate <- '2020-03-25'
lead <- 1

Lets subset our large table to get the dates requested.

sDF <- subset(DF, DF$Date>=startDate & DF$Date<=endDate)

lagN <- paste('lag',lag,sep='')
cat('The number of days to retrieve the stock value compared to each day value listed as an instance is ',lag,'and the stock to look up this information for is ',stock_1)

## The number of days to retrieve the stock value compared to each day value listed as an instance is  3 and the stock to look up this information for is  YELP

stknme <- as.character(newStocks[newStocks$stockSymbol==stock_1,2])
cat('\nThis stock is ',stknme)

## 
## This stock is  Yelp Inc.

Lstock_1 <- subset(sDF, sDF$stockName==stock_1)

cat('The number of days for trading that this time period will provide counts of increasing and decreasing days is ', length(Lstock_1$Date), ' trading days.')

## The number of days for trading that this time period will provide counts of increasing and decreasing days is  1820  trading days.

Generic automation of above stock to look up and the lag to use for generating the counts, group of counts, and lag values to get those counts.

Lstock_1$startDayValue <-Lstock_1$stockValue[1]
Lstock_1$startDayDate <- Lstock_1$Date[1]
Lstock_1$finalDayValue <-Lstock_1$stockValue[length(Lstock_1$stockValue)]
Lstock_1$finalDayDate <- Lstock_1$Date[length(Lstock_1$Date)]

stock_1LN <- lag(Lstock_1$stockValue, lag)
Lstock_1$lagN <- stock_1LN
Lstock_1$today2_lagN <- Lstock_1$stockValue/Lstock_1$lagN
Lstock_1 <- Lstock_1[complete.cases(Lstock_1),]

Lets look at the data we will be adding counts of increasing and decreasing days, for the time interval dates and stock values at the beginning and end of the time interval available or requested.

cat('\nThe lag for this table was for ',lag,'days.','\nThe stock to look up was ', stock_1,'.\nThe start date of this stock and starting value was ',as.character(paste(Lstock_1$startDayDate[1])),' and ','$',Lstock_1$startDayValue[1],'\nThe end date and end date price of this stock analysis is ',as.character(paste(Lstock_1$finalDayDate[1])),' and ','$',Lstock_1$finalDayValue[1])

## 
## The lag for this table was for  3 days. 
## The stock to look up was  YELP .
## The start date of this stock and starting value was  2012-12-31  and  $ 18.85 
## The end date and end date price of this stock analysis is  2020-03-24  and  $ 20.64

roi <- Lstock_1$finalDayValue[1]/Lstock_1$startDayValue[1]
cat('\nThe return on investment as a percentage of the amount invested for this time period is ',roi)

## 
## The return on investment as a percentage of the amount invested for this time period is  1.09496

cat('\n\nIn dollars initially invested your return  is $',Lstock_1$finalDayValue[1]-Lstock_1$startDayValue[1], 'for the dates:',
   as.character(paste(Lstock_1$startDayDate[1])),'through ',
   as.character(paste(Lstock_1$finalDayDate[1])))

## 
## 
## In dollars initially invested your return  is $ 1.79 for the dates: 2012-12-31 through  2020-03-24

Now, lets look at the counts and group counts of increasing and decreasing days for this stock and the time period available.

#assign a 1 to increasing values
Lstock_1$todayGrtrThan_lagN <- ifelse(Lstock_1$today2_lagN>1, 1,0)

Lstock_1$cumulativeSumTodayGrtrThan_lagN <- cumsum(Lstock_1$todayGrtrThan_lagN)

# get the count of how many instances repeat, 
# those counts repeating are counts that measure the days cumulatively decreasing
# those cumulative counts that don't repeat, are counting increasing days.
# These are stock values for today's value to 7 days prior value.
countstock_10 <- Lstock_1 %>% group_by(cumulativeSumTodayGrtrThan_lagN) %>% count(n=n())
countstock_10 <- as.data.frame(countstock_10)
countstock_10 <- countstock_10[,-3]
colnames(countstock_10)[2] <- 'nRepeatsTodayGrtrThan_lagN'

# Count the REPEATS of each number (minus the initial start) 
countstock_10$decrDaysThisCycle <- countstock_10$n-1

# Count the number of times the cycle count repeats in this time span exactly that many days
countstock_10b <- countstock_10 %>% group_by(decrDaysThisCycle) %>% count(n=n())
countstock_10b <- as.data.frame(countstock_10b)
countstock_10b <- countstock_10b[,-3]
colnames(countstock_10b)[2] <- 'nTimesDecrDayCountsOccurs'

#combine these two count matrices of decreasing days
countsstock_tableDecr <- merge(countstock_10, countstock_10b, by.x='decrDaysThisCycle',
                             by.y='decrDaysThisCycle')

#combine the counts to the stock subset
stock_3 <- merge(Lstock_1, countsstock_tableDecr, by.x='cumulativeSumTodayGrtrThan_lagN',
               by.y='cumulativeSumTodayGrtrThan_lagN')

#assign a 1 to decreasing values
stock_3$todayLessThan_lagN <- ifelse(stock_3$today2_lagN>1, 0,1)

stock_3$cumulativeSumTodayLessThan_lagN <- cumsum(stock_3$todayLessThan_lagN)

# get the count of how many instances repeat, 
# those counts repeating are counts that measure the days cumulatively increasing
# those cumulative counts that don't repeat, are counting decreasing days.
# These are stock values for today's value to 7 days prior value.
countstock_11 <- stock_3 %>% group_by(cumulativeSumTodayLessThan_lagN) %>% count(n=n())
countstock_11 <- as.data.frame(countstock_11)
countstock_11 <- countstock_11[,-3]
colnames(countstock_11)[2] <- 'nRepeatsTodayLessThan_lagN'

# Count the REPEATS of each number (minus the initial start) 
countstock_11$incrDaysThisCycle <- countstock_11$n-1

# Count the number of times the cycle count repeats in this time span exactly that many days
countstock_11b <- countstock_11 %>% group_by(incrDaysThisCycle) %>% count(n=n())
countstock_11b <- as.data.frame(countstock_11b)
countstock_11b <- countstock_11b[,-3]
colnames(countstock_11b)[2] <- 'nTimesIncrDayCountsOccurs'

#combine these two count matrices of decreasing days
countsstock_tableIncr <- merge(countstock_11, countstock_11b,
                               by.x='incrDaysThisCycle',
                             by.y='incrDaysThisCycle')

#combine the counts to the stock subset
stock_4 <- merge(stock_3, countsstock_tableIncr,
                 by.x='cumulativeSumTodayLessThan_lagN',
               by.y='cumulativeSumTodayLessThan_lagN')

colnames(stock_4)

##  [1] "cumulativeSumTodayLessThan_lagN" "cumulativeSumTodayGrtrThan_lagN"
##  [3] "Date"                            "stockName"                      
##  [5] "stockValue"                      "startDayValue"                  
##  [7] "startDayDate"                    "finalDayValue"                  
##  [9] "finalDayDate"                    "lagN"                           
## [11] "today2_lagN"                     "todayGrtrThan_lagN"             
## [13] "decrDaysThisCycle"               "nRepeatsTodayGrtrThan_lagN"     
## [15] "nTimesDecrDayCountsOccurs"       "todayLessThan_lagN"             
## [17] "incrDaysThisCycle"               "nRepeatsTodayLessThan_lagN"     
## [19] "nTimesIncrDayCountsOccurs"

stock_5 <- stock_4[,c(3:11,
                      12,2,14,13,15,
                      16,1,18,17,19)]
colnames(stock_5) <- gsub('lagN',lagN,colnames(stock_5))
colnames(stock_5)

##  [1] "Date"                            "stockName"                      
##  [3] "stockValue"                      "startDayValue"                  
##  [5] "startDayDate"                    "finalDayValue"                  
##  [7] "finalDayDate"                    "lag3"                           
##  [9] "today2_lag3"                     "todayGrtrThan_lag3"             
## [11] "cumulativeSumTodayGrtrThan_lag3" "nRepeatsTodayGrtrThan_lag3"     
## [13] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
## [15] "todayLessThan_lag3"              "cumulativeSumTodayLessThan_lag3"
## [17] "nRepeatsTodayLessThan_lag3"      "incrDaysThisCycle"              
## [19] "nTimesIncrDayCountsOccurs"

pretty_headers <- str_to_title(colnames(stock_5))

stock5 <- datatable(data=stock_5,  rownames=FALSE,
                      colnames=pretty_headers,
                      filter=list(position='top'),
                      options=list(
                        dom='Bfrtip',
                        buttons=c('colvis','csv','excel'),
                        language=list(sSearch='Filter:')),
                      extensions=c('Buttons','Responsive')
                      )
stock5

Using this information on one stock of the thousands available in our large csv file and table, lets return the count information and the number of times this stock has seen those exact days of counts.

cat('\nThe number of times this stock has decreased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates\' stock value is ', stock_5$decrDaysThisCycle[length(stock_5$decrDaysThisCycle)],'\n')

## 
## The number of times this stock has decreased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates' stock value is  0

cat('\nThe number of times this stock has increased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates\' stock value is ', stock_5$incrDaysThisCycle[length(stock_5$incrDaysThisCycle)],'\n')

## 
## The number of times this stock has increased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates' stock value is  3

cat('\nThe number of times this stock has decreased exactly this number of days compared to its price ', lag, ' days ago, is ', stock_5$nTimesDecrDayCountsOccurs[length(stock_5$nTimesDecrDayCountsOccurs)],'\n')

## 
## The number of times this stock has decreased exactly this number of days compared to its price  3  days ago, is  702

cat('\nThe number of times this stock has increased exactly this number of days compared to its price ', lag, ' days ago, is ', stock_5$nTimesIncrDayCountsOccurs[length(stock_5$nTimesIncrDayCountsOccurs)],'\n')

## 
## The number of times this stock has increased exactly this number of days compared to its price  3  days ago, is  52

The unique number of days this stock selected (for the time period retrieved) decreased is shown in the table below.

stock_5[unique(stock_5$decrDaysThisCycle),c(1:3,8,9,13,14)]

##          Date stockName stockValue  lag3 today2_lag3 decrDaysThisCycle
## 1  2013-01-04      YELP      21.52 18.85   1.1416446                 0
## 5  2013-01-10      YELP      22.19 21.08   1.0526565                 0
## 3  2013-01-08      YELP      20.83 20.15   1.0337469                 1
## 2  2013-01-07      YELP      21.08 19.70   1.0700508                 0
## 8  2013-01-15      YELP      20.61 22.19   0.9287968                 5
## 7  2013-01-14      YELP      21.97 20.76   1.0582852                 5
## 4  2013-01-09      YELP      20.76 21.52   0.9646840                 1
## 10 2013-01-17      YELP      20.27 21.97   0.9226218                 5
## 6  2013-01-11      YELP      22.12 20.83   1.0619299                 0
## 14 2013-01-24      YELP      20.83 20.30   1.0261084                 0
## 9  2013-01-16      YELP      20.36 22.12   0.9204340                 5
## 11 2013-01-18      YELP      20.30 20.61   0.9849588                 5
## 13 2013-01-23      YELP      20.34 20.27   1.0034534                 0
## 15 2013-01-25      YELP      21.17 19.90   1.0638191                 0
##    nTimesDecrDayCountsOccurs
## 1                        702
## 5                        702
## 3                         87
## 2                        702
## 8                         16
## 7                         16
## 4                         87
## 10                        16
## 6                        702
## 14                       702
## 9                         16
## 11                        16
## 13                       702
## 15                       702

The unique number of days this stock selected (for the time period retrieved) increased is shown in the table below.

stock_5[unique(stock_5$incrDaysThisCycle),c(1:3,8,9,18,19)]

##          Date stockName stockValue  lag3 today2_lag3 incrDaysThisCycle
## 2  2013-01-07      YELP      21.08 19.70   1.0700508                 2
## 3  2013-01-08      YELP      20.83 20.15   1.0337469                 2
## 5  2013-01-10      YELP      22.19 21.08   1.0526565                 3
## 1  2013-01-04      YELP      21.52 18.85   1.1416446                 2
## 10 2013-01-17      YELP      20.27 21.97   0.9226218                 0
## 8  2013-01-15      YELP      20.61 22.19   0.9287968                 0
## 4  2013-01-09      YELP      20.76 21.52   0.9646840                 3
## 13 2013-01-23      YELP      20.34 20.27   1.0034534                 5
## 7  2013-01-14      YELP      21.97 20.76   1.0582852                 3
## 6  2013-01-11      YELP      22.12 20.83   1.0619299                 3
## 9  2013-01-16      YELP      20.36 22.12   0.9204340                 0
## 12 2013-01-22      YELP      19.90 20.36   0.9774067                 5
## 11 2013-01-18      YELP      20.30 20.61   0.9849588                 0
## 15 2013-01-25      YELP      21.17 19.90   1.0638191                 5
##    nTimesIncrDayCountsOccurs
## 2                         21
## 3                         21
## 5                         52
## 1                         21
## 10                       609
## 8                        609
## 4                         52
## 13                        11
## 7                         52
## 6                         52
## 9                        609
## 12                        11
## 11                       609
## 15                        11

%%%%%%%%%%%%%%%%%

Lets do some machine learning to predict the next trading day stock value of being increased by 2% as 1, or not being decreased by 2% as a 0 value.This will use the lag set earlier in the above table, and those counts and other fields already used, but add in a lead price field for 1 day. We can then replay this scenario with the lead price in 2 weeks, 1 month, and so on as needed.

colnames(stock_5)

##  [1] "Date"                            "stockName"                      
##  [3] "stockValue"                      "startDayValue"                  
##  [5] "startDayDate"                    "finalDayValue"                  
##  [7] "finalDayDate"                    "lag3"                           
##  [9] "today2_lag3"                     "todayGrtrThan_lag3"             
## [11] "cumulativeSumTodayGrtrThan_lag3" "nRepeatsTodayGrtrThan_lag3"     
## [13] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
## [15] "todayLessThan_lag3"              "cumulativeSumTodayLessThan_lag3"
## [17] "nRepeatsTodayLessThan_lag3"      "incrDaysThisCycle"              
## [19] "nTimesIncrDayCountsOccurs"

Lets remove the date fields and stock name. We will name the rows of this matrix by the date before deleting the Date field.

stock_6 <- stock_5
row.names(stock_6) <- stock_6$Date
stock_6 <- stock_6[,-c(1,2,4:7)]#removes the date, stock name, and start and final dates/values

stock_6$NextDayPrice <- lead(stock_6$stockValue,lead)
stock_6 <- stock_6[complete.cases(stock_6$NextDayPrice),]
stock_6$NextDay2PercentIncrease <- ifelse(stock_6$NextDayPrice/stock_6$stockValue >1.02, 1,0)

stock_6$NextDay2PercentIncrease <- as.factor(stock_6$NextDay2PercentIncrease)

head(stock_6)

##            stockValue  lag3 today2_lag3 todayGrtrThan_lag3
## 2013-01-04      21.52 18.85    1.141645                  1
## 2013-01-07      21.08 19.70    1.070051                  1
## 2013-01-08      20.83 20.15    1.033747                  1
## 2013-01-09      20.76 21.52    0.964684                  0
## 2013-01-10      22.19 21.08    1.052657                  1
## 2013-01-11      22.12 20.83    1.061930                  1
##            cumulativeSumTodayGrtrThan_lag3 nRepeatsTodayGrtrThan_lag3
## 2013-01-04                               1                          1
## 2013-01-07                               2                          1
## 2013-01-08                               3                          2
## 2013-01-09                               3                          2
## 2013-01-10                               4                          1
## 2013-01-11                               5                          1
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag3
## 2013-01-04                 0                       702                  0
## 2013-01-07                 0                       702                  0
## 2013-01-08                 1                        87                  0
## 2013-01-09                 1                        87                  1
## 2013-01-10                 0                       702                  0
## 2013-01-11                 0                       702                  0
##            cumulativeSumTodayLessThan_lag3 nRepeatsTodayLessThan_lag3
## 2013-01-04                               0                          3
## 2013-01-07                               0                          3
## 2013-01-08                               0                          3
## 2013-01-09                               1                          4
## 2013-01-10                               1                          4
## 2013-01-11                               1                          4
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDayPrice
## 2013-01-04                 2                        21        21.08
## 2013-01-07                 2                        21        20.83
## 2013-01-08                 2                        21        20.76
## 2013-01-09                 3                        52        22.19
## 2013-01-10                 3                        52        22.12
## 2013-01-11                 3                        52        21.97
##            NextDay2PercentIncrease
## 2013-01-04                       0
## 2013-01-07                       0
## 2013-01-08                       0
## 2013-01-09                       1
## 2013-01-10                       0
## 2013-01-11                       0

Remove the next day price field just added so that it doesn’t have multicollinearity or correlation with the predictions of whether or not the next day will increase by 2% (1 value) or not (0 value).

colnames(stock_6)

##  [1] "stockValue"                      "lag3"                           
##  [3] "today2_lag3"                     "todayGrtrThan_lag3"             
##  [5] "cumulativeSumTodayGrtrThan_lag3" "nRepeatsTodayGrtrThan_lag3"     
##  [7] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
##  [9] "todayLessThan_lag3"              "cumulativeSumTodayLessThan_lag3"
## [11] "nRepeatsTodayLessThan_lag3"      "incrDaysThisCycle"              
## [13] "nTimesIncrDayCountsOccurs"       "NextDayPrice"                   
## [15] "NextDay2PercentIncrease"

For this set that is column 14 for ‘NextDayPrice’ to remove before running our predictive analytics algorithms.

set.seed(12356789)

NextDayPrice <- as.vector(stock_6$NextDayPrice)
stock_6 <- stock_6[,-14]

inTrain <- createDataPartition(y=stock_6$NextDay2PercentIncrease, p=0.7, list=FALSE)

trainingSet <- stock_6[inTrain,]
testingSet <- stock_6[-inTrain,]

Here are the first few observations in the training set.

head(trainingSet)

##            stockValue  lag3 today2_lag3 todayGrtrThan_lag3
## 2013-01-04      21.52 18.85   1.1416446                  1
## 2013-01-08      20.83 20.15   1.0337469                  1
## 2013-01-09      20.76 21.52   0.9646840                  0
## 2013-01-10      22.19 21.08   1.0526565                  1
## 2013-01-11      22.12 20.83   1.0619299                  1
## 2013-01-15      20.61 22.19   0.9287968                  0
##            cumulativeSumTodayGrtrThan_lag3 nRepeatsTodayGrtrThan_lag3
## 2013-01-04                               1                          1
## 2013-01-08                               3                          2
## 2013-01-09                               3                          2
## 2013-01-10                               4                          1
## 2013-01-11                               5                          1
## 2013-01-15                               6                          6
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag3
## 2013-01-04                 0                       702                  0
## 2013-01-08                 1                        87                  0
## 2013-01-09                 1                        87                  1
## 2013-01-10                 0                       702                  0
## 2013-01-11                 0                       702                  0
## 2013-01-15                 5                        16                  1
##            cumulativeSumTodayLessThan_lag3 nRepeatsTodayLessThan_lag3
## 2013-01-04                               0                          3
## 2013-01-08                               0                          3
## 2013-01-09                               1                          4
## 2013-01-10                               1                          4
## 2013-01-11                               1                          4
## 2013-01-15                               2                          1
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease
## 2013-01-04                 2                        21                       0
## 2013-01-08                 2                        21                       0
## 2013-01-09                 3                        52                       1
## 2013-01-10                 3                        52                       0
## 2013-01-11                 3                        52                       0
## 2013-01-15                 0                       609                       0

Here are the first few observations in the testing set.

head(testingSet)

##            stockValue  lag3 today2_lag3 todayGrtrThan_lag3
## 2013-01-07      21.08 19.70   1.0700508                  1
## 2013-01-14      21.97 20.76   1.0582852                  1
## 2013-01-18      20.30 20.61   0.9849588                  0
## 2013-01-22      19.90 20.36   0.9774067                  0
## 2013-01-24      20.83 20.30   1.0261084                  1
## 2013-01-28      21.03 20.34   1.0339233                  1
##            cumulativeSumTodayGrtrThan_lag3 nRepeatsTodayGrtrThan_lag3
## 2013-01-07                               2                          1
## 2013-01-14                               6                          6
## 2013-01-18                               6                          6
## 2013-01-22                               6                          6
## 2013-01-24                               8                          1
## 2013-01-28                              10                          1
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag3
## 2013-01-07                 0                       702                  0
## 2013-01-14                 5                        16                  0
## 2013-01-18                 5                        16                  1
## 2013-01-22                 5                        16                  1
## 2013-01-24                 0                       702                  0
## 2013-01-28                 0                       702                  0
##            cumulativeSumTodayLessThan_lag3 nRepeatsTodayLessThan_lag3
## 2013-01-07                               0                          3
## 2013-01-14                               1                          4
## 2013-01-18                               5                          1
## 2013-01-22                               6                          6
## 2013-01-24                               6                          6
## 2013-01-28                               6                          6
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease
## 2013-01-07                 2                        21                       0
## 2013-01-14                 3                        52                       0
## 2013-01-18                 0                       609                       0
## 2013-01-22                 5                        11                       1
## 2013-01-24                 5                        11                       0
## 2013-01-28                 5                        11                       0

Some modifications to the data as a factor instead of numeric when copy and pasting the code above. The random forest is taking a tree approach to classification of the next day’s value as 1 for increased by 2% or a 0 for didn’t increase by 2%. Otherwise, the random forest would have done regression as done in previous two runs with the next day price value guess. Below is the caret random forest or you could use the random forest package. Make sure not to keep the round(,0) modification or all values will be NA. If you just run this, then no worries.

rfMod <- train(NextDay2PercentIncrease ~ ., method='rf', data=(trainingSet), 
               trControl=trainControl(method='cv'), number=5)

predRF <- predict(rfMod, testingSet)

predDF <- data.frame(predRF, NextDay2PercentIncrease=testingSet$NextDay2PercentIncrease)
print(rbind(head(predDF,10),tail(predDF,10)))

##     predRF NextDay2PercentIncrease
## 1        0                       0
## 2        0                       0
## 3        0                       0
## 4        1                       1
## 5        0                       0
## 6        0                       0
## 7        0                       1
## 8        1                       0
## 9        0                       0
## 10       0                       0
## 535      0                       0
## 536      0                       1
## 537      0                       0
## 538      0                       0
## 539      0                       0
## 540      0                       0
## 541      0                       1
## 542      0                       0
## 543      1                       0
## 544      1                       1

sum <- sum(predRF==testingSet$NextDay2PercentIncrease) 
length <- length(testingSet$NextDay2PercentIncrease)
accuracy_rfMod <- (sum/length) 
accuracy_rfMod

## [1] 0.8198529

results <- c(round(accuracy_rfMod,2), round(100,2))
results <- as.factor(results)
results <- t(data.frame(results))

colnames(results) <- colnames(predDF)
Results <- rbind(predDF, results) 
print(rbind(head(Results),tail(Results)))

##         predRF NextDay2PercentIncrease
## 1            0                       0
## 2            0                       0
## 3            0                       0
## 4            1                       1
## 5            0                       0
## 6            0                       0
## 540          0                       0
## 541          0                       1
## 542          0                       0
## 543          1                       0
## 544          1                       1
## results   0.82                     100

From the above, 80% is not bad. You could run this program just using the random forest for if the next day value will increase by 2% and be right 4 out of 5 days. Lets see the rest using the caret package for the classification. The metric was changed to ‘Accuracy’ or ‘Kappa’ to use the train() of the caret package to classify instead of regress the features on the target.

knnMod <- train(NextDay2PercentIncrease ~ .,
                method='knn', preProcess=c('center','scale'),
                tuneLength=10, metric='Accuracy',
                trControl=trainControl(method='cv'), data=trainingSet)

rpartMod <- train(NextDay2PercentIncrease ~ ., 
                        method='rpart', tuneLength=9, metric='Accuracy',
                        data=trainingSet)

glmMod <- train(NextDay2PercentIncrease ~ ., metric='Kappa',
                method='glm', data=trainingSet)

predKNN <- predict(knnMod, testingSet)

predRPART <- predict(rpartMod, testingSet)

predGLM <- predict(glmMod, testingSet)

length=length(testingSet$NextDay2PercentIncrease)
length

## [1] 544

sumKNN <- sum(predKNN==testingSet$NextDay2PercentIncrease)
sumRPart <- sum(predRPART==testingSet$NextDay2PercentIncrease)
sumGLM <- sum(predGLM==testingSet$NextDay2PercentIncrease)

cat('The number correct for KNN is ',sumKNN, ', the number correct for Rpart is ',sumRPart,
    ' and the number correct for GLM is ',sumGLM,'.')

## The number correct for KNN is  437 , the number correct for Rpart is  447  and the number correct for GLM is  441 .

accuracy_KNN <- sumKNN/length 
accuracy_RPART <- sumRPart/length 
accuracy_GLM <- sumGLM/length 

cat('The accuracy for KNN, RPart, and GLM respectively is: ',
    accuracy_KNN,accuracy_RPART,accuracy_GLM,'.')

## The accuracy for KNN, RPart, and GLM respectively is:  0.8033088 0.8216912 0.8106618 .

predDF2 <- data.frame(predRF,predKNN,predRPART,predGLM, 
                      NextDay2PercentIncrease=testingSet$NextDay2PercentIncrease)
colnames(predDF2) <- c('RandomForest','KNN','Rpart','GLM','NextDay2PercentIncrease')

results <- c(round(accuracy_rfMod,2),  
             round(accuracy_KNN,2), 
             round(accuracy_RPART,2),
             round(accuracy_GLM,2), 
             round(100,2))

results <- as.factor(results)
results <- t(data.frame(results))
colnames(results) <- c('RandomForest','KNN','Rpart','GLM',
                       'NextDay2PercentIncrease')
Results <- rbind(predDF2, results) 
Results1 <- datatable(data=Results, rownames=FALSE,
                      filter=list(position='top'),
                      options=list(
                        dom='Bfrtip',
                        buttons=c('csv'),
                        language=list(sSearch='Filter:')),
                      extensions=c('Buttons','Responsive'))
Results1

Lets now align these predicted values to the subset for the stock YELP made earlier.

stock_6$Date <- row.names(stock_6)
testingSet$Date <- row.names(testingSet)
predDF2$Date <- row.names(testingSet)

stock_7 <- merge(testingSet,predDF2, by.x='Date', by.y='Date')
tbl <- rbind(head(stock_7),tail(stock_7))
tbl

##           Date stockValue  lag3 today2_lag3 todayGrtrThan_lag3
## 1   2013-01-07      21.08 19.70   1.0700508                  1
## 2   2013-01-14      21.97 20.76   1.0582852                  1
## 3   2013-01-18      20.30 20.61   0.9849588                  0
## 4   2013-01-22      19.90 20.36   0.9774067                  0
## 5   2013-01-24      20.83 20.30   1.0261084                  1
## 6   2013-01-28      21.03 20.34   1.0339233                  1
## 539 2020-02-14      35.23 34.76   1.0135213                  1
## 540 2020-02-18      34.57 35.36   0.9776584                  0
## 541 2020-03-03      30.59 31.40   0.9742038                  0
## 542 2020-03-11      24.00 28.94   0.8293020                  0
## 543 2020-03-16      17.68 24.00   0.7366667                  0
## 544 2020-03-20      17.80 17.53   1.0154022                  1
##     cumulativeSumTodayGrtrThan_lag3 nRepeatsTodayGrtrThan_lag3
## 1                                 2                          1
## 2                                 6                          6
## 3                                 6                          6
## 4                                 6                          6
## 5                                 8                          1
## 6                                10                          1
## 539                             951                         12
## 540                             951                         12
## 541                             951                         12
## 542                             952                         12
## 543                             952                         12
## 544                             953                          1
##     decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag3
## 1                   0                       702                  0
## 2                   5                        16                  0
## 3                   5                        16                  1
## 4                   5                        16                  1
## 5                   0                       702                  0
## 6                   0                       702                  0
## 539                11                         5                  0
## 540                11                         5                  1
## 541                11                         5                  1
## 542                11                         5                  1
## 543                11                         5                  1
## 544                 0                       702                  0
##     cumulativeSumTodayLessThan_lag3 nRepeatsTodayLessThan_lag3
## 1                                 0                          3
## 2                                 1                          4
## 3                                 5                          1
## 4                                 6                          6
## 5                                 6                          6
## 6                                 6                          6
## 539                             840                          5
## 540                             841                          1
## 541                             851                          2
## 542                             856                          1
## 543                             859                          1
## 544                             862                          4
##     incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease.x
## 1                   2                        21                         0
## 2                   3                        52                         0
## 3                   0                       609                         0
## 4                   5                        11                         1
## 5                   5                        11                         0
## 6                   5                        11                         0
## 539                 4                        42                         0
## 540                 0                       609                         0
## 541                 1                        73                         1
## 542                 0                       609                         0
## 543                 0                       609                         0
## 544                 3                        52                         1
##     RandomForest KNN Rpart GLM NextDay2PercentIncrease.y
## 1              0   0     0   0                         0
## 2              0   0     0   0                         0
## 3              0   0     0   0                         0
## 4              1   1     1   1                         1
## 5              0   0     0   0                         0
## 6              0   0     0   0                         0
## 539            0   0     0   0                         0
## 540            0   0     0   0                         0
## 541            0   0     0   0                         1
## 542            0   0     0   0                         0
## 543            1   0     0   0                         0
## 544            1   0     0   0                         1

stock_7$RandomForest <- as.numeric(paste(stock_7$RandomForest))
stock_7$Rpart <- as.numeric(paste(stock_7$Rpart))
stock_7$GLM <- as.numeric(paste(stock_7$GLM))
stock_7$NextDay2PercentIncrease.y <-
  as.numeric(paste(stock_7$NextDay2PercentIncrease.y))
stock_7$KNN <- as.numeric(paste(stock_7$KNN))

stock_7$mostVoted <- ifelse(rowSums(stock_7[,16:19])>=2,1,0)
                          
sMost <- sum(stock_7$mostVoted==stock_7$NextDay2PercentIncrease.y)
lMost <- length(stock_7$NextDay2PercentIncrease.y)
accMost <- sMost/lMost
results <- as.data.frame(c(round(accuracy_rfMod,2),  
             round(accuracy_KNN,2), 
             round(accuracy_RPART,2),
             round(accuracy_GLM,2), 
             round(accMost,2),
             round(100,2)))
colnames(results) <- 'Results'
row.names(results) <- c('RandomForest','KNN','Rpart','GLM','mostVoted',
                       'NextDay2PercentIncrease')

results

##                         Results
## RandomForest               0.82
## KNN                        0.80
## Rpart                      0.82
## GLM                        0.81
## mostVoted                  0.82
## NextDay2PercentIncrease  100.00

The above accuracy measures for random forest, recursive partitioning trees, k-nearest neighbors, and generalized linear models and the most voted algorithm for this classification were shown. The following is the table of the first 30 predicted results on the testing set of samples. Also, add back in the NextDayPrice extracted earlier so the results weren’t dependent on the next day price when predicting whether the next day would be increased by 2%. Note that because not all stocks have the same time series start dates, the size of each stock time series being selected will vary with some stocks.

stock_6$NextDayPrice <- NextDayPrice #add back in to original complete dataset
stock_6b <- stock_6[,15:16]
stock_8 <- merge(stock_7,stock_6b,by.x='Date',by.y='Date')

print(rbind(head(stock_8,30),tail(stock_8,30)))

##           Date stockValue  lag3 today2_lag3 todayGrtrThan_lag3
## 1   2013-01-07      21.08 19.70   1.0700508                  1
## 2   2013-01-14      21.97 20.76   1.0582852                  1
## 3   2013-01-18      20.30 20.61   0.9849588                  0
## 4   2013-01-22      19.90 20.36   0.9774067                  0
## 5   2013-01-24      20.83 20.30   1.0261084                  1
## 6   2013-01-28      21.03 20.34   1.0339233                  1
## 7   2013-02-04      20.34 20.46   0.9941349                  0
## 8   2013-02-11      21.63 22.38   0.9664879                  0
## 9   2013-02-20      21.89 21.44   1.0209888                  1
## 10  2013-02-25      21.62 21.89   0.9876656                  0
## 11  2013-03-05      23.02 22.19   1.0374042                  1
## 12  2013-03-11      25.13 23.02   1.0916594                  1
## 13  2013-03-13      25.02 25.01   1.0003998                  1
## 14  2013-03-14      25.05 25.13   0.9968166                  0
## 15  2013-03-21      24.00 24.50   0.9795918                  0
## 16  2013-03-25      23.49 24.12   0.9738806                  0
## 17  2013-03-27      23.72 23.79   0.9970576                  0
## 18  2013-03-28      23.71 23.49   1.0093657                  1
## 19  2013-04-02      22.77 23.72   0.9599494                  0
## 20  2013-04-05      25.70 22.77   1.1286781                  1
## 21  2013-04-16      25.71 26.35   0.9757116                  0
## 22  2013-04-24      24.97 25.50   0.9792157                  0
## 23  2013-04-26      25.45 25.29   1.0063266                  1
## 24  2013-05-03      31.12 26.03   1.1955436                  1
## 25  2013-05-07      30.67 32.22   0.9518932                  0
## 26  2013-05-08      29.91 31.12   0.9611183                  0
## 27  2013-05-09      30.67 30.69   0.9993483                  0
## 28  2013-05-10      30.93 30.67   1.0084773                  1
## 29  2013-05-14      30.97 30.67   1.0097815                  1
## 30  2013-05-20      30.97 30.01   1.0319893                  1
## 515 2019-10-07      33.28 33.33   0.9984998                  0
## 516 2019-10-11      32.72 32.75   0.9990840                  0
## 517 2019-10-29      34.32 33.51   1.0241719                  1
## 518 2019-10-30      34.85 34.69   1.0046123                  1
## 519 2019-11-04      34.24 34.85   0.9824964                  0
## 520 2019-11-06      32.71 35.06   0.9329720                  0
## 521 2019-11-12      35.77 30.12   1.1875830                  1
## 522 2019-11-15      34.61 35.77   0.9675706                  0
## 523 2019-12-02      33.47 35.13   0.9527469                  0
## 524 2019-12-04      34.18 34.68   0.9855825                  0
## 525 2019-12-12      33.39 33.08   1.0093712                  1
## 526 2019-12-16      33.78 32.71   1.0327117                  1
## 527 2019-12-18      34.14 33.55   1.0175857                  1
## 528 2019-12-19      34.11 33.78   1.0097691                  1
## 529 2019-12-23      34.21 34.14   1.0020504                  1
## 530 2020-01-03      34.72 34.58   1.0040486                  1
## 531 2020-01-07      34.57 34.77   0.9942479                  0
## 532 2020-01-08      35.54 34.72   1.0236175                  1
## 533 2020-01-14      36.59 35.16   1.0406712                  1
## 534 2020-01-17      36.51 36.59   0.9978136                  0
## 535 2020-01-28      35.41 35.93   0.9855274                  0
## 536 2020-02-03      32.94 35.39   0.9307714                  0
## 537 2020-02-05      34.66 32.60   1.0631902                  1
## 538 2020-02-11      34.76 34.12   1.0187573                  1
## 539 2020-02-14      35.23 34.76   1.0135213                  1
## 540 2020-02-18      34.57 35.36   0.9776584                  0
## 541 2020-03-03      30.59 31.40   0.9742038                  0
## 542 2020-03-11      24.00 28.94   0.8293020                  0
## 543 2020-03-16      17.68 24.00   0.7366667                  0
## 544 2020-03-20      17.80 17.53   1.0154022                  1
##     cumulativeSumTodayGrtrThan_lag3 nRepeatsTodayGrtrThan_lag3
## 1                                 2                          1
## 2                                 6                          6
## 3                                 6                          6
## 4                                 6                          6
## 5                                 8                          1
## 6                                10                          1
## 7                                12                          4
## 8                                15                          2
## 9                                19                          1
## 10                               21                          4
## 11                               25                          1
## 12                               29                          1
## 13                               31                          2
## 14                               31                          2
## 15                               32                          9
## 16                               32                          9
## 17                               32                          9
## 18                               33                          4
## 19                               33                          4
## 20                               35                          1
## 21                               41                          8
## 22                               41                          8
## 23                               43                          1
## 24                               47                          1
## 25                               48                          4
## 26                               48                          4
## 27                               48                          4
## 28                               49                          1
## 29                               51                          4
## 30                               52                          1
## 515                             903                         11
## 516                             903                         11
## 517                             913                          1
## 518                             914                          2
## 519                             915                          5
## 520                             915                          5
## 521                             918                          1
## 522                             919                          4
## 523                             923                         11
## 524                             923                         11
## 525                             924                          1
## 526                             926                          1
## 527                             928                          1
## 528                             929                          1
## 529                             931                          2
## 530                             936                          3
## 531                             936                          3
## 532                             937                          1
## 533                             941                          1
## 534                             943                          5
## 535                             944                          8
## 536                             944                          8
## 537                             945                          1
## 538                             948                          1
## 539                             951                         12
## 540                             951                         12
## 541                             951                         12
## 542                             952                         12
## 543                             952                         12
## 544                             953                          1
##     decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag3
## 1                   0                       702                  0
## 2                   5                        16                  0
## 3                   5                        16                  1
## 4                   5                        16                  1
## 5                   0                       702                  0
## 6                   0                       702                  0
## 7                   3                        48                  1
## 8                   1                        87                  1
## 9                   0                       702                  0
## 10                  3                        48                  1
## 11                  0                       702                  0
## 12                  0                       702                  0
## 13                  1                        87                  0
## 14                  1                        87                  1
## 15                  8                         3                  1
## 16                  8                         3                  1
## 17                  8                         3                  1
## 18                  3                        48                  0
## 19                  3                        48                  1
## 20                  0                       702                  0
## 21                  7                         9                  1
## 22                  7                         9                  1
## 23                  0                       702                  0
## 24                  0                       702                  0
## 25                  3                        48                  1
## 26                  3                        48                  1
## 27                  3                        48                  1
## 28                  0                       702                  0
## 29                  3                        48                  0
## 30                  0                       702                  0
## 515                10                         5                  1
## 516                10                         5                  1
## 517                 0                       702                  0
## 518                 1                        87                  0
## 519                 4                        30                  1
## 520                 4                        30                  1
## 521                 0                       702                  0
## 522                 3                        48                  1
## 523                10                         5                  1
## 524                10                         5                  1
## 525                 0                       702                  0
## 526                 0                       702                  0
## 527                 0                       702                  0
## 528                 0                       702                  0
## 529                 1                        87                  0
## 530                 2                        30                  0
## 531                 2                        30                  1
## 532                 0                       702                  0
## 533                 0                       702                  0
## 534                 4                        30                  1
## 535                 7                         9                  1
## 536                 7                         9                  1
## 537                 0                       702                  0
## 538                 0                       702                  0
## 539                11                         5                  0
## 540                11                         5                  1
## 541                11                         5                  1
## 542                11                         5                  1
## 543                11                         5                  1
## 544                 0                       702                  0
##     cumulativeSumTodayLessThan_lag3 nRepeatsTodayLessThan_lag3
## 1                                 0                          3
## 2                                 1                          4
## 3                                 5                          1
## 4                                 6                          6
## 5                                 6                          6
## 6                                 6                          6
## 7                                 9                          1
## 8                                11                          2
## 9                                13                          6
## 10                               14                          1
## 11                               16                         11
## 12                               16                         11
## 13                               16                         11
## 14                               17                          2
## 15                               21                          1
## 16                               23                          1
## 17                               25                          2
## 18                               25                          2
## 19                               27                          1
## 20                               28                          9
## 21                               29                          1
## 22                               35                          5
## 23                               35                          5
## 24                               36                          4
## 25                               37                          1
## 26                               38                          1
## 27                               39                          4
## 28                               39                          4
## 29                               39                          4
## 30                               42                          3
## 515                             798                          1
## 516                             802                          5
## 517                             804                          8
## 518                             804                          8
## 519                             806                          1
## 520                             808                          1
## 521                             809                          5
## 522                             811                          1
## 523                             817                          1
## 524                             819                          1
## 525                             824                          9
## 526                             824                          9
## 527                             824                          9
## 528                             824                          9
## 529                             824                          9
## 530                             826                          2
## 531                             828                          8
## 532                             828                          8
## 533                             828                          8
## 534                             829                          1
## 535                             834                          1
## 536                             838                          1
## 537                             839                          4
## 538                             840                          5
## 539                             840                          5
## 540                             841                          1
## 541                             851                          2
## 542                             856                          1
## 543                             859                          1
## 544                             862                          4
##     incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease.x
## 1                   2                        21                         0
## 2                   3                        52                         0
## 3                   0                       609                         0
## 4                   5                        11                         1
## 5                   5                        11                         0
## 6                   5                        11                         0
## 7                   0                       609                         1
## 8                   1                        73                         0
## 9                   5                        11                         0
## 10                  0                       609                         0
## 11                 10                         6                         0
## 12                 10                         6                         0
## 13                 10                         6                         0
## 14                  1                        73                         0
## 15                  0                       609                         0
## 16                  0                       609                         0
## 17                  1                        73                         0
## 18                  1                        73                         0
## 19                  0                       609                         0
## 20                  8                        13                         0
## 21                  0                       609                         0
## 22                  4                        42                         0
## 23                  4                        42                         1
## 24                  3                        52                         0
## 25                  0                       609                         0
## 26                  0                       609                         1
## 27                  3                        52                         0
## 28                  3                        52                         0
## 29                  3                        52                         0
## 30                  2                        21                         0
## 515                 0                       609                         0
## 516                 4                        42                         0
## 517                 7                        14                         0
## 518                 7                        14                         0
## 519                 0                       609                         0
## 520                 0                       609                         0
## 521                 4                        42                         0
## 522                 0                       609                         0
## 523                 0                       609                         0
## 524                 0                       609                         0
## 525                 8                        13                         0
## 526                 8                        13                         0
## 527                 8                        13                         0
## 528                 8                        13                         0
## 529                 8                        13                         0
## 530                 1                        73                         0
## 531                 7                        14                         1
## 532                 7                        14                         0
## 533                 7                        14                         0
## 534                 0                       609                         0
## 535                 0                       609                         0
## 536                 0                       609                         1
## 537                 3                        52                         0
## 538                 4                        42                         0
## 539                 4                        42                         0
## 540                 0                       609                         0
## 541                 1                        73                         1
## 542                 0                       609                         0
## 543                 0                       609                         0
## 544                 3                        52                         1
##     RandomForest KNN Rpart GLM NextDay2PercentIncrease.y mostVoted NextDayPrice
## 1              0   0     0   0                         0         0        20.83
## 2              0   0     0   0                         0         0        20.61
## 3              0   0     0   0                         0         0        19.90
## 4              1   1     1   1                         1         1        20.34
## 5              0   0     0   0                         0         0        21.17
## 6              0   0     0   0                         0         0        20.84
## 7              0   0     0   0                         1         0        21.18
## 8              1   1     0   1                         0         1        21.66
## 9              0   0     0   0                         0         0        22.00
## 10             0   0     0   0                         0         0        21.29
## 11             0   1     0   0                         0         0        23.02
## 12             0   0     0   0                         0         0        23.73
## 13             0   0     0   0                         0         0        25.05
## 14             0   1     0   1                         0         1        25.10
## 15             0   0     0   0                         0         0        23.79
## 16             0   0     0   0                         0         0        23.68
## 17             0   0     0   0                         0         0        23.71
## 18             0   0     0   0                         0         0        22.89
## 19             0   0     0   0                         0         0        23.05
## 20             0   0     0   0                         0         0        25.00
## 21             0   0     0   0                         0         0        26.10
## 22             1   1     1   1                         0         1        25.42
## 23             0   0     0   0                         1         0        26.62
## 24             0   0     0   0                         0         0        30.69
## 25             0   0     0   0                         0         0        29.91
## 26             0   0     0   0                         1         0        30.67
## 27             1   1     1   1                         0         1        30.93
## 28             0   0     0   0                         0         0        30.92
## 29             0   0     0   0                         0         0        30.01
## 30             0   0     0   0                         0         0        31.58
## 515            0   0     0   0                         0         0        32.75
## 516            1   1     1   0                         0         1        32.81
## 517            0   0     0   0                         0         0        34.85
## 518            0   0     0   0                         0         0        34.51
## 519            0   0     0   0                         0         0        33.16
## 520            0   0     0   0                         0         0        30.12
## 521            0   0     0   0                         0         0        35.56
## 522            0   0     0   0                         0         0        34.78
## 523            0   0     0   0                         0         0        33.66
## 524            0   0     0   0                         0         0        33.42
## 525            0   0     0   0                         0         0        33.55
## 526            0   0     0   0                         0         0        34.00
## 527            0   0     0   0                         0         0        34.11
## 528            0   0     0   0                         0         0        34.03
## 529            0   0     0   0                         0         0        34.06
## 530            0   0     0   0                         0         0        34.44
## 531            0   0     1   0                         1         0        35.54
## 532            0   0     0   0                         0         0        35.16
## 533            0   0     0   0                         0         0        36.95
## 534            0   0     0   0                         0         0        35.33
## 535            0   0     0   0                         0         0        35.39
## 536            0   0     0   0                         1         0        34.02
## 537            0   0     0   0                         0         0        34.12
## 538            0   0     0   0                         0         0        35.36
## 539            0   0     0   0                         0         0        34.57
## 540            0   0     0   0                         0         0        34.04
## 541            0   0     0   0                         1         0        31.39
## 542            0   0     0   0                         0         0        21.46
## 543            1   0     0   0                         0         0        17.53
## 544            1   0     0   0                         1         0        18.51

The above is the first and last 30 of the testing set samples with the next day predictions in predicting whether the value would be 2% more than the previous day.The stock you asked for when running the script at the counts section is the stock that was analyzed. you can rerun this script for any of the stocks in that dataset where you search for these markers ‘%^&%’ in this script within RStudio.

%%%%%%%%%%%%%%%%%%%%%%%%%% ***

The above script can be run with a different lead value to see how well the algorithms use this time series with the same or varying lag prices to also predict the value of the stock increasing more than 2% in the future if the stock has enough information for how far into the future you want to forecast or predict. Lets do that but with a 1 day lag and a prediction for whether or not the stock value will increase in 30 days by 2%. We just change the lag and lead values to 1 and 30 respectively. There will be missing information, that will shrink our data for the last 30 days, if there aren’t more dates available after the end date specified.

stock_1 <- toupper("yelp")
lag <- 1
startDate <- '2012-12-31'
endDate <- '2020-03-25'
lead <- 30

Lets subset our large table to get the dates requested.

sDF <- subset(DF, DF$Date>=startDate & DF$Date<=endDate)

lagN <- paste('lag',lag,sep='')
cat('The number of days to retrieve the stock value compared to each day value listed as an instance is ',lag,'and the stock to look up this information for is ',stock_1)

## The number of days to retrieve the stock value compared to each day value listed as an instance is  1 and the stock to look up this information for is  YELP

stknme <- as.character(newStocks[newStocks$stockSymbol==stock_1,2])
cat('\nThis stock is ',stknme)

## 
## This stock is  Yelp Inc.

Lstock_1 <- subset(sDF, sDF$stockName==stock_1)

cat('The number of days for trading that this time period will provide counts of increasing and decreasing days is ', length(Lstock_1$Date), ' trading days.')

## The number of days for trading that this time period will provide counts of increasing and decreasing days is  1820  trading days.

Generic automation of above stock to look up and the lag to use for generating the counts, group of counts, and lag values to get those counts.

Lstock_1$startDayValue <-Lstock_1$stockValue[1]
Lstock_1$startDayDate <- Lstock_1$Date[1]
Lstock_1$finalDayValue <-Lstock_1$stockValue[length(Lstock_1$stockValue)]
Lstock_1$finalDayDate <- Lstock_1$Date[length(Lstock_1$Date)]

stock_1LN <- lag(Lstock_1$stockValue, lag)
Lstock_1$lagN <- stock_1LN
Lstock_1$today2_lagN <- Lstock_1$stockValue/Lstock_1$lagN
Lstock_1 <- Lstock_1[complete.cases(Lstock_1),]

Lets look at the data we will be adding counts of increasing and decreasing days, for the time interval dates and stock values at the beginning and end of the time interval available or requested.

cat('\nThe lag for this table was for ',lag,'days.','\nThe stock to look up was ', stock_1,'.\nThe start date of this stock and starting value was ',as.character(paste(Lstock_1$startDayDate[1])),' and ','$',Lstock_1$startDayValue[1],'\nThe end date and end date price of this stock analysis is ',as.character(paste(Lstock_1$finalDayDate[1])),' and ','$',Lstock_1$finalDayValue[1])

## 
## The lag for this table was for  1 days. 
## The stock to look up was  YELP .
## The start date of this stock and starting value was  2012-12-31  and  $ 18.85 
## The end date and end date price of this stock analysis is  2020-03-24  and  $ 20.64

roi <- Lstock_1$finalDayValue[1]/Lstock_1$startDayValue[1]
cat('\nThe return on investment as a percentage of the amount invested for this time period is ',roi)

## 
## The return on investment as a percentage of the amount invested for this time period is  1.09496

cat('\n\nIn dollars initially invested your return  is $',Lstock_1$finalDayValue[1]-Lstock_1$startDayValue[1], 'for the dates:',
   as.character(paste(Lstock_1$startDayDate[1])),'through ',
   as.character(paste(Lstock_1$finalDayDate[1])))

## 
## 
## In dollars initially invested your return  is $ 1.79 for the dates: 2012-12-31 through  2020-03-24

Now, lets look at the counts and group counts of increasing and decreasing days for this stock and the time period available.

#assign a 1 to increasing values
Lstock_1$todayGrtrThan_lagN <- ifelse(Lstock_1$today2_lagN>1, 1,0)

Lstock_1$cumulativeSumTodayGrtrThan_lagN <- cumsum(Lstock_1$todayGrtrThan_lagN)

# get the count of how many instances repeat, 
# those counts repeating are counts that measure the days cumulatively decreasing
# those cumulative counts that don't repeat, are counting increasing days.
# These are stock values for today's value to 7 days prior value.
countstock_10 <- Lstock_1 %>% group_by(cumulativeSumTodayGrtrThan_lagN) %>% count(n=n())
countstock_10 <- as.data.frame(countstock_10)
countstock_10 <- countstock_10[,-3]
colnames(countstock_10)[2] <- 'nRepeatsTodayGrtrThan_lagN'

# Count the REPEATS of each number (minus the initial start) 
countstock_10$decrDaysThisCycle <- countstock_10$n-1

# Count the number of times the cycle count repeats in this time span exactly that many days
countstock_10b <- countstock_10 %>% group_by(decrDaysThisCycle) %>% count(n=n())
countstock_10b <- as.data.frame(countstock_10b)
countstock_10b <- countstock_10b[,-3]
colnames(countstock_10b)[2] <- 'nTimesDecrDayCountsOccurs'

#combine these two count matrices of decreasing days
countsstock_tableDecr <- merge(countstock_10, countstock_10b, by.x='decrDaysThisCycle',
                             by.y='decrDaysThisCycle')

#combine the counts to the stock subset
stock_3 <- merge(Lstock_1, countsstock_tableDecr, by.x='cumulativeSumTodayGrtrThan_lagN',
               by.y='cumulativeSumTodayGrtrThan_lagN')

#assign a 1 to decreasing values
stock_3$todayLessThan_lagN <- ifelse(stock_3$today2_lagN>1, 0,1)

stock_3$cumulativeSumTodayLessThan_lagN <- cumsum(stock_3$todayLessThan_lagN)

# get the count of how many instances repeat, 
# those counts repeating are counts that measure the days cumulatively increasing
# those cumulative counts that don't repeat, are counting decreasing days.
# These are stock values for today's value to 7 days prior value.
countstock_11 <- stock_3 %>% group_by(cumulativeSumTodayLessThan_lagN) %>% count(n=n())
countstock_11 <- as.data.frame(countstock_11)
countstock_11 <- countstock_11[,-3]
colnames(countstock_11)[2] <- 'nRepeatsTodayLessThan_lagN'

# Count the REPEATS of each number (minus the initial start) 
countstock_11$incrDaysThisCycle <- countstock_11$n-1

# Count the number of times the cycle count repeats in this time span exactly that many days
countstock_11b <- countstock_11 %>% group_by(incrDaysThisCycle) %>% count(n=n())
countstock_11b <- as.data.frame(countstock_11b)
countstock_11b <- countstock_11b[,-3]
colnames(countstock_11b)[2] <- 'nTimesIncrDayCountsOccurs'

#combine these two count matrices of decreasing days
countsstock_tableIncr <- merge(countstock_11, countstock_11b,
                               by.x='incrDaysThisCycle',
                             by.y='incrDaysThisCycle')

#combine the counts to the stock subset
stock_4 <- merge(stock_3, countsstock_tableIncr,
                 by.x='cumulativeSumTodayLessThan_lagN',
               by.y='cumulativeSumTodayLessThan_lagN')

colnames(stock_4)

##  [1] "cumulativeSumTodayLessThan_lagN" "cumulativeSumTodayGrtrThan_lagN"
##  [3] "Date"                            "stockName"                      
##  [5] "stockValue"                      "startDayValue"                  
##  [7] "startDayDate"                    "finalDayValue"                  
##  [9] "finalDayDate"                    "lagN"                           
## [11] "today2_lagN"                     "todayGrtrThan_lagN"             
## [13] "decrDaysThisCycle"               "nRepeatsTodayGrtrThan_lagN"     
## [15] "nTimesDecrDayCountsOccurs"       "todayLessThan_lagN"             
## [17] "incrDaysThisCycle"               "nRepeatsTodayLessThan_lagN"     
## [19] "nTimesIncrDayCountsOccurs"

stock_5 <- stock_4[,c(3:11,
                      12,2,14,13,15,
                      16,1,18,17,19)]
colnames(stock_5) <- gsub('lagN',lagN,colnames(stock_5))
colnames(stock_5)

##  [1] "Date"                            "stockName"                      
##  [3] "stockValue"                      "startDayValue"                  
##  [5] "startDayDate"                    "finalDayValue"                  
##  [7] "finalDayDate"                    "lag1"                           
##  [9] "today2_lag1"                     "todayGrtrThan_lag1"             
## [11] "cumulativeSumTodayGrtrThan_lag1" "nRepeatsTodayGrtrThan_lag1"     
## [13] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
## [15] "todayLessThan_lag1"              "cumulativeSumTodayLessThan_lag1"
## [17] "nRepeatsTodayLessThan_lag1"      "incrDaysThisCycle"              
## [19] "nTimesIncrDayCountsOccurs"

pretty_headers <- str_to_title(colnames(stock_5))

stock5 <- datatable(data=stock_5,  rownames=FALSE,
                      colnames=pretty_headers,
                      filter=list(position='top'),
                      options=list(
                        dom='Bfrtip',
                        buttons=c('colvis','csv','excel'),
                        language=list(sSearch='Filter:')),
                      extensions=c('Buttons','Responsive')
                      )
stock5

cat('\nThe number of times this stock has decreased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates\' stock value is ', stock_5$decrDaysThisCycle[length(stock_5$decrDaysThisCycle)],'\n')

## 
## The number of times this stock has decreased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates' stock value is  0

cat('\nThe number of times this stock has increased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates\' stock value is ', stock_5$incrDaysThisCycle[length(stock_5$incrDaysThisCycle)],'\n')

## 
## The number of times this stock has increased in the current cycle from the start of this time period retrieved in price comparison of the number of days in lags retrieved prior to each instance dates' stock value is  4

cat('\nThe number of times this stock has decreased exactly this number of days compared to its price ', lag, ' days ago, is ', stock_5$nTimesDecrDayCountsOccurs[length(stock_5$nTimesDecrDayCountsOccurs)],'\n')

## 
## The number of times this stock has decreased exactly this number of days compared to its price  1  days ago, is  438

cat('\nThe number of times this stock has increased exactly this number of days compared to its price ', lag, ' days ago, is ', stock_5$nTimesIncrDayCountsOccurs[length(stock_5$nTimesIncrDayCountsOccurs)],'\n')

## 
## The number of times this stock has increased exactly this number of days compared to its price  1  days ago, is  26

The unique number of days this stock selected (for the time period retrieved) decreased is shown in the table below.

stock_5[unique(stock_5$decrDaysThisCycle),c(1:3,8,9,13,14)]

##          Date stockName stockValue  lag1 today2_lag1 decrDaysThisCycle
## 3  2013-01-04      YELP      21.52 20.15   1.0679901                 3
## 5  2013-01-08      YELP      20.83 21.08   0.9881404                 3
## 1  2013-01-02      YELP      19.70 18.85   1.0450928                 0
## 2  2013-01-03      YELP      20.15 19.70   1.0228426                 0
## 6  2013-01-09      YELP      20.76 20.83   0.9966395                 3
## 4  2013-01-07      YELP      21.08 21.52   0.9795539                 3
## 7  2013-01-10      YELP      22.19 20.76   1.0688825                 5
## 9  2013-01-14      YELP      21.97 22.12   0.9932188                 5
## 11 2013-01-16      YELP      20.36 20.61   0.9878700                 5
## 8  2013-01-11      YELP      22.12 22.19   0.9968454                 5
## 10 2013-01-15      YELP      20.61 21.97   0.9380974                 5
##    nTimesDecrDayCountsOccurs
## 3                         67
## 5                         67
## 1                        438
## 2                        438
## 6                         67
## 4                         67
## 7                          9
## 9                          9
## 11                         9
## 8                          9
## 10                         9

The unique number of days this stock selected (for the time period retrieved) increased is shown in the table below.

stock_5[unique(stock_5$incrDaysThisCycle),c(1:3,8,9,18,19)]

##          Date stockName stockValue  lag1 today2_lag1 incrDaysThisCycle
## 2  2013-01-03      YELP      20.15 19.70   1.0228426                 2
## 1  2013-01-02      YELP      19.70 18.85   1.0450928                 2
## 3  2013-01-04      YELP      21.52 20.15   1.0679901                 2
## 4  2013-01-07      YELP      21.08 21.52   0.9795539                 0
## 6  2013-01-09      YELP      20.76 20.83   0.9966395                 1
## 9  2013-01-14      YELP      21.97 22.12   0.9932188                 0
## 7  2013-01-10      YELP      22.19 20.76   1.0688825                 1
## 5  2013-01-08      YELP      20.83 21.08   0.9881404                 0
## 12 2013-01-17      YELP      20.27 20.36   0.9955796                 1
##    nTimesIncrDayCountsOccurs
## 2                        121
## 1                        121
## 3                        121
## 4                        435
## 6                        241
## 9                        435
## 7                        241
## 5                        435
## 12                       241

%%%%%%%%%%%%%%%%%

colnames(stock_5)

##  [1] "Date"                            "stockName"                      
##  [3] "stockValue"                      "startDayValue"                  
##  [5] "startDayDate"                    "finalDayValue"                  
##  [7] "finalDayDate"                    "lag1"                           
##  [9] "today2_lag1"                     "todayGrtrThan_lag1"             
## [11] "cumulativeSumTodayGrtrThan_lag1" "nRepeatsTodayGrtrThan_lag1"     
## [13] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
## [15] "todayLessThan_lag1"              "cumulativeSumTodayLessThan_lag1"
## [17] "nRepeatsTodayLessThan_lag1"      "incrDaysThisCycle"              
## [19] "nTimesIncrDayCountsOccurs"

Lets remove the date fields and stock name. We will name the rows of this matrix by the date before deleting the Date field.

stock_6 <- stock_5
row.names(stock_6) <- stock_6$Date
stock_6 <- stock_6[,-c(1,2,4:7)]#removes the date, stock name, and start and final dates/values

stock_6$NextDayPrice <- lead(stock_6$stockValue,lead)
stock_6 <- stock_6[complete.cases(stock_6$NextDayPrice),]
stock_6$NextDay2PercentIncrease <- ifelse(stock_6$NextDayPrice/stock_6$stockValue >1.02, 1,0)

stock_6$NextDay2PercentIncrease <- as.factor(stock_6$NextDay2PercentIncrease)

head(stock_6)

##            stockValue  lag1 today2_lag1 todayGrtrThan_lag1
## 2013-01-02      19.70 18.85   1.0450928                  1
## 2013-01-03      20.15 19.70   1.0228426                  1
## 2013-01-04      21.52 20.15   1.0679901                  1
## 2013-01-07      21.08 21.52   0.9795539                  0
## 2013-01-08      20.83 21.08   0.9881404                  0
## 2013-01-09      20.76 20.83   0.9966395                  0
##            cumulativeSumTodayGrtrThan_lag1 nRepeatsTodayGrtrThan_lag1
## 2013-01-02                               1                          1
## 2013-01-03                               2                          1
## 2013-01-04                               3                          4
## 2013-01-07                               3                          4
## 2013-01-08                               3                          4
## 2013-01-09                               3                          4
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag1
## 2013-01-02                 0                       438                  0
## 2013-01-03                 0                       438                  0
## 2013-01-04                 3                        67                  0
## 2013-01-07                 3                        67                  1
## 2013-01-08                 3                        67                  1
## 2013-01-09                 3                        67                  1
##            cumulativeSumTodayLessThan_lag1 nRepeatsTodayLessThan_lag1
## 2013-01-02                               0                          3
## 2013-01-03                               0                          3
## 2013-01-04                               0                          3
## 2013-01-07                               1                          1
## 2013-01-08                               2                          1
## 2013-01-09                               3                          2
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDayPrice
## 2013-01-02                 2                       121        21.44
## 2013-01-03                 2                       121        21.96
## 2013-01-04                 2                       121        21.88
## 2013-01-07                 0                       435        21.89
## 2013-01-08                 0                       435        22.00
## 2013-01-09                 1                       241        22.18
##            NextDay2PercentIncrease
## 2013-01-02                       1
## 2013-01-03                       1
## 2013-01-04                       0
## 2013-01-07                       1
## 2013-01-08                       1
## 2013-01-09                       1

colnames(stock_6)

##  [1] "stockValue"                      "lag1"                           
##  [3] "today2_lag1"                     "todayGrtrThan_lag1"             
##  [5] "cumulativeSumTodayGrtrThan_lag1" "nRepeatsTodayGrtrThan_lag1"     
##  [7] "decrDaysThisCycle"               "nTimesDecrDayCountsOccurs"      
##  [9] "todayLessThan_lag1"              "cumulativeSumTodayLessThan_lag1"
## [11] "nRepeatsTodayLessThan_lag1"      "incrDaysThisCycle"              
## [13] "nTimesIncrDayCountsOccurs"       "NextDayPrice"                   
## [15] "NextDay2PercentIncrease"

For this set that is column 14 for ‘NextDayPrice’ to remove before running our predictive analytics algorithms.

set.seed(12356789)

NextDayPrice <- as.vector(stock_6$NextDayPrice)
stock_6 <- stock_6[,-14]

inTrain <- createDataPartition(y=stock_6$NextDay2PercentIncrease, p=0.7, list=FALSE)

trainingSet <- stock_6[inTrain,]
testingSet <- stock_6[-inTrain,]

Here are the first few observations in the training set.

head(trainingSet)

##            stockValue  lag1 today2_lag1 todayGrtrThan_lag1
## 2013-01-04      21.52 20.15   1.0679901                  1
## 2013-01-07      21.08 21.52   0.9795539                  0
## 2013-01-08      20.83 21.08   0.9881404                  0
## 2013-01-09      20.76 20.83   0.9966395                  0
## 2013-01-10      22.19 20.76   1.0688825                  1
## 2013-01-14      21.97 22.12   0.9932188                  0
##            cumulativeSumTodayGrtrThan_lag1 nRepeatsTodayGrtrThan_lag1
## 2013-01-04                               3                          4
## 2013-01-07                               3                          4
## 2013-01-08                               3                          4
## 2013-01-09                               3                          4
## 2013-01-10                               4                          6
## 2013-01-14                               4                          6
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag1
## 2013-01-04                 3                        67                  0
## 2013-01-07                 3                        67                  1
## 2013-01-08                 3                        67                  1
## 2013-01-09                 3                        67                  1
## 2013-01-10                 5                         9                  0
## 2013-01-14                 5                         9                  1
##            cumulativeSumTodayLessThan_lag1 nRepeatsTodayLessThan_lag1
## 2013-01-04                               0                          3
## 2013-01-07                               1                          1
## 2013-01-08                               2                          1
## 2013-01-09                               3                          2
## 2013-01-10                               3                          2
## 2013-01-14                               5                          1
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease
## 2013-01-04                 2                       121                       0
## 2013-01-07                 0                       435                       1
## 2013-01-08                 0                       435                       1
## 2013-01-09                 1                       241                       1
## 2013-01-10                 1                       241                       0
## 2013-01-14                 0                       435                       0

Here are the first few observations in the testing set.

head(testingSet)

##            stockValue  lag1 today2_lag1 todayGrtrThan_lag1
## 2013-01-02      19.70 18.85   1.0450928                  1
## 2013-01-03      20.15 19.70   1.0228426                  1
## 2013-01-11      22.12 22.19   0.9968454                  0
## 2013-01-23      20.34 19.90   1.0221106                  1
## 2013-01-25      21.17 20.83   1.0163226                  1
## 2013-02-11      21.63 21.85   0.9899314                  0
##            cumulativeSumTodayGrtrThan_lag1 nRepeatsTodayGrtrThan_lag1
## 2013-01-02                               1                          1
## 2013-01-03                               2                          1
## 2013-01-11                               4                          6
## 2013-01-23                               6                          1
## 2013-01-25                               8                          4
## 2013-02-11                              12                          2
##            decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag1
## 2013-01-02                 0                       438                  0
## 2013-01-03                 0                       438                  0
## 2013-01-11                 5                         9                  1
## 2013-01-23                 0                       438                  0
## 2013-01-25                 3                        67                  0
## 2013-02-11                 1                       251                  1
##            cumulativeSumTodayLessThan_lag1 nRepeatsTodayLessThan_lag1
## 2013-01-02                               0                          3
## 2013-01-03                               0                          3
## 2013-01-11                               4                          1
## 2013-01-23                               9                          4
## 2013-01-25                               9                          4
## 2013-02-11                              16                          3
##            incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease
## 2013-01-02                 2                       121                       1
## 2013-01-03                 2                       121                       1
## 2013-01-11                 0                       435                       0
## 2013-01-23                 3                        65                       1
## 2013-01-25                 3                        65                       1
## 2013-02-11                 2                       121                       1

rfMod <- train(NextDay2PercentIncrease ~ ., method='rf', data=(trainingSet), 
               trControl=trainControl(method='cv'), number=5)

predRF <- predict(rfMod, testingSet)

predDF <- data.frame(predRF, NextDay2PercentIncrease=testingSet$NextDay2PercentIncrease)
print(rbind(head(predDF,10),tail(predDF,10)))

##     predRF NextDay2PercentIncrease
## 1        1                       1
## 2        1                       1
## 3        0                       0
## 4        1                       1
## 5        1                       1
## 6        1                       1
## 7        1                       1
## 8        1                       1
## 9        1                       1
## 10       1                       1
## 526      0                       0
## 527      0                       0
## 528      0                       0
## 529      0                       0
## 530      0                       1
## 531      1                       1
## 532      0                       1
## 533      0                       0
## 534      0                       0
## 535      0                       0

sum <- sum(predRF==testingSet$NextDay2PercentIncrease) 
length <- length(testingSet$NextDay2PercentIncrease)
accuracy_rfMod <- (sum/length) 
accuracy_rfMod

## [1] 0.9121495

results <- c(round(accuracy_rfMod,2), round(100,2))
results <- as.factor(results)
results <- t(data.frame(results))

colnames(results) <- colnames(predDF)
Results <- rbind(predDF, results) 
print(rbind(head(Results),tail(Results)))

##         predRF NextDay2PercentIncrease
## 1            1                       1
## 2            1                       1
## 3            0                       0
## 4            1                       1
## 5            1                       1
## 6            1                       1
## 531          1                       1
## 532          0                       1
## 533          0                       0
## 534          0                       0
## 535          0                       0
## results   0.91                     100

knnMod <- train(NextDay2PercentIncrease ~ .,
                method='knn', preProcess=c('center','scale'),
                tuneLength=10, metric='Accuracy',
                trControl=trainControl(method='cv'), data=trainingSet)

rpartMod <- train(NextDay2PercentIncrease ~ ., 
                        method='rpart', tuneLength=9, metric='Accuracy',
                        data=trainingSet)

glmMod <- train(NextDay2PercentIncrease ~ ., metric='Kappa',
                method='glm', data=trainingSet)

predKNN <- predict(knnMod, testingSet)

predRPART <- predict(rpartMod, testingSet)

predGLM <- predict(glmMod, testingSet)

length=length(testingSet$NextDay2PercentIncrease)
length

## [1] 535

sumKNN <- sum(predKNN==testingSet$NextDay2PercentIncrease)
sumRPart <- sum(predRPART==testingSet$NextDay2PercentIncrease)
sumGLM <- sum(predGLM==testingSet$NextDay2PercentIncrease)

cat('The number correct for KNN is ',sumKNN, ', the number correct for Rpart is ',sumRPart,
    ' and the number correct for GLM is ',sumGLM,'.')

## The number correct for KNN is  363 , the number correct for Rpart is  464  and the number correct for GLM is  349 .

accuracy_KNN <- sumKNN/length 
accuracy_RPART <- sumRPart/length 
accuracy_GLM <- sumGLM/length 

cat('The accuracy for KNN, RPart, and GLM respectively is: ',
    accuracy_KNN,accuracy_RPART,accuracy_GLM,'.')

## The accuracy for KNN, RPart, and GLM respectively is:  0.6785047 0.8672897 0.6523364 .

predDF2 <- data.frame(predRF,predKNN,predRPART,predGLM, 
                      NextDay2PercentIncrease=testingSet$NextDay2PercentIncrease)
colnames(predDF2) <- c('RandomForest','KNN','Rpart','GLM','NextDay2PercentIncrease')

results <- c(round(accuracy_rfMod,2),  
             round(accuracy_KNN,2), 
             round(accuracy_RPART,2),
             round(accuracy_GLM,2), 
             round(100,2))

results <- as.factor(results)
results <- t(data.frame(results))
colnames(results) <- c('RandomForest','KNN','Rpart','GLM',
                       'NextDay2PercentIncrease')
Results <- rbind(predDF2, results) 
Results1 <- datatable(data=Results, rownames=FALSE,
                      filter=list(position='top'),
                      options=list(
                        dom='Bfrtip',
                        buttons=c('csv'),
                        language=list(sSearch='Filter:')),
                      extensions=c('Buttons','Responsive'))
Results1

Lets now align these predicted values to the subset for this stock made earlier.

stock_6$Date <- row.names(stock_6)
testingSet$Date <- row.names(testingSet)
predDF2$Date <- row.names(testingSet)

stock_7 <- merge(testingSet,predDF2, by.x='Date', by.y='Date')
tbl <- rbind(head(stock_7),tail(stock_7))
tbl

##           Date stockValue  lag1 today2_lag1 todayGrtrThan_lag1
## 1   2013-01-02      19.70 18.85   1.0450928                  1
## 2   2013-01-03      20.15 19.70   1.0228426                  1
## 3   2013-01-11      22.12 22.19   0.9968454                  0
## 4   2013-01-23      20.34 19.90   1.0221106                  1
## 5   2013-01-25      21.17 20.83   1.0163226                  1
## 6   2013-02-11      21.63 21.85   0.9899314                  0
## 530 2019-11-27      35.21 35.13   1.0022773                  1
## 531 2019-12-10      33.03 33.08   0.9984885                  0
## 532 2019-12-16      33.78 33.55   1.0068554                  1
## 533 2019-12-26      34.66 34.06   1.0176160                  1
## 534 2019-12-27      34.85 34.66   1.0054818                  1
## 535 2020-01-03      34.72 34.77   0.9985620                  0
##     cumulativeSumTodayGrtrThan_lag1 nRepeatsTodayGrtrThan_lag1
## 1                                 1                          1
## 2                                 2                          1
## 3                                 4                          6
## 4                                 6                          1
## 5                                 8                          4
## 6                                12                          2
## 530                             872                          3
## 531                             874                          6
## 532                             877                          1
## 533                             881                          1
## 534                             882                          2
## 535                             883                          4
##     decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag1
## 1                   0                       438                  0
## 2                   0                       438                  0
## 3                   5                         9                  1
## 4                   0                       438                  0
## 5                   3                        67                  0
## 6                   1                       251                  1
## 530                 2                       111                  0
## 531                 5                         9                  1
## 532                 0                       438                  0
## 533                 0                       438                  0
## 534                 1                       251                  0
## 535                 3                        67                  1
##     cumulativeSumTodayLessThan_lag1 nRepeatsTodayLessThan_lag1
## 1                                 0                          3
## 2                                 0                          3
## 3                                 4                          1
## 4                                 9                          4
## 5                                 9                          4
## 6                                16                          3
## 530                             868                          3
## 531                             874                          1
## 532                             875                          6
## 533                             878                          3
## 534                             878                          3
## 535                             881                          1
##     incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease.x
## 1                   2                       121                         1
## 2                   2                       121                         1
## 3                   0                       435                         0
## 4                   3                        65                         1
## 5                   3                        65                         1
## 6                   2                       121                         1
## 530                 2                       121                         1
## 531                 0                       435                         1
## 532                 5                         9                         1
## 533                 2                       121                         0
## 534                 2                       121                         0
## 535                 0                       435                         0
##     RandomForest KNN Rpart GLM NextDay2PercentIncrease.y
## 1              1   1     1   1                         1
## 2              1   1     1   1                         1
## 3              0   1     1   1                         0
## 4              1   1     1   1                         1
## 5              1   1     1   1                         1
## 6              1   1     1   1                         1
## 530            0   0     0   0                         1
## 531            1   1     1   0                         1
## 532            0   0     0   0                         1
## 533            0   0     0   0                         0
## 534            0   0     0   0                         0
## 535            0   0     0   0                         0

stock_7$RandomForest <- as.numeric(paste(stock_7$RandomForest))
stock_7$Rpart <- as.numeric(paste(stock_7$Rpart))
stock_7$GLM <- as.numeric(paste(stock_7$GLM))
stock_7$NextDay2PercentIncrease.y <-
  as.numeric(paste(stock_7$NextDay2PercentIncrease.y))
stock_7$KNN <- as.numeric(paste(stock_7$KNN))

stock_7$mostVoted <- ifelse(rowSums(stock_7[,16:19])>=2,1,0)
                          
sMost <- sum(stock_7$mostVoted==stock_7$NextDay2PercentIncrease.y)
lMost <- length(stock_7$NextDay2PercentIncrease.y)
accMost <- sMost/lMost
results <- as.data.frame(c(round(accuracy_rfMod,2),  
             round(accuracy_KNN,2), 
             round(accuracy_RPART,2),
             round(accuracy_GLM,2), 
             round(accMost,2),
             round(100,2)))
colnames(results) <- 'Results'
row.names(results) <- c('RandomForest','KNN','Rpart','GLM','mostVoted',
                       'NextDay2PercentIncrease')

results

##                         Results
## RandomForest               0.91
## KNN                        0.68
## Rpart                      0.87
## GLM                        0.65
## mostVoted                  0.84
## NextDay2PercentIncrease  100.00

The above accuracy measures for random forest, recursive partitioning trees, k-nearest neighbors, and generalized linear models and the most voted algorithm for this classification were shown. When analyzing YELP for predicting whether the stock would increase by more than 2% in the next 30 days, the highest accuracy was with Random Forest with 91% accuracy, then Rpart with 87% accuracy. The time interval was also 2012-12-31 through 2020-03-25, with missing information the last 30 days of the time span to predict, as this script was made 3/25/2020, and 30 days in the future isn’t available. The lag to compare and make counts was 1 day.

The following is the table of the first 30 predicted results on the testing set of samples. Also, add back in the NextDayPrice extracted earlier so the results weren’t dependent on the next day price when predicting whether the next day would be increased by 2%. Note that because not all stocks have the same time series start dates, the size of each stock time series being selected will vary with some stocks.

stock_6$NextDayPrice <- NextDayPrice #add back in to original complete dataset
stock_6b <- stock_6[,15:16]
stock_8 <- merge(stock_7,stock_6b,by.x='Date',by.y='Date')

print(rbind(head(stock_8,30),tail(stock_8,30)))

##           Date stockValue  lag1 today2_lag1 todayGrtrThan_lag1
## 1   2013-01-02      19.70 18.85   1.0450928                  1
## 2   2013-01-03      20.15 19.70   1.0228426                  1
## 3   2013-01-11      22.12 22.19   0.9968454                  0
## 4   2013-01-23      20.34 19.90   1.0221106                  1
## 5   2013-01-25      21.17 20.83   1.0163226                  1
## 6   2013-02-11      21.63 21.85   0.9899314                  0
## 7   2013-02-14      21.44 21.79   0.9839376                  0
## 8   2013-02-19      21.88 21.96   0.9963570                  0
## 9   2013-02-21      22.00 21.89   1.0050251                  1
## 10  2013-02-25      21.62 22.18   0.9747520                  0
## 11  2013-02-28      22.19 21.86   1.0150961                  1
## 12  2013-03-06      23.02 23.02   1.0000000                  0
## 13  2013-03-07      23.46 23.02   1.0191138                  1
## 14  2013-03-15      25.10 25.05   1.0019960                  1
## 15  2013-03-18      24.50 25.10   0.9760956                  0
## 16  2013-03-19      24.28 24.50   0.9910204                  0
## 17  2013-03-20      24.12 24.28   0.9934102                  0
## 18  2013-03-22      23.79 24.00   0.9912500                  0
## 19  2013-03-25      23.49 23.79   0.9873897                  0
## 20  2013-04-01      22.89 23.71   0.9654154                  0
## 21  2013-04-09      24.62 25.00   0.9848000                  0
## 22  2013-04-10      25.75 24.62   1.0458976                  1
## 23  2013-04-11      26.35 25.75   1.0233010                  1
## 24  2013-04-16      25.71 25.83   0.9953542                  0
## 25  2013-04-22      25.32 25.50   0.9929412                  0
## 26  2013-04-24      24.97 25.29   0.9873468                  0
## 27  2013-04-30      26.03 26.62   0.9778362                  0
## 28  2013-05-08      29.91 30.67   0.9752201                  0
## 29  2013-05-09      30.67 29.91   1.0254096                  1
## 30  2013-05-14      30.97 30.92   1.0016171                  1
## 506 2019-07-24      35.44 34.48   1.0278422                  1
## 507 2019-07-26      35.42 34.89   1.0151906                  1
## 508 2019-07-29      35.43 35.42   1.0002823                  1
## 509 2019-08-02      35.34 34.93   1.0117378                  1
## 510 2019-08-05      33.53 35.34   0.9487832                  0
## 511 2019-08-19      32.92 33.17   0.9924631                  0
## 512 2019-08-22      32.89 33.23   0.9897683                  0
## 513 2019-08-23      31.77 32.89   0.9659471                  0
## 514 2019-08-26      32.03 31.77   1.0081838                  1
## 515 2019-09-04      32.84 33.21   0.9888588                  0
## 516 2019-09-13      37.44 36.52   1.0251917                  1
## 517 2019-09-26      35.08 34.96   1.0034325                  1
## 518 2019-09-27      34.42 35.08   0.9811859                  0
## 519 2019-09-30      34.75 34.42   1.0095874                  1
## 520 2019-10-02      33.33 34.86   0.9561102                  0
## 521 2019-10-23      33.53 33.84   0.9908392                  0
## 522 2019-11-01      35.06 34.51   1.0159374                  1
## 523 2019-11-04      34.24 35.06   0.9766115                  0
## 524 2019-11-05      33.16 34.24   0.9684579                  0
## 525 2019-11-06      32.71 33.16   0.9864294                  0
## 526 2019-11-18      34.78 34.61   1.0049119                  1
## 527 2019-11-20      34.23 34.35   0.9965066                  0
## 528 2019-11-21      34.14 34.23   0.9973707                  0
## 529 2019-11-26      35.13 35.01   1.0034276                  1
## 530 2019-11-27      35.21 35.13   1.0022773                  1
## 531 2019-12-10      33.03 33.08   0.9984885                  0
## 532 2019-12-16      33.78 33.55   1.0068554                  1
## 533 2019-12-26      34.66 34.06   1.0176160                  1
## 534 2019-12-27      34.85 34.66   1.0054818                  1
## 535 2020-01-03      34.72 34.77   0.9985620                  0
##     cumulativeSumTodayGrtrThan_lag1 nRepeatsTodayGrtrThan_lag1
## 1                                 1                          1
## 2                                 2                          1
## 3                                 4                          6
## 4                                 6                          1
## 5                                 8                          4
## 6                                12                          2
## 7                                14                          2
## 8                                15                          2
## 9                                17                          1
## 10                               18                          3
## 11                               20                          1
## 12                               22                          3
## 13                               23                          1
## 14                               28                          7
## 15                               28                          7
## 16                               28                          7
## 17                               28                          7
## 18                               28                          7
## 19                               28                          7
## 20                               30                          4
## 21                               33                          3
## 22                               34                          1
## 23                               35                          1
## 24                               36                          3
## 25                               38                          4
## 26                               38                          4
## 27                               41                          3
## 28                               42                          5
## 29                               43                          1
## 30                               45                          2
## 506                             828                          2
## 507                             829                          1
## 508                             830                          1
## 509                             832                          2
## 510                             832                          2
## 511                             838                          3
## 512                             839                          3
## 513                             839                          3
## 514                             840                          1
## 515                             843                          4
## 516                             849                          1
## 517                             852                          2
## 518                             852                          2
## 519                             853                          1
## 520                             854                          3
## 521                             861                          3
## 522                             865                          5
## 523                             865                          5
## 524                             865                          5
## 525                             865                          5
## 526                             869                          4
## 527                             869                          4
## 528                             869                          4
## 529                             871                          1
## 530                             872                          3
## 531                             874                          6
## 532                             877                          1
## 533                             881                          1
## 534                             882                          2
## 535                             883                          4
##     decrDaysThisCycle nTimesDecrDayCountsOccurs todayLessThan_lag1
## 1                   0                       438                  0
## 2                   0                       438                  0
## 3                   5                         9                  1
## 4                   0                       438                  0
## 5                   3                        67                  0
## 6                   1                       251                  1
## 7                   1                       251                  1
## 8                   1                       251                  1
## 9                   0                       438                  0
## 10                  2                       111                  1
## 11                  0                       438                  0
## 12                  2                       111                  1
## 13                  0                       438                  0
## 14                  6                         6                  0
## 15                  6                         6                  1
## 16                  6                         6                  1
## 17                  6                         6                  1
## 18                  6                         6                  1
## 19                  6                         6                  1
## 20                  3                        67                  1
## 21                  2                       111                  1
## 22                  0                       438                  0
## 23                  0                       438                  0
## 24                  2                       111                  1
## 25                  3                        67                  1
## 26                  3                        67                  1
## 27                  2                       111                  1
## 28                  4                        20                  1
## 29                  0                       438                  0
## 30                  1                       251                  0
## 506                 1                       251                  0
## 507                 0                       438                  0
## 508                 0                       438                  0
## 509                 1                       251                  0
## 510                 1                       251                  1
## 511                 2                       111                  1
## 512                 2                       111                  1
## 513                 2                       111                  1
## 514                 0                       438                  0
## 515                 3                        67                  1
## 516                 0                       438                  0
## 517                 1                       251                  0
## 518                 1                       251                  1
## 519                 0                       438                  0
## 520                 2                       111                  1
## 521                 2                       111                  1
## 522                 4                        20                  0
## 523                 4                        20                  1
## 524                 4                        20                  1
## 525                 4                        20                  1
## 526                 3                        67                  0
## 527                 3                        67                  1
## 528                 3                        67                  1
## 529                 0                       438                  0
## 530                 2                       111                  0
## 531                 5                         9                  1
## 532                 0                       438                  0
## 533                 0                       438                  0
## 534                 1                       251                  0
## 535                 3                        67                  1
##     cumulativeSumTodayLessThan_lag1 nRepeatsTodayLessThan_lag1
## 1                                 0                          3
## 2                                 0                          3
## 3                                 4                          1
## 4                                 9                          4
## 5                                 9                          4
## 6                                16                          3
## 7                                17                          2
## 8                                18                          4
## 9                                18                          4
## 10                               19                          1
## 11                               20                          5
## 12                               22                          4
## 13                               22                          4
## 14                               23                          4
## 15                               24                          1
## 16                               25                          1
## 17                               26                          1
## 18                               28                          1
## 19                               29                          3
## 20                               31                          1
## 21                               34                          4
## 22                               34                          4
## 23                               34                          4
## 24                               36                          2
## 25                               38                          1
## 26                               40                          4
## 27                               41                          1
## 28                               46                          3
## 29                               46                          3
## 30                               47                          2
## 506                             823                          3
## 507                             824                          4
## 508                             824                          4
## 509                             826                          2
## 510                             827                          5
## 511                             831                          1
## 512                             833                          1
## 513                             834                          5
## 514                             834                          5
## 515                             837                          2
## 516                             838                          7
## 517                             844                          3
## 518                             845                          3
## 519                             845                          3
## 520                             846                          1
## 521                             854                          1
## 522                             857                          2
## 523                             858                          1
## 524                             859                          1
## 525                             860                          1
## 526                             864                          3
## 527                             866                          1
## 528                             867                          2
## 529                             868                          3
## 530                             868                          3
## 531                             874                          1
## 532                             875                          6
## 533                             878                          3
## 534                             878                          3
## 535                             881                          1
##     incrDaysThisCycle nTimesIncrDayCountsOccurs NextDay2PercentIncrease.x
## 1                   2                       121                         1
## 2                   2                       121                         1
## 3                   0                       435                         0
## 4                   3                        65                         1
## 5                   3                        65                         1
## 6                   2                       121                         1
## 7                   1                       241                         1
## 8                   3                        65                         1
## 9                   3                        65                         1
## 10                  0                       435                         1
## 11                  4                        26                         1
## 12                  3                        65                         1
## 13                  3                        65                         1
## 14                  3                        65                         1
## 15                  0                       435                         1
## 16                  0                       435                         1
## 17                  0                       435                         1
## 18                  0                       435                         1
## 19                  2                       121                         1
## 20                  0                       435                         1
## 21                  3                        65                         1
## 22                  3                        65                         1
## 23                  3                        65                         1
## 24                  1                       241                         1
## 25                  0                       435                         1
## 26                  3                        65                         1
## 27                  0                       435                         1
## 28                  2                       121                         1
## 29                  2                       121                         0
## 30                  1                       241                         1
## 506                 2                       121                         0
## 507                 3                        65                         0
## 508                 3                        65                         0
## 509                 1                       241                         1
## 510                 4                        26                         1
## 511                 0                       435                         1
## 512                 0                       435                         0
## 513                 4                        26                         1
## 514                 4                        26                         1
## 515                 1                       241                         1
## 516                 6                         8                         0
## 517                 2                       121                         0
## 518                 2                       121                         0
## 519                 2                       121                         1
## 520                 0                       435                         1
## 521                 0                       435                         0
## 522                 1                       241                         0
## 523                 0                       435                         0
## 524                 0                       435                         1
## 525                 0                       435                         1
## 526                 2                       121                         0
## 527                 0                       435                         0
## 528                 1                       241                         0
## 529                 2                       121                         0
## 530                 2                       121                         1
## 531                 0                       435                         1
## 532                 5                         9                         1
## 533                 2                       121                         0
## 534                 2                       121                         0
## 535                 0                       435                         0
##     RandomForest KNN Rpart GLM NextDay2PercentIncrease.y mostVoted NextDayPrice
## 1              1   1     1   1                         1         1        21.44
## 2              1   1     1   1                         1         1        21.96
## 3              0   1     1   1                         0         1        21.29
## 4              1   1     1   1                         1         1        23.46
## 5              1   1     1   1                         1         1        25.13
## 6              1   1     1   1                         1         1        23.68
## 7              1   1     1   1                         1         1        22.89
## 8              1   1     1   1                         1         1        23.05
## 9              1   1     1   1                         1         1        25.70
## 10             1   1     1   1                         1         1        24.62
## 11             1   1     1   1                         1         1        26.96
## 12             1   1     1   1                         1         1        25.33
## 13             1   1     1   1                         1         1        25.50
## 14             1   1     1   1                         1         1        26.62
## 15             1   1     1   1                         1         1        26.03
## 16             1   1     1   1                         1         1        25.30
## 17             1   1     1   1                         1         1        32.22
## 18             1   1     1   1                         1         1        30.69
## 19             1   1     1   1                         1         1        30.67
## 20             1   1     1   1                         1         1        30.92
## 21             1   1     1   1                         1         1        31.58
## 22             1   1     1   1                         1         1        29.40
## 23             1   1     1   1                         1         1        29.46
## 24             1   1     1   1                         1         1        30.09
## 25             1   1     1   1                         1         1        27.46
## 26             1   1     1   1                         1         1        29.09
## 27             1   1     1   1                         1         1        29.39
## 28             1   1     1   1                         1         1        31.00
## 29             1   1     1   1                         0         1        31.24
## 30             0   1     1   1                         1         1        34.87
## 506            0   0     0   0                         0         0        33.24
## 507            0   0     0   0                         0         0        33.23
## 508            0   0     0   0                         0         0        33.96
## 509            0   0     0   0                         1         0        38.21
## 510            0   0     0   0                         1         0        36.98
## 511            1   1     1   0                         1         1        34.86
## 512            1   1     1   0                         0         1        33.40
## 513            1   0     1   0                         1         1        33.28
## 514            1   1     1   0                         1         1        32.75
## 515            1   1     1   0                         1         1        33.61
## 516            0   0     0   0                         0         0        34.69
## 517            0   0     0   0                         0         0        30.12
## 518            0   0     0   0                         0         0        34.77
## 519            0   1     0   0                         1         0        36.06
## 520            1   1     0   0                         1         1        35.56
## 521            0   1     0   0                         0         0        33.42
## 522            0   0     0   0                         0         0        33.78
## 523            0   0     0   0                         0         0        34.00
## 524            1   0     1   0                         1         1        34.14
## 525            1   0     1   0                         1         1        34.11
## 526            0   0     0   0                         0         0        34.77
## 527            0   0     0   0                         0         0        34.44
## 528            0   1     0   0                         0         0        34.57
## 529            0   1     0   0                         0         0        35.28
## 530            0   0     0   0                         1         0        36.23
## 531            1   1     1   0                         1         1        36.12
## 532            0   0     0   0                         1         0        34.77
## 533            0   0     0   0                         0         0        34.20
## 534            0   0     0   0                         0         0        34.76
## 535            0   0     0   0                         0         0        34.57

%%%%%%%%%%%%%%%%%%%%%%

New Stocks Lags Counts Groups

Janis Corona

3/20/2020