Linear Regression Model

When we think of linear regression, date and time are not the kind of predictors that come to mind. However, for the purposes of this discussion, a simple linear regression model will be developed. The stock I have chosen is EPAM, an Enterprise Design and Consulting firm. The date range for the stock price data is July 2 2018 to June 30 2020.

library(quantmod)
library(xts)
library(forecast)
library(fpp2)
library(dplyr)
library(magrittr)

# Reading downloaded EPAM stock prices
e1<- read.csv("C:/Users/hunai/Desktop/EPAM.csv", stringsAsFactors = FALSE)

Next we split the dataset into training and test data. With such a simple model, the forecast horizon should be short. The training data therefore runs from the start of the data set, July 2 2018, up to (but not including) June 23 2020, and we attempt to forecast the stock price for the last week of June 2020, starting June 24.

We select the Date and Adjusted Closing Price columns from the data. Since linear regression cannot use the raw Date column as a numeric predictor (R would treat it as roughly 500 factor levels), we add a date_index column. The date index is essentially a count of days starting from 0 at the first date of each subset.

### SPLIT INTO TRAIN AND TEST DATA ###
# Training set: rows before 2020-06-23; date_index counts days since 2018-07-02
e1.train <- e1 %>%
  filter(Date >= as.Date('2018-07-02') & Date < as.Date('2020-06-23')) %>%
  select(Date, Adj.Close) %>%
  mutate(date_index = difftime(Date, "2018-07-02", units = "days"))

# Test set: rows from 2020-06-24 on; date_index restarts at 0 on 2020-06-24
e1.test <- e1 %>%
  filter(Date >= as.Date('2020-06-24')) %>%
  select(Date, Adj.Close) %>%
  mutate(date_index = difftime(Date, "2020-06-24", units = "days"))
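Because the test set's date_index restarts at 0, the two subsets are not on the same time scale. As a minimal sketch, assuming we want a single index measured from the first training date, the helper below (day_index is a hypothetical name, not part of the original code) computes days elapsed from a common origin in base R:

```r
# Hypothetical helper: days elapsed since a fixed origin date.
# Using the same origin for train and test keeps both on one time scale.
day_index <- function(dates, origin = "2018-07-02") {
  as.numeric(as.Date(dates) - as.Date(origin))
}

# A few illustrative dates (the test dates are the four rows used later)
train_dates <- c("2018-07-02", "2018-07-03", "2018-07-05")
test_dates  <- c("2020-06-24", "2020-06-25", "2020-06-26", "2020-06-29")

day_index(train_dates)  # 0 1 3
day_index(test_dates)   # 723 724 725 728
```

A plain numeric index like this also avoids passing a difftime column to lm().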

The regression model looks as follows:

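# Fit adjusted close on the day index using the training data
fit.train <- lm(Adj.Close ~ date_index, data = e1.train)
# summary(fit.train)

# Attach predictions for the test rows
e1.pred <- e1.test %>% mutate(Pred.Close = predict(fit.train, newdata = e1.test))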

e1.pred
##         Date Adj.Close date_index Pred.Close
## 1 2020-06-24    247.50     0 days   117.8058
## 2 2020-06-25    250.99     1 days   117.9626
## 3 2020-06-26    244.38     2 days   118.1194
## 4 2020-06-29    246.09     5 days   118.5898
# Root mean square error over the test rows
rmse <- sqrt(mean((e1.pred$Pred.Close - e1.pred$Adj.Close)^2))
rmse
## [1] 129.1462

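The fit is not one that any investor would rely on. Linear regression can only predict a straight line, so the model implies a constant rate of increase in the stock price, and the root mean square error of about 129 is very large against actual prices near 245. Note also that date_index restarts at 0 in the test set, so the predictions above sit near the model's intercept (about 117.8) rather than at the end of the fitted trend line, which inflates the error.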
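As a quick check on the numbers above, the RMSE can be recomputed by hand from the four printed rows. The sketch below also reads an approximate slope off consecutive predictions and extrapolates to day 723 (June 24 2020 counted from July 2 2018). This is an illustration under the assumption that the rounded printed values are close enough to the fitted coefficients; it is not output from the original model.

```r
# Values copied from the printed e1.pred table
actual    <- c(247.50, 250.99, 244.38, 246.09)
predicted <- c(117.8058, 117.9626, 118.1194, 118.5898)

# Root mean square error: square root of the mean squared difference
rmse_check <- sqrt(mean((predicted - actual)^2))
rmse_check  # about 129.15, in line with the reported value

# Approximate coefficients implied by the printed predictions (assumption)
intercept <- predicted[1]                 # prediction at date_index = 0
slope     <- predicted[2] - predicted[1]  # change per one-day step

# Hypothetical prediction if the test index continued from the training origin
pred_day_723 <- intercept + slope * 723
pred_day_723  # roughly 231
```

On that scale the straight-line forecast would land much closer to the observed prices, although still some way below them.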

We next employ some methods from Hyndman’s text, namely the Drift, Mean and Naive forecasting methods. This time we use the R package ‘quantmod’ to get the data directly from the Yahoo Finance website, and instead of daily prices, we go up a level and extract the weekly Adjusted Closing price for EPAM.

The package ‘quantmod’ downloads the data as an ‘xts’ object, which is very convenient to work with in R. To chart it in the way the textbook does, we first need to convert it to a ts() object:

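start <- as.Date("2018-07-02")
# Note: `end` is not passed to getSymbols.yahoo() below, so the series runs to the download date
end <- as.Date("2020-05-31")

# Download weekly prices and keep column 6 (EPAM.Adjusted)
epam <- getSymbols.yahoo('EPAM', from = start, periodicity = 'weekly', auto.assign = FALSE)[, 6]
chartSeries(epam, TA = NULL)

# Download again, drop missing values, and convert to a ts object with 52 observations per year
getSymbols.yahoo('EPAM', from = start, periodicity = 'weekly', auto.assign = FALSE)[, 6] %>%
  na.omit() %>% ts(start = 2018, frequency = 52) -> epam.ts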

head(epam.ts)
## Time Series:
## Start = c(2018, 1) 
## End = c(2018, 6) 
## Frequency = 52 
##      EPAM.Adjusted
## [1,]        128.26
## [2,]        133.56
## [3,]        133.75
## [4,]        130.57
## [5,]        125.14
## [6,]        130.27
## attr(,"index")
##   [1] 1530489600 1531094400 1531699200 1532304000 1532908800 1533513600
##   [7] 1534118400 1534723200 1535328000 1535932800 1536537600 1537142400
##  [13] 1537747200 1538352000 1538956800 1539561600 1540166400 1540771200
##  [19] 1541376000 1541980800 1542585600 1543190400 1543795200 1544400000
##  [25] 1545004800 1545609600 1546214400 1546819200 1547424000 1548028800
##  [31] 1548633600 1549238400 1549843200 1550448000 1551052800 1551657600
##  [37] 1552262400 1552867200 1553472000 1554076800 1554681600 1555286400
##  [43] 1555891200 1556496000 1557100800 1557705600 1558310400 1558915200
##  [49] 1559520000 1560124800 1560729600 1561334400 1561939200 1562544000
##  [55] 1563148800 1563753600 1564358400 1564963200 1565568000 1566172800
##  [61] 1566777600 1567382400 1567987200 1568592000 1569196800 1569801600
##  [67] 1570406400 1571011200 1571616000 1572220800 1572825600 1573430400
##  [73] 1574035200 1574640000 1575244800 1575849600 1576454400 1577059200
##  [79] 1577664000 1578268800 1578873600 1579478400 1580083200 1580688000
##  [85] 1581292800 1581897600 1582502400 1583107200 1583712000 1584316800
##  [91] 1584921600 1585526400 1586131200 1586736000 1587340800 1587945600
##  [97] 1588550400 1589155200 1589760000 1590364800 1590969600 1591574400
## [103] 1592179200 1592784000 1593388800 1593561600
## attr(,"index")attr(,"tzone")
## [1] UTC
## attr(,"index")attr(,"tclass")
## [1] Date
## attr(,"src")
## [1] yahoo
## attr(,"updated")
## [1] 2020-07-02 00:32:58 EDT
str(epam.ts)
##  Time-Series [1:106, 1] from 2018 to 2020: 128 134 134 131 125 ...
##  - attr(*, "index")= num [1:106] 1.53e+09 1.53e+09 1.53e+09 1.53e+09 1.53e+09 ...
##   ..- attr(*, "tzone")= chr "UTC"
##   ..- attr(*, "tclass")= chr "Date"
##  - attr(*, "src")= chr "yahoo"
##  - attr(*, "updated")= POSIXct[1:1], format: "2020-07-02 00:32:58"
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "EPAM.Adjusted"
autoplot(epam.ts) +
  autolayer(meanf(epam.ts, h=4),
            series="Mean", PI=FALSE) +
  autolayer(rwf(epam.ts, h=4),
            series="Naïve", PI=FALSE) +
  autolayer(rwf(epam.ts, drift=TRUE, h=4),
            series="Drift", PI=FALSE) +
  ggtitle("EPAM stock (Weekly ending 31 May 2020)") +
  xlab("Year") + ylab("Closing Price (US$)") +
  guides(colour=guide_legend(title="Forecast"))

Note that these methods are non-seasonal. They are nonetheless very simple forecasts to produce with the ‘forecast’ package.
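For intuition, the point forecasts behind the three methods can be written out in base R. The sketch below uses the first six weekly values printed by head(epam.ts) as a toy series; the variable names are illustrative, and the real forecasts come from meanf() and rwf() on the full series.

```r
# Toy series: first six weekly adjusted closes shown in head(epam.ts)
y <- c(128.26, 133.56, 133.75, 130.57, 125.14, 130.27)
n <- length(y)
h <- 4  # forecast horizon in weeks

# Mean method: every future value equals the historical average
mean_fc <- rep(mean(y), h)

# Naive method: every future value equals the last observation
naive_fc <- rep(y[n], h)

# Drift method: extend the line joining the first and last observations
drift_fc <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)

rbind(mean_fc, naive_fc, drift_fc)
```

None of the three uses a seasonal pattern, which is why they show up as flat or straight lines on the chart.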

Stock prices depend on many factors, which highly paid investment analysts spend their careers trying to analyze and predict: the general economy, seasonality and trend, job numbers, law and order, and - as we have recently seen - global events like wars and pandemics. These factors need to be considered when using regression methods to predict stock prices. Further into the course, we will learn about models like ARIMA, which have strong potential for short-term prediction.