Introduction

This is for my esteemed friend Sim Boon Hwa, who wants to use analytics to predict stock prices. I just want to show him that the price variable alone cannot predict whether a stock will go up or down.

Downloaded UOB weekly prices from Jan 4 2016 to Aug 29 2016.

Or https://sg.finance.yahoo.com/q/hp?s=U11.SI&a=00&b=3&c=2016&d=07&e=30&f=2016&g=w

## 'data.frame':    35 obs. of  7 variables:
##  $ Date     : chr  "1/4/2016" "1/11/2016" "1/18/2016" "1/25/2016" ...
##  $ Open     : num  19.7 18.3 17.4 17.9 18.1 ...
##  $ High     : num  19.7 18.4 17.8 18.2 18.3 ...
##  $ Low      : num  18.3 17.6 17 17.4 17.1 ...
##  $ Close    : num  18.4 17.6 17.6 18.1 17.9 ...
##  $ Volume   : int  3127100 3247100 4308600 2999700 2398600 1688000 4846800 3847600 4693200 2215500 ...
##  $ Adj.Close: num  17.7 16.9 17 17.4 17.2 ...
##         Date  Open  High   Low Close  Volume Adj.Close
## 1   1/4/2016 19.69 19.69 18.26 18.41 3127100     17.72
## 2  1/11/2016 18.30 18.35 17.55 17.60 3247100     16.94
## 3  1/18/2016 17.35 17.83 17.01 17.62 4308600     16.96
## 4  1/25/2016 17.94 18.17 17.42 18.09 2999700     17.42
## 5   2/1/2016 18.09 18.26 17.10 17.87 2398600     17.20
## 6   2/8/2016 17.87 17.87 17.14 17.56 1688000     16.91
## 7  2/15/2016 17.66 18.06 17.10 17.24 4846800     16.60
## 8  2/22/2016 17.17 17.35 16.80 17.05 3847600     16.42
## 9  2/29/2016 17.19 18.64 16.91 18.50 4693200     17.81
## 10  3/7/2016 18.51 18.70 18.05 18.65 2215500     17.96
## 11 3/14/2016 18.79 19.49 18.68 19.29 2427400     18.57
## 12 3/21/2016 19.32 19.32 18.52 18.65 2347200     17.96
## 13 3/28/2016 18.65 19.10 18.39 18.76 2685000     18.06
## 14  4/4/2016 18.85 18.95 18.30 18.53 2694900     17.84
## 15 4/11/2016 18.38 19.75 18.36 19.63 2799600     18.90
## 16 4/18/2016 19.40 20.00 19.35 19.65 2573800     18.92
## 17 4/25/2016 19.55 19.73 18.59 18.60 2400500     18.24
## 18  5/2/2016 18.60 18.80 17.70 17.79 2782600     17.44
## 19  5/9/2016 17.83 17.90 17.45 17.77 2427600     17.42
## 20 5/16/2016 17.72 18.07 17.52 17.94 2060200     17.59
## 21 5/23/2016 18.02 18.28 17.78 18.21 1745400     17.85
## 22 5/30/2016 18.19 18.57 18.04 18.30 1905900     17.94
## 23  6/6/2016 18.47 19.17 18.45 18.61 3091100     18.25
## 24 6/13/2016 18.25 18.34 17.88 17.92 2599100     17.57
## 25 6/20/2016 18.18 18.45 17.74 17.85 2379300     17.50
## 26 6/27/2016 17.74 18.72 17.41 18.52 4006700     18.16
## 27  7/4/2016 18.60 18.69 18.01 18.15 2422900     17.80
## 28 7/11/2016 18.37 18.89 18.28 18.74 2097100     18.37
## 29 7/18/2016 18.80 19.10 18.61 19.05 2099700     18.68
## 30 7/25/2016 19.07 19.11 18.19 18.20 3247900     17.84
## 31  8/1/2016 18.30 18.40 17.88 17.93 4976000     17.58
## 32  8/8/2016 18.02 18.22 17.72 17.92 2872900     17.57
## 33 8/15/2016 17.67 17.85 17.51 17.56 2296000     17.56
## 34 8/22/2016 17.55 18.17 17.51 18.05 2385500     18.05
## 35 8/29/2016 17.97 18.18 17.97 18.00 2798500     18.00

I use linear regression to predict the price for the NEXT 1 to 2 WEEKS using past prices.

# Create the weeks vector (the data has 35 weekly rows)
weeks <- 1:nrow(prices)

# Fit a linear model to predict
price_lm <- lm(prices$Adj.Close ~ weeks)


#plot original graph
plot(prices$Adj.Close, type="l", col="blue", lwd=2, ylab="Closing Prices", main="Price", xlab = "Weeks")

# Predict the price for the next 1 to 2 weeks (week 36 and week 37), which is what you want

future_weeks <- data.frame(weeks = 36:37)
price_pred <- predict(price_lm, future_weeks)


# Plot historical data and predictions
plot(prices$Adj.Close ~ weeks, type="l", col="blue", lwd=2, ylab="Closing Prices", main="Price", xlab = "Weeks", xlim = c(1, 40))
points(36:37, price_pred, col = "green")

summary(price_lm)
## 
## Call:
## lm(formula = prices$Adj.Close ~ weeks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.12011 -0.38796 -0.06182  0.35486  1.20597 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 17.364202   0.188815  91.964   <2e-16 ***
## weeks        0.021989   0.009148   2.404    0.022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5466 on 33 degrees of freedom
## Multiple R-squared:  0.149,  Adjusted R-squared:  0.1232 
## F-statistic: 5.777 on 1 and 33 DF,  p-value: 0.02201

As you can see from the GREEN DOTS in the graph, the prediction keeps going upwards. Even if I add more points, they simply fall on a straight line! How can the price of a stock go up in a perfectly straight line?
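
To make the straight-line point concrete, here is a small sketch that rebuilds the two forecasts by hand. The coefficients are copied from the summary output above; the variable names (intercept, slope, manual_pred, weekly_step) are mine and only for illustration.

```r
# Coefficients copied from the summary(price_lm) output above
intercept <- 17.364202
slope     <- 0.021989

# A linear trend forecast is just intercept + slope * week
future_weeks <- 36:37
manual_pred  <- intercept + slope * future_weeks

# Every extra week adds the same amount (the slope), hence the straight line
weekly_step <- diff(manual_pred)

print(round(manual_pred, 2))
print(round(weekly_step, 3))
```

With these coefficients the forecasts come out at roughly 18.16 and 18.18, and every further week would add about 0.022, no matter what the market actually does.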

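Adjusted R-squared of the model: 12.32% (multiple R-squared: 14.9%), so the week number explains very little of the price variation.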
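
The 12.32% figure is the adjusted R-squared line of the summary. As a quick check, the sketch below recomputes it from the multiple R-squared (0.149) and the sample size, using the usual adjustment formula. The variable names are my own, and the inputs are the rounded values printed above, so the result only matches to about four decimals.

```r
# Values copied from the summary(price_lm) output above
r_squared <- 0.149   # Multiple R-squared
n         <- 35      # number of weekly observations
p         <- 1       # number of predictors (weeks)

# Standard adjustment for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(adj_r_squared, 4))
```

Either way, well over 80% of the variation in price is left unexplained by the week number.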

We have to first define the problem. Yes, everyone wants to know whether the price will go up or down. Then we have to sit down and ask: WHAT IMPACTS PRICE? List the possible variables that impact prices (of course, not all of this information is available), which is why this is a difficult task. If it were so easy, I could sit here and predict prices from my analytics keyboard without doing any work.

Predict ACTUAL PRICES (better than just saying UP or DOWN: I play GOD and give a number for next week's price)

suppressWarnings(suppressMessages(library(h2o)))
                 
localH2O <- h2o.init(nthreads = -1)
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     C:\Users\admin\AppData\Local\Temp\RtmpOMHDUV/h2o_admin_started_from_r.out
##     C:\Users\admin\AppData\Local\Temp\RtmpOMHDUV/h2o_admin_started_from_r.err
## 
## 
## Starting H2O JVM and connecting:  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 251 milliseconds 
##     H2O cluster version:        3.8.3.3 
##     H2O cluster name:           H2O_started_from_R_admin_mum924 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   7.10 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     R Version:                  R version 3.3.0 (2016-05-03)
h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 486 milliseconds 
##     H2O cluster version:        3.8.3.3 
##     H2O cluster name:           H2O_started_from_R_admin_mum924 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   7.10 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     R Version:                  R version 3.3.0 (2016-05-03)
#convert to H2O frame
train.h2o <- as.h2o(prices)
# Column indices of the response and the predictors
y.dep <- 7        # response: Adj.Close (column 7)
x.indep <- c(2:6) # predictors: Open, High, Low, Close, Volume (columns 2 to 6)

#GBM

gbm.model <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 1000, max_depth = 4, learn_rate = 0.01, seed = 1122)
#see which variables are important. 
h2o.varimp(gbm.model)
## Variable Importances: 
##   variable relative_importance scaled_importance percentage
## 1    Close          386.340607          1.000000   0.751719
## 2   Volume           52.520115          0.135943   0.102191
## 3     High           42.168064          0.109147   0.082048
## 4      Low           23.390825          0.060545   0.045512
## 5     Open            9.523388          0.024650   0.018530
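
That Close dominates is less surprising once you compare it with the target. The sketch below takes a few rows copied from the weekly table above and computes the ratio of Adj.Close to Close. My reading, which is an assumption and not something the data states, is that Adj.Close is Close scaled down for dividends, so the model is mostly learning that rescaling of the same week's price and nothing about next week.

```r
# A few rows copied from the weekly table above
sample_rows <- data.frame(
  Date      = c("1/4/2016", "4/18/2016", "4/25/2016",
                "7/25/2016", "8/15/2016", "8/29/2016"),
  Close     = c(18.41, 19.65, 18.60, 18.20, 17.56, 18.00),
  Adj.Close = c(17.72, 18.92, 18.24, 17.84, 17.56, 18.00)
)

# Ratio of the target to the strongest predictor, same week
sample_rows$ratio <- round(sample_rows$Adj.Close / sample_rows$Close, 4)

print(sample_rows)
```

In these rows the ratio stays between about 0.96 and 1, and it only changes in steps, so knowing this week's Close almost fixes this week's Adj.Close.
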
# New observation with only Close supplied; the other predictors are left out
myprice <- data.frame(Close=18.23)

# Convert to an H2O frame
result <- as.h2o(myprice)
predict.gbm <- as.data.frame(h2o.predict(gbm.model, result))

Estimated Stock Price is 18.
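
One way to check whether past prices alone carry any signal is a walk-forward comparison: for each week, fit the linear trend on the earlier weeks only, forecast that week, and compare the error with the naive guess "next week equals this week". The sketch below does this in base R with the Adj.Close values copied from the table above. The 10-week warm-up and all variable names are my own assumptions, and I am not quoting a result here; run it and judge for yourself.

```r
# Adj.Close values copied from the weekly table above (35 weeks)
adj_close <- c(17.72, 16.94, 16.96, 17.42, 17.20, 16.91, 16.60, 16.42,
               17.81, 17.96, 18.57, 17.96, 18.06, 17.84, 18.90, 18.92,
               18.24, 17.44, 17.42, 17.59, 17.85, 17.94, 18.25, 17.57,
               17.50, 18.16, 17.80, 18.37, 18.68, 17.84, 17.58, 17.57,
               17.56, 18.05, 18.00)

n          <- length(adj_close)
first_test <- 11  # assumption: use the first 10 weeks as warm-up

trend_err <- numeric(0)
naive_err <- numeric(0)

for (t in first_test:n) {
  # Fit the trend using only weeks before t (no peeking at the future)
  past <- data.frame(week = 1:(t - 1), price = adj_close[1:(t - 1)])
  fit  <- lm(price ~ week, data = past)

  trend_pred <- as.numeric(predict(fit, newdata = data.frame(week = t)))
  naive_pred <- adj_close[t - 1]  # naive forecast: last observed price

  trend_err <- c(trend_err, abs(adj_close[t] - trend_pred))
  naive_err <- c(naive_err, abs(adj_close[t] - naive_pred))
}

cat("Mean absolute error, linear trend:", round(mean(trend_err), 3), "\n")
cat("Mean absolute error, naive guess: ", round(mean(naive_err), 3), "\n")
```

If the trend model cannot clearly beat the naive guess on weeks it has not seen, then the price history by itself is not telling us where the price goes next.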

Conclusion: there is no logic in predicting price from price alone!