CAPSTONE PROJECT ON STOCK PRICE PREDICTION OF APPLE INC. BASED ON HISTORICAL STOCK MARKET DATA FROM 1980-2018 LISTED ON NASDAQ

Made under the guidance of Prof. Sameer Mathur, IIM Lucknow, as part of the internship on Data Analytics with Managerial Applications.

1.INTRODUCTION

Predicting the stock market has been both the bane and the goal of investors since its inception. Every day billions of dollars are traded on the exchange, and behind each dollar is an investor hoping to profit in one way or another. Entire companies rise and fall daily based on the behaviour of the market. An investor able to predict market movements accurately would hold a tantalizing promise of wealth and influence.

The efficient-market hypothesis suggests that stock prices reflect all currently available information, so any price changes not based on newly revealed information are inherently unpredictable. Even so, a model that can approximately predict the closing price of a stock on a particular day would be very useful for successful trading.

2.OVERVIEW OF STUDY

Apple Inc. is an American multinational technology company headquartered in Cupertino, California that designs, develops, and sells consumer electronics, computer software, and online services.

Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976 to develop and sell Wozniak’s Apple I personal computer. It was incorporated as Apple Computer, Inc. in January 1977, and sales of its computers, including the Apple II, saw significant momentum and revenue growth for the company.

Apple went public (IPO) in 1980 to instant financial success. Over the next few years, Apple shipped new computers featuring innovative graphical user interfaces, and Apple’s marketing commercials for its products received widespread critical acclaim.

Apple is the world’s largest information technology company by revenue and the world’s second-largest mobile phone manufacturer. In February 2015, Apple became the first U.S. company to be valued at over US$700 billion. Apple’s worldwide annual revenue totaled $229 billion for the 2017 fiscal year.

One of the most popular and fastest-growing stocks today is that of Apple. In this project I therefore try to predict the future stock price of Apple by hypothesising a model based on past stock data.

3.DESCRIPTION OF DATASET

The dataset has been sourced from an online website. It is a comprehensive collection of daily records of the stock of Apple Inc., listed on NASDAQ, from 12-12-1980 to 27-02-2018. The meaning of each column is explained below:

1.Date

The particular date for which the stock details are listed.

2.Open

The opening price of stock on that date.

3.High

The highest price to which the stock rose on that date.

4.Low

The lowest price to which the stock went down on that date.

5.Close

The closing price of stock on that date.

6.Volume

The amount of stock traded on that date.

7.Dividend

A dividend is a payment made by a corporation to its shareholders. These payouts are usually made in cash (called “cash dividends”), but companies sometimes distribute stock dividends instead, whereby additional shares are issued to shareholders. A stock dividend resembles a stock split in that it increases the number of shares outstanding, although the two are distinct corporate actions.

8.Split

When a company declares a stock split, the number of its shares increases but the market capitalisation remains the same. Existing shares are split, so the price per share falls in proportion while the underlying value of each holding is unchanged. The split ratio is given in the dataset.

9.Adjusted Open

An adjusted opening price is a stock’s opening price on any given day of trading that has been amended to include any distributions and corporate actions that occurred at any time prior to the last day’s close.

10.Adjusted High

The highest price to which the stock rose on that day, adjusted for splits.

11.Adjusted Low

The lowest price to which the stock went down on that day, adjusted for splits.

12.Adjusted Close

An adjusted closing price is a stock’s closing price on any given day of trading that has been amended to include any distributions and corporate actions that occurred at any time prior to the next day’s open.

13.Adjusted Volume

The volume traded on that date, adjusted for stock splits. Some data providers, StockFetcher for example, adjust only the price and leave volume unadjusted, because adjusting volume creates “spikes” which can skew measures and produce “false positives” in stock screens when the actual volume “pattern” may have been continuous. In this dataset the adjusted volume differs from the raw volume, as the summary statistics in section 7 show.
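
The relationship between the raw and split-adjusted price columns described above can be illustrated with a minimal sketch. The prices, the split ratio and the variable names below are made up for illustration and are not taken from the dataset; the sketch also ignores dividends, which a full adjustment would account for as well.

```r
# Hypothetical toy series: raw closing prices around a 2-for-1 split.
raw_close <- c(100, 104, 52, 53)  # the price halves on the day the split takes effect
split     <- c(1,   1,   2,  1)   # split ratio recorded on the effective date

# A split leaves market capitalisation unchanged:
# twice as many shares at half the price.
shares_before <- 1000
cap_before <- shares_before * 104
cap_after  <- (shares_before * 2) * 52

# Backward adjustment: each day's factor is the product of all split ratios
# that take effect AFTER that day, so earlier prices are restated on the
# latest share basis.
future_factor <- rev(cumprod(rev(split))) / split
adj_close     <- raw_close / future_factor

print(data.frame(raw_close, split, future_factor, adj_close))
```

After the adjustment the artificial drop from 104 to 52 disappears and the series reads as a continuous price history, which is why the adjusted columns are far smaller than the raw ones for the early years of a stock that has split several times.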

4.Model(Hypothesis)

Based on the t-tests and chi-square tests performed, it was observed that the closing price of the stock depended on:

  1. Volume
  2. Adjusted High
  3. Adjusted Low
  4. Adjusted Open
  5. Adjusted Volume

5.Result

After running a linear regression on the stated hypothesis, it was observed that the hypothesis gained empirical support from the model, and a linear regression model was formulated that can predict the closing price of the stock from past data. The p-value was below 0.05 for every independent variable tested except Adjusted Open (p = 0.0645).

Further, the multiple R-squared was 0.4657 and the adjusted R-squared was 0.4654, so the model explains roughly 47% of the variance in the closing price.
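
As a rough illustration of where the p-values and R-squared figures come from, the sketch below fits a linear model to synthetic data and recomputes both R-squared values by hand. The variables `x1`, `x2` and `y` are invented for the example; only the last line uses the figures reported above.

```r
# Synthetic data only; none of these numbers come from the Apple dataset.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)   # x2 has no true effect by construction

fit <- lm(y ~ x1 + x2)
s   <- summary(fit)

# Per-coefficient p-values, and the terms that clear the 0.05 threshold.
p_values    <- coef(s)[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

# R-squared by hand: the share of the variance in y explained by the fit.
rss <- sum(residuals(fit)^2)
tss <- sum((y - mean(y))^2)
r2  <- 1 - rss / tss

# Adjusted R-squared penalises the number of predictors k.
k      <- length(coef(fit)) - 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The same formula applied to the figures reported above
# (R-squared 0.4657, 9382 rows, 5 predictors).
reported_adj_r2 <- 1 - (1 - 0.4657) * (9382 - 1) / (9382 - 5 - 1)
```

With over nine thousand rows and only five predictors the penalty is tiny, which is why the adjusted value barely differs from the multiple R-squared.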

6.Conclusion

Thus it can be concluded that the pricing of the stock is dependent on the adjusted high, adjusted low, adjusted open, volume and adjusted volume of the stock, although the evidence for adjusted open is weaker (p = 0.0645).

Thus the linear regression model can predict the price of Apple Inc.’s stock with the help of the historical stock data.
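
One way to check how well such a model predicts prices it has not seen is to hold out the most recent observations, fit on the rest, and measure the prediction error. The sketch below does this on synthetic data; the data frame `toy`, its column names and the 240/60 split are assumptions made for illustration and are not part of the original analysis.

```r
# Synthetic price-like series; not the Apple data.
set.seed(42)
n   <- 300
toy <- data.frame(adj_high = cumsum(rnorm(n, mean = 0.1)) + 50)
toy$adj_low <- toy$adj_high - runif(n, 0.5, 2)
toy$close   <- (toy$adj_high + toy$adj_low) / 2 + rnorm(n, sd = 0.2)

# Time-ordered split: fit on the first 240 days, test on the last 60.
train <- toy[1:240, ]
test  <- toy[241:n, ]

fit  <- lm(close ~ adj_high + adj_low, data = train)
pred <- predict(fit, newdata = test)

# Root mean squared error on the held-out days.
rmse <- sqrt(mean((test$close - pred)^2))
print(rmse)
```

Splitting by time rather than at random keeps future rows out of the training set, which matters for price series where neighbouring days are strongly related.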

7.R Code and analysis

# Load the daily Apple stock records from the CSV file
stock <- read.csv("apple dataset.csv")
summary(stock)
##          Date           Open             High             Low        
##  1980-12-12:   1   Min.   : 11.12   Min.   : 11.12   Min.   : 11.00  
##  1980-12-15:   1   1st Qu.: 27.00   1st Qu.: 27.50   1st Qu.: 26.50  
##  1980-12-16:   1   Median : 43.75   Median : 44.50   Median : 43.00  
##  1980-12-17:   1   Mean   :101.09   Mean   :102.32   Mean   : 99.75  
##  1980-12-18:   1   3rd Qu.:109.70   3rd Qu.:111.17   3rd Qu.:108.37  
##  1980-12-19:   1   Max.   :702.41   Max.   :705.07   Max.   :699.57  
##  (Other)   :9376                                                     
##      Close            Volume             Dividend            Split      
##  Min.   : 11.00   Min.   :     4471   Min.   :0.000000   Min.   :1.000  
##  1st Qu.: 27.00   1st Qu.:  1230750   1st Qu.:0.000000   1st Qu.:1.000  
##  Median : 43.75   Median :  3759200   Median :0.000000   Median :1.000  
##  Mean   :101.05   Mean   : 11962851   Mean   :0.003774   Mean   :1.001  
##  3rd Qu.:109.78   3rd Qu.: 17908300   3rd Qu.:0.000000   3rd Qu.:1.000  
##  Max.   :702.10   Max.   :189560600   Max.   :3.290000   Max.   :7.000  
##                                                                         
##     Adj_Open           Adj_High           Adj_Low        
##  Min.   :  0.1623   Min.   :  0.1623   Min.   :  0.1605  
##  1st Qu.:  0.9157   1st Qu.:  0.9329   1st Qu.:  0.8966  
##  Median :  1.4256   Median :  1.4539   Median :  1.3969  
##  Mean   : 21.1167   Mean   : 21.3165   Mean   : 20.9023  
##  3rd Qu.: 19.6046   3rd Qu.: 19.9391   3rd Qu.: 19.2787  
##  Max.   :179.1000   Max.   :180.4800   Max.   :178.1600  
##                                                          
##    Adj_Close          Adj_Volume       
##  Min.   :  0.1605   Min.   :2.504e+05  
##  1st Qu.:  0.9151   1st Qu.:3.475e+07  
##  Median :  1.4248   Median :6.080e+07  
##  Mean   : 21.1149   Mean   :8.873e+07  
##  3rd Qu.: 19.6084   3rd Qu.:1.111e+08  
##  Max.   :178.9700   Max.   :1.855e+09  
## 
View(stock)
attach(stock)
library(psych)
describe(stock)
##            vars    n        mean          sd      median     trimmed
## Date*         1 9382     4691.50     2708.49     4691.50     4691.50
## Open          2 9382      101.09      135.26       43.75       65.94
## High          3 9382      102.32      136.38       44.50       66.93
## Low           4 9382       99.75      133.92       43.00       64.90
## Close         5 9382      101.05      135.18       43.75       65.93
## Volume        6 9382 11962850.54 16636897.77  3759200.00  8522664.50
## Dividend      7 9382        0.00        0.09        0.00        0.00
## Split         8 9382        1.00        0.06        1.00        1.00
## Adj_Open      9 9382       21.12       38.44        1.43       11.41
## Adj_High     10 9382       21.32       38.75        1.45       11.54
## Adj_Low      11 9382       20.90       38.12        1.40       11.26
## Adj_Close    12 9382       21.11       38.45        1.42       11.40
## Adj_Volume   13 9382 88725799.58 87086110.15 60796366.50 72971312.39
##                    mad       min          max        range  skew kurtosis
## Date*          3477.44      1.00 9.382000e+03 9.381000e+03  0.00    -1.20
## Open             32.81     11.12 7.024100e+02 6.912900e+02  2.42     5.23
## High             33.27     11.12 7.050700e+02 6.939500e+02  2.42     5.22
## Low              32.25     11.00 6.995700e+02 6.885700e+02  2.42     5.24
## Close            32.81     11.00 7.021000e+02 6.911000e+02  2.42     5.23
## Volume      4731940.29   4471.00 1.895606e+08 1.895561e+08  2.34     8.13
## Dividend          0.00      0.00 3.290000e+00 3.290000e+00 31.44  1040.79
## Split             0.00      1.00 7.000000e+00 6.000000e+00 87.07  8006.29
## Adj_Open          1.47      0.16 1.791000e+02 1.789400e+02  2.07     3.45
## Adj_High          1.50      0.16 1.804800e+02 1.803200e+02  2.07     3.44
## Adj_Low           1.44      0.16 1.781600e+02 1.780000e+02  2.08     3.48
## Adj_Close         1.47      0.16 1.789700e+02 1.788100e+02  2.08     3.46
## Adj_Volume 47243592.37 250376.00 1.855410e+09 1.855160e+09  3.42    28.71
##                   se
## Date*          27.96
## Open            1.40
## High            1.41
## Low             1.38
## Close           1.40
## Volume     171761.03
## Dividend        0.00
## Split           0.00
## Adj_Open        0.40
## Adj_High        0.40
## Adj_Low         0.39
## Adj_Close       0.40
## Adj_Volume 899085.88
mytable <- with(stock, table(Dividend))
mytable
## Dividend
##    0 0.08  0.1 0.11 0.12 0.47 0.52 0.57 0.63 2.65 3.05 3.29 
## 9326    4    4    4   21    3    4    4    4    3    4    1
mytable1 <- with(stock, table(Split))
mytable1
## Split
##    1    2    7 
## 9378    3    1
attach(stock)
## The following objects are masked from stock (pos = 4):
## 
##     Adj_Close, Adj_High, Adj_Low, Adj_Open, Adj_Volume, Close,
##     Date, Dividend, High, Low, Open, Split, Volume
library(lattice)
histogram(~Open) 

histogram(~Close) 

boxplot(Open)

boxplot(Close)

boxplot(High)

boxplot(Low) 

histogram(~Volume)

plot(~Close + Date , main = "Closing price of stock with day number" , pch =1)
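
In the session above `Date` was read as a factor (it is marked `Date*` in the `describe` output), so it enters the plot as a day number rather than a calendar date. A minimal sketch of converting such a column to the `Date` class is given below; the three rows in `toy` are invented for illustration, and the `%Y-%m-%d` format is an assumption based on the values shown in the summary.

```r
# Hypothetical three-row extract; the Close values are made up.
toy <- data.frame(
  Date  = c("1980-12-12", "1980-12-15", "1980-12-16"),
  Close = c(28.0, 27.0, 25.0),
  stringsAsFactors = TRUE   # mimics how the column was read in the session
)

# Factor -> character -> Date, so the column carries real calendar dates.
toy$Date <- as.Date(as.character(toy$Date), format = "%Y-%m-%d")

# Gaps between consecutive trading days, in calendar days
# (the weekend shows up as a 3-day gap).
gaps <- as.numeric(diff(toy$Date))
print(gaps)
```

With the column converted, a call such as `plot(Close ~ Date, data = toy, type = "l")` would put calendar dates on the horizontal axis.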

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplotMatrix(formula= ~ Close + Open + High + Low + Volume,cex=0.8)

cor(stock[, c(2:6 , 9:13)])
##                 Open      High       Low     Close    Volume    Adj_Open
## Open       1.0000000 0.9999281 0.9999002 0.9998257 0.2882580  0.53307853
## High       0.9999281 1.0000000 0.9998790 0.9999201 0.2895081  0.53210141
## Low        0.9999002 0.9998790 1.0000000 0.9999135 0.2860077  0.53424426
## Close      0.9998257 0.9999201 0.9999135 1.0000000 0.2878256  0.53316338
## Volume     0.2882580 0.2895081 0.2860077 0.2878256 1.0000000  0.61196816
## Adj_Open   0.5330785 0.5321014 0.5342443 0.5331634 0.6119682  1.00000000
## Adj_High   0.5334196 0.5325140 0.5346118 0.5335876 0.6139130  0.99995700
## Adj_Low    0.5320958 0.5311451 0.5333616 0.5322755 0.6090673  0.99994498
## Adj_Close  0.5326582 0.5317641 0.5339189 0.5329171 0.6113204  0.99990481
## Adj_Volume 0.1916405 0.1944722 0.1874925 0.1911254 0.5403455 -0.05206478
##               Adj_High     Adj_Low   Adj_Close  Adj_Volume
## Open        0.53341960  0.53209578  0.53265821  0.19164055
## High        0.53251400  0.53114505  0.53176410  0.19447224
## Low         0.53461177  0.53336156  0.53391886  0.18749247
## Close       0.53358757  0.53227548  0.53291712  0.19112537
## Volume      0.61391304  0.60906726  0.61132042  0.54034546
## Adj_Open    0.99995700  0.99994498  0.99990481 -0.05206478
## Adj_High    1.00000000  0.99993011  0.99995486 -0.05071799
## Adj_Low     0.99993011  1.00000000  0.99995482 -0.05426231
## Adj_Close   0.99995486  0.99995482  1.00000000 -0.05249642
## Adj_Volume -0.05071799 -0.05426231 -0.05249642  1.00000000
library(corrgram)
corrgram(stock[, c(2:6, 9:13)], order = TRUE, text.panel = panel.txt, lower.panel = panel.shade, upper.panel = panel.pie, main = "Corrgram of all variables")

t.test(Open, Close)
## 
##  Welch Two Sample t-test
## 
## data:  Open and Close
## t = 0.018981, df = 18762, p-value = 0.9849
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.832215  3.907160
## sample estimates:
## mean of x mean of y 
##  101.0878  101.0503
t.test(Low , Close)
## 
##  Welch Two Sample t-test
## 
## data:  Low and Close
## t = -0.66079, df = 18760, p-value = 0.5088
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.148646  2.552448
## sample estimates:
## mean of x mean of y 
##  99.75224 101.05034
t.test(High , Close)
## 
##  Welch Two Sample t-test
## 
## data:  High and Close
## t = 0.6408, df = 18761, p-value = 0.5217
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.615459  5.156211
## sample estimates:
## mean of x mean of y 
##  102.3207  101.0503
t.test(Volume , Close)
## 
##  Welch Two Sample t-test
## 
## data:  Volume and Close
## t = 69.648, df = 9381, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  11626061 12299438
## sample estimates:
##    mean of x    mean of y 
## 1.196285e+07 1.010503e+02
t.test(Adj_High , Close)
## 
##  Welch Two Sample t-test
## 
## data:  Adj_High and Close
## t = -54.921, df = 10912, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -82.57958 -76.88808
## sample estimates:
## mean of x mean of y 
##  21.31651 101.05034
t.test(Adj_Low, Close)
## 
##  Welch Two Sample t-test
## 
## data:  Adj_Low and Close
## t = -55.274, df = 10864, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -82.99038 -77.30578
## sample estimates:
## mean of x mean of y 
##  20.90226 101.05034
t.test(Adj_Open , Close)
## 
##  Welch Two Sample t-test
## 
## data:  Adj_Open and Close
## t = -55.092, df = 10888, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -82.77773 -77.08962
## sample estimates:
## mean of x mean of y 
##  21.11667 101.05034
t.test(Adj_Volume, Close)
## 
##  Welch Two Sample t-test
## 
## data:  Adj_Volume and Close
## t = 98.684, df = 9381, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  86963295 90488102
## sample estimates:
##    mean of x    mean of y 
## 8.872580e+07 1.010503e+02

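From the t-tests performed above, the means of Open, High and Low do not differ significantly from the mean of Close (p > 0.5), whereas the means of Adjusted High, Adjusted Low, Adjusted Open, Volume and Adjusted Volume do (p < 2.2e-16). On this basis it is inferred that the pricing of the stock primarily depends on Adjusted High, Adjusted Low, Adjusted Open, Volume and Adjusted Volume. Strictly, a t-test compares means rather than measuring dependence, so the correlation matrix above is the more direct evidence of association.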
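
The difference between a gap in means and an association between two variables can be seen in a small sketch on synthetic data. All variables below are invented for the example and have no connection to the Apple dataset.

```r
# Synthetic data only.
set.seed(7)
x <- rnorm(500, mean = 100, sd = 10)

# Same mean as x, but generated independently of it.
y_indep <- rnorm(500, mean = 100, sd = 10)

# Very different mean from x, but an exact linear function of it.
y_scaled <- 0.01 * x

# t.test asks whether the two means differ.
p_means_indep  <- t.test(x, y_indep)$p.value
p_means_scaled <- t.test(x, y_scaled)$p.value

# cor asks whether the two variables move together.
r_indep  <- cor(x, y_indep)
r_scaled <- cor(x, y_scaled)

print(c(p_means_scaled = p_means_scaled, r_scaled = r_scaled, r_indep = r_indep))
```

Here `y_scaled` has a mean far from that of `x` yet is perfectly correlated with it, while `y_indep` shares the mean of `x` but is unrelated to it. A significant t-test between two columns measured on different scales therefore says little about whether one helps to predict the other.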

model <- lm(formula = Close ~ Volume + Adj_High + Adj_Open + Adj_Low + Adj_Volume)
summary(model)
## 
## Call:
## lm(formula = Close ~ Volume + Adj_High + Adj_Open + Adj_Low + 
##     Adj_Volume)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -859.61  -34.41  -10.76   11.93  429.87 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.357e+01  1.634e+00   8.307   <2e-16 ***
## Volume      -6.064e-06  1.264e-07 -47.963   <2e-16 ***
## Adj_High     7.037e+01  3.274e+00  21.493   <2e-16 ***
## Adj_Open     6.232e+00  3.371e+00   1.849   0.0645 .  
## Adj_Low     -7.418e+01  3.017e+00 -24.590   <2e-16 ***
## Adj_Volume   8.915e-07  1.716e-08  51.944   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 98.84 on 9376 degrees of freedom
## Multiple R-squared:  0.4657, Adjusted R-squared:  0.4654 
## F-statistic:  1634 on 5 and 9376 DF,  p-value: < 2.2e-16
plot(model)