By Akul Mahajan
The main objective of this project is to explore the New York Stock Exchange dataset from Kaggle and analyze the stock price of Apple.
Investing in stocks is one of the most common ways to grow money over time. The difficulty is that no stock is a good buy all the time: investing at a bad time can be disastrous, while buying a low-priced stock at the right time can yield large profits.
To judge whether a stock is good or bad, we need to identify patterns that show how a good stock behaves. Here we analyze the patterns in the stock price of Apple.
Solutions Overview: After cleaning and exploratory analysis, we draw conclusions about the trends of a good stock based on the analysis.
library(readr)
library(dplyr)
library(DT)
library(knitr)
library(forecast)
library(fpp2)
library(ggfortify)
The data is taken from the New York Stock Exchange dataset on Kaggle. The prices table contains 7 variables and 851,264 observations. It holds the daily opening and closing prices, along with the daily low, high, and volume, of stocks for companies listed on the NYSE from 2010 through 2016.
directory <- "/Users/akul/Desktop/Kaggle -NYSE/prices.csv"
fundamentals_dir <- "/Users/akul/Desktop/Kaggle -NYSE/fundamentals.csv"
nyse_data <- read_csv(directory)
fundamentals_data <- read_csv(fundamentals_dir)
The imported dataset has 7 variables and 851,264 observations.
dim(nyse_data)
## [1] 851264 7
Names of the variables in the dataset:
names(nyse_data)
## [1] "date" "symbol" "open" "close" "low" "high" "volume"
Overview of the imported dataset.
glimpse(nyse_data)
## Observations: 851,264
## Variables: 7
## $ date <dttm> 2016-01-05, 2016-01-06, 2016-01-07, 2016-01-08, 2016-0...
## $ symbol <chr> "WLTW", "WLTW", "WLTW", "WLTW", "WLTW", "WLTW", "WLTW",...
## $ open <dbl> 123.43, 125.24, 116.38, 115.48, 117.01, 115.51, 116.46,...
## $ close <dbl> 125.84, 119.98, 114.95, 116.62, 114.97, 115.55, 112.85,...
## $ low <dbl> 122.31, 119.94, 114.93, 113.50, 114.09, 114.50, 112.59,...
## $ high <dbl> 126.25, 125.54, 119.74, 117.44, 117.33, 116.06, 117.07,...
## $ volume <dbl> 2163600, 2386400, 2489500, 2006300, 1408600, 1098000, 9...
The first part of the data cleaning process is to check whether any variable names appear in the rows. The overview of the dataset shows that the data looks fairly clean. The next step is to check the data for missing values.
sum(is.na(nyse_data))
Here, we find there are no missing values in the dataset.
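A single total can hide where any gaps are located. The sketch below uses a small made-up data frame (`toy_prices` is hypothetical and is not the Kaggle data) to show how the same check could be broken down by column with base R.

```r
# Toy data frame standing in for nyse_data (values are made up for illustration)
toy_prices <- data.frame(
  symbol = c("AAA", "AAA", "BBB", "BBB"),
  open   = c(10.0, NA, 20.5, 21.0),
  close  = c(10.4, 10.1, NA, 20.8),
  stringsAsFactors = FALSE
)

# Total number of missing cells, the same check as sum(is.na(nyse_data))
total_missing <- sum(is.na(toy_prices))

# Missing cells per column, which shows where the gaps are
missing_by_column <- colSums(is.na(toy_prices))

# Keep only the rows that are complete in every column
complete_rows <- toy_prices[complete.cases(toy_prices), ]

print(total_missing)
print(missing_by_column)
print(nrow(complete_rows))
```

On the real data, `colSums(is.na(nyse_data))` would give the per-column counts in the same way.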
Next, we select the variables of interest. Since our analysis is based on the stock price of ‘Apple Inc’, we create a new dataset containing only this information. In addition, a new column named fluctuation is introduced, which gives the rise or drop in the stock price for that day (close minus open).
nyse_apple <- nyse_data %>%
mutate(fluctuation = close - open) %>%
filter(symbol == "AAPL") %>%
arrange(date)
names(fundamentals_data)[2] = "symbol"
fundamentals_data_apple <- filter(fundamentals_data,symbol == "AAPL")
dim(nyse_apple)
## [1] 1762 8
dim(fundamentals_data_apple)
## [1] 4 79
The new table created for the stock prices of Apple has 1762 observations and 8 variables.
A new table, fundamentals_data_apple, is also created; it contains the features on the basis of which the stock prices are evaluated.
Now we compute summary statistics for the data to identify any abnormal values.
kable(summary(nyse_apple))
| date | symbol | open | close | low | high | volume | fluctuation |
|---|---|---|---|---|---|---|---|
| Min. :2010-01-04 00:00:00 | Length:1762 | Min. : 90.0 | Min. : 90.28 | Min. : 89.47 | Min. : 90.7 | Min. : 11475900 | Min. :-30.11999 |
| 1st Qu.:2011-09-30 18:00:00 | Class :character | 1st Qu.:115.2 | 1st Qu.:115.19 | 1st Qu.:114.00 | 1st Qu.:116.4 | 1st Qu.: 49174775 | 1st Qu.: -1.97000 |
| Median :2013-07-04 00:00:00 | Mode :character | Median :318.2 | Median :318.24 | Median :316.55 | Median :320.6 | Median : 80503850 | Median : 0.04499 |
| Mean :2013-07-02 22:20:17 | NA | Mean :313.1 | Mean :312.93 | Mean :309.83 | Mean :315.9 | Mean : 94225776 | Mean : -0.14925 |
| 3rd Qu.:2015-04-05 00:00:00 | NA | 3rd Qu.:470.9 | 3rd Qu.:472.59 | 3rd Qu.:467.97 | 3rd Qu.:478.1 | 3rd Qu.:121081625 | 3rd Qu.: 1.70001 |
| Max. :2016-12-30 00:00:00 | NA | Max. :702.4 | Max. :702.10 | Max. :699.57 | Max. :705.1 | Max. :470249500 | Max. : 30.76001 |
From the summary table it is clear that there are no negative values in the open, close, high, low, and volume columns. This rules out obviously invalid values, though some outliers may remain.
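The same sanity check can be expressed in code rather than read off the summary table. The following is a minimal sketch on made-up rows (`toy_apple` is hypothetical); it tests that prices are positive and that each day's open and close fall inside the low-high range. The range test is an extra assumption about what counts as an abnormal row and is not part of the analysis above.

```r
# Made-up daily rows; the last row deliberately has close above high
toy_apple <- data.frame(
  open   = c(100, 102, 101, 100),
  close  = c(101, 99, 103, 105),
  low    = c(99, 98, 100.5, 99),
  high   = c(102, 103, 104, 104),
  volume = c(1000, 1500, 1200, 900)
)

price_cols <- c("open", "close", "low", "high")

# All prices strictly positive and volumes non-negative
all_positive <- all(toy_apple[price_cols] > 0) && all(toy_apple$volume >= 0)

# For each row: low <= min(open, close) and high >= max(open, close)
range_ok <- with(toy_apple,
                 low <= pmin(open, close) & high >= pmax(open, close))

# Row numbers that fail the range test
suspect_rows <- which(!range_ok)

print(all_positive)
print(suspect_rows)
```

Applied to `nyse_apple`, any row numbers returned in `suspect_rows` would be candidates for closer inspection rather than automatic removal.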
datatable(head(nyse_apple,50))
In this section, I analyze the opening stock price of Apple over the first 1050 trading days of the series, beginning in January 2010. The plots include time series plots to study the correlation and lags that the stock price might follow, which later helps in determining an appropriate model for the data. I have also included seasonal plots to uncover any patterns in the price of the stock.
nyse_apple_open_price <- nyse_apple[,c("open")]
nyse_apple_open_price <- nyse_apple_open_price[1:1050,]
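# The series has one row per trading day (roughly 252 per year), so
# frequency = 365 only approximates the calendar and compresses the time axis.
appl.ts <- ts(nyse_apple_open_price, start = c(2010, 1), frequency = 365)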
autoplot(appl.ts,ts.geom = 'bar', fill = 'blue') +
labs(x = "Time in Years", y = "Stock Price in $") +
ggtitle("Apple Stock Price")
The share price of Apple is plotted from 2010 onward. We see a gradual increase in the share price over time, followed by a sharp decline that began in late 2012. A deeper look at the reasons for the decline gave some interesting results about the performance of the company. The stock price fell by nearly 40%, wiping out $300 billion of Apple's market value during this period. The reasons for the fall were twofold: first, Apple’s profit margin dropped drastically; second, iPhone sales grew by only 7% against the forecasted growth of 30%. This was primarily attributed to market saturation in developed countries and growth moving to emerging markets, which Apple was unable to capitalize on. Apple’s distribution was too limited in these markets and its products were too costly. For more comprehensive reading, please refer to the following [report](http://www.businessinsider.com/two-charts-show-why-apple-stock-dropped-2013-4).
ggseasonplot(appl.ts,col=rainbow(3), year.labels=TRUE)+
labs(x = "Time in Months", y = "Stock Price in $") +
scale_x_discrete(1:12) + ggtitle("Season Plot for Apple Share")
The seasonal plot unveils an interesting pattern in the price of the stock: we see a sustained increase in the share price around June in 2010, October in 2011, and September in 2012, which can be attributed to the launch of new products in those years. The stock price is often regarded as a measure of a company's performance, and these patterns support that view: when the company launched innovative products the stock price rose, and when the stock price declined, other indicators such as a weak marketing strategy or decreased revenues followed the same trend.
ggAcf(appl.ts, lag.max = 20) +
ggtitle("Auto Correlation Function for Apple stock for 20 days")
Here we try to identify a suitable forecasting model for the share price. One way to do so is to study the correlation between the share prices over a sequence of days. The lag plot provides a comprehensive view in this regard, comparing the correlation of the share price at different lags. Lag 1 refers to the correlation between the share price on a given day and the share price on the next day, computed over all days in the series; lag 2 compares each price with the price two days ahead, and so forth.
Here I have plotted an ACF plot for 20 lags (days), and a very high correlation can be seen across all of them, although it decreases gradually as the lag increases. The correlations seen in the lag plot help us understand the autocorrelation function of the time series.
To illustrate this more clearly, we plot the stock price against its values at lags of 1 to 6 days and observe that they follow a linear relationship, a consequence of the high correlation.
forecast::gglagplot(appl.ts, lags = 6) +
labs(x = "Stock Price in $", y = "Stock Price in $") +
ggtitle("Lag Plot for 6 days")
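The lag correlations described above can also be computed by hand. The sketch below uses a simulated random-walk series (`toy_price` is made up and is not Apple data), and `lag_cor` is a hypothetical helper that correlates the series with a copy of itself shifted by k days.

```r
set.seed(1)

# Simulated random-walk "price" series, for illustration only
toy_price <- 100 + cumsum(rnorm(500))

# Correlation between the series and itself shifted by k observations
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[1:(n - k)], x[(k + 1):n])
}

# Hand-computed correlations for lags 1 to 6
lag_cors <- sapply(1:6, function(k) lag_cor(toy_price, k))

# acf() estimates for the same lags (element 1 is lag 0, so drop it)
acf_vals <- acf(toy_price, lag.max = 6, plot = FALSE)$acf[-1]

print(round(lag_cors, 3))
print(round(acf_vals, 3))
```

The two sets of numbers are close but not identical, because `acf()` uses the overall mean and variance of the whole series while `cor()` recomputes them for each pair of shifted vectors.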
logreturns <- diff(log(appl.ts))
qplot(x = logreturns, fill=..count.., geom="histogram") +
ggtitle("Distribution for Log Returns")
One of the major ways to judge whether a stock is good or bad is to estimate its return; here I analyze the return on a daily basis. Working with differences of the natural logarithm of the price (log returns) puts the values on a common scale. The log returns of the Apple share have a mean of 0.0008720862 and a standard deviation of 0.0186493. An approximate 95% range for a single day's log return, taken as the mean plus or minus two standard deviations, is (-0.03642651, 0.03817069). Because this range spans zero almost symmetrically, it is highly inconclusive whether investing in the stock today will improve your chances of making a profit based on this data alone.
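The interval quoted above matches the mean plus or minus two standard deviations, which describes the spread of individual daily returns. A confidence interval for the mean itself divides the standard deviation by the square root of the sample size and is much narrower. The sketch below shows both calculations on a simulated price series (`toy_price` is made up, and the normal approximation is an assumption).

```r
set.seed(42)

# Simulated daily prices, for illustration only
toy_price <- 100 * exp(cumsum(rnorm(250, mean = 0.0005, sd = 0.02)))

# Daily log returns: log(P_t) - log(P_{t-1})
log_returns <- diff(log(toy_price))

n     <- length(log_returns)
mu    <- mean(log_returns)
sigma <- sd(log_returns)

# Approximate 95% range for a single day's return: mean +/- 2 sd
daily_range <- c(lower = mu - 2 * sigma, upper = mu + 2 * sigma)

# 95% confidence interval for the mean return: mean +/- z * sd / sqrt(n)
se      <- sigma / sqrt(n)
mean_ci <- c(lower = mu - qnorm(0.975) * se, upper = mu + qnorm(0.975) * se)

print(daily_range)
print(mean_ci)
```

On the real data, `log_returns` would be replaced by `logreturns` from the analysis above.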
The randomness and risk of investing are also evident when we visualize the log returns against time.
autoplot(logreturns, ts.colour = 'red', ts.linetype = 'dashed' )
We set out to analyze the key features and trends of the stock in order to determine a good time to invest in it. In the words of Benjamin Franklin, “An investment in knowledge pays the best interest.” In my analysis, I have tried to add some truth to this statement: knowledge about stocks and their assessment has the potential to be the best investment.
I chose the price of Apple’s stock as it is said to be one of the best-performing stocks on the market (“The Chosen One”). Insight into the stock’s price patterns will certainly help us understand the metrics involved. In this work, I have used various time-series techniques to analyze the patterns and developed an autoregressive (AR) model to predict the price.
Based on the analysis of the Apple stock price, it is evident that a company's stock price is a key metric for assessing its performance, as successful ventures and failed strategies are both reflected in the price and pattern of the stock.
Investing in the stock of a high-performing company like Apple can be a good choice, but our analysis suggests that short-term gains are risky while long-term gains are more likely.
For short-term gains, it helps to look at the recent pattern in the stock price. In our case the price showed very high autocorrelation, so if the price has recently trended upward it might be a good time to invest in Apple’s stock; if it has trended downward, it might be time to contact your custodian to sell it.
Another good way to choose the right time to invest is to assess what the company aims to achieve in the near future and what its products might offer. As we see in Apple’s case, the stock price grew around the time of its product launches.
Limitations: