Abstract:

I describe a stock analysis of 31 companies over the period January 2006 through December 2017. The goals of this descriptive analysis are to understand the trends in the stock values of these companies and to identify patterns and irregularities, if any. The scope of the analysis is purely descriptive; it aims to pave the way for questions that would encourage further investigation.

Introduction:

The dataset was obtained from https://www.kaggle.com/szrlee/stock-time-series-20050101-to-20171231/data. It records the Open, High, Low, Close and Volume of each company for every day on which the stocks traded, from Jan. 2006 to Dec. 2017. The dataset consists of 7 columns, namely Date, Open, High, Low, Close, Volume and Name. The names of the 31 companies are listed by finding the unique entries in the Name column. There are 93612 observations, roughly 3020 per company; since 31 × 3020 = 93620, some companies have slightly fewer rows (for example, AAPL and CSCO have 3019 each, as the nested output in the Data Manipulation section shows).

stockdata <- read.csv('~/Documents/RDataset/stockdata.csv')
summary(stockdata)
##          Date            Open              High              Low         
##  2006-01-03:   31   Min.   :   6.75   Min.   :   7.17   Min.   :   0.00  
##  2006-01-04:   31   1st Qu.:  33.95   1st Qu.:  34.29   1st Qu.:  33.60  
##  2006-01-05:   31   Median :  60.04   Median :  60.63   Median :  59.49  
##  2006-01-06:   31   Mean   :  85.62   Mean   :  86.39   Mean   :  84.84  
##  2006-01-09:   31   3rd Qu.:  94.00   3rd Qu.:  94.74   3rd Qu.:  93.25  
##  2006-01-10:   31   Max.   :1204.88   Max.   :1213.41   Max.   :1191.15  
##  (Other)   :93426   NA's   :25        NA's   :10        NA's   :20       
##      Close             Volume               Name      
##  Min.   :   6.66   Min.   :        0   AXP    : 3020  
##  1st Qu.:  33.96   1st Qu.:  5040180   BA     : 3020  
##  Median :  60.05   Median :  9701142   CAT    : 3020  
##  Mean   :  85.64   Mean   : 20156670   CVX    : 3020  
##  3rd Qu.:  94.01   3rd Qu.: 20752222   DIS    : 3020  
##  Max.   :1195.83   Max.   :843264044   GE     : 3020  
##                                        (Other):75492
names(stockdata)
## [1] "Date"   "Open"   "High"   "Low"    "Close"  "Volume" "Name"
unique(stockdata$Name)
##  [1] MMM   AXP   AAPL  BA    CAT   CVX   CSCO  KO    DIS   XOM   GE   
## [12] GS    HD    IBM   INTC  JNJ   JPM   MCD   MRK   MSFT  NKE   PFE  
## [23] PG    TRV   UTX   UNH   VZ    WMT   GOOGL AMZN  AABA 
## 31 Levels: AABA AAPL AMZN AXP BA CAT CSCO CVX DIS GE GOOGL GS HD ... XOM
stockdata$Date <- as.Date(stockdata$Date)
## Warning in strptime(xx, f <- "%Y-%m-%d", tz = "GMT"): unknown timezone
## 'zone/tz/2017c.1.0/zoneinfo/Asia/Kolkata'

Analysis:

Since this is a time-series dataset, I first plotted the Volume of stocks traded daily from 2006 to 2017. The plot shows some spikes, which are zoomed in on in the subsequent graphs for four companies, viz. AAPL, GE, TRV and CSCO. These serve as pointers for investigating the reasons behind such irregular peaks.

library(ggplot2)
library(GGally)

ggplot(aes(x = as.Date(Date), y = Volume), data = stockdata) +
  geom_line(aes(color = Name)) + 
  xlab("Year")

library(gridExtra)

v1 <- ggplot(aes(x = as.Date(Date), y = Volume), 
       data = subset(stockdata, (stockdata$Name == "AAPL")  & 
                       (as.Date(stockdata$Date) >= "2006-01-01" & as.Date(stockdata$Date) <= "2008-12-31"))) +
  geom_line(colour = 'orange') +
  xlab("AAPL")

v2 <- ggplot(aes(x = as.Date(Date), y = Volume), 
       data = subset(stockdata, (stockdata$Name == "GE")  & 
                       (as.Date(stockdata$Date) >= "2008-03-01" & as.Date(stockdata$Date) <= "2009-06-30"))) +
  geom_line(colour = 'green') +
  xlab("GE")

v3 <- ggplot(aes(x = as.Date(Date), y = Volume), 
       data = subset(stockdata, (stockdata$Name == "TRV")  & 
                       (as.Date(stockdata$Date) >= "2014-01-01" & as.Date(stockdata$Date) <= "2017-03-31"))) +
  geom_line(colour = 'violet') +
  xlab("TRV")

v4 <- ggplot(aes(x = as.Date(Date), y = Volume), 
       data = subset(stockdata, (stockdata$Name == "CSCO")  & 
                       (as.Date(stockdata$Date) >= "2010-06-01" & as.Date(stockdata$Date) <= "2011-06-30"))) +
  geom_line(colour = 'yellow') +
  xlab("CSCO")

grid.arrange(v1, v2, v3, v4, ncol = 2)

The next step is to plot the Open, High, Low and Close values of the stocks and look for patterns visible in those plots. Each of the four plots below covers all companies over the entire period from 2006 to 2017.

ggplot(aes(x = as.Date(Date), y = Open), data = stockdata) +
  geom_line(aes(color = Name)) + 
  xlab("Year")

ggplot(aes(x = as.Date(Date), y = High), data = stockdata) +
  geom_line(aes(color = Name)) + 
  xlab("Year")

ggplot(aes(x = as.Date(Date), y = Low), data = stockdata) +
  geom_line(aes(color = Name)) + 
  xlab("Year")

ggplot(aes(x = as.Date(Date), y = Close), data = stockdata) +
  geom_line(aes(color = Name)) + 
  xlab("Year")

Observation :

All four graphs show an exponential, or at least near-exponential, rise for two companies.

These two companies are plotted separately below.

The two companies whose stock prices rose consistently along an exponential, or at least near-exponential, path are GOOGL and AMZN. For each, the average of the daily Open and Close prices is plotted against time.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following object is masked from 'package:GGally':
## 
##     nasa
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
googlAllStock <- filter(stockdata, stockdata$Name == "GOOGL")
googlAllStock$AvgOC <- ( googlAllStock$Open + googlAllStock$Close ) / 2

amznAllStock <- filter(stockdata, stockdata$Name == "AMZN")
amznAllStock$AvgOC <- ( amznAllStock$Open + amznAllStock$Close ) / 2

a1 <- ggplot(aes(x = as.Date(Date), y = AvgOC), data = googlAllStock) +
  geom_line(colour = 'blue') +
  xlab("GOOGL : Year")

a2 <- ggplot(aes(x = as.Date(Date), y = AvgOC), data = amznAllStock) +
  geom_line(colour = 'red') +
  xlab("AMZN : Year")

grid.arrange(a1, a2, ncol = 2)
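One way to put a number on the "exponential or near-exponential" description is to fit a straight line to the logarithm of the price: a series growing at a constant rate is linear on a log scale, and the slope gives the growth rate. The following is a minimal sketch of that check in base R, not part of the original analysis; the function name `annual_growth_rate` is hypothetical, and it is demonstrated on synthetic data rather than on the Kaggle file.

```r
# Estimate a constant annual growth rate by regressing log(price) on elapsed time.
# dates: a Date vector; price: a positive numeric vector of the same length.
annual_growth_rate <- function(dates, price) {
  years <- as.numeric(dates - dates[1]) / 365.25   # elapsed time in years
  fit <- lm(log(price) ~ years)                    # log-linear model
  slope <- unname(coef(fit)[["years"]])
  c(growth    = exp(slope) - 1,                    # implied annual growth rate
    r_squared = summary(fit)$r.squared)            # 1 means log(price) is exactly linear
}

# Synthetic series (not from the dataset): 20% growth per year plus a small
# deterministic wiggle, one value per calendar day from 2006 to 2017
dates <- seq(as.Date("2006-01-03"), as.Date("2017-12-29"), by = "day")
t_years <- as.numeric(dates - dates[1]) / 365.25
price <- 100 * 1.2^t_years * exp(0.01 * sin(seq_along(dates)))
res <- annual_growth_rate(dates, price)
res
```

On the real data one could call, for example, `annual_growth_rate(googlAllStock$Date, googlAllStock$AvgOC)`; a high R² would support the near-exponential reading. I have not run it on the dataset, so no result is reported here.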

Candlestick chart :

The candlestick chart describes the price movement of a stock. Each candlestick represents one trading day and combines a line (the thin wick, from Low to High) with a bar (the thick body, from Open to Close), so each candle carries all four key prices for that day: the Open, the Close, the High and the Low. I have used conventional coloring for the candles: green signifies that the Close price was greater than the Open price for that day, and red signifies that the Open price was greater than the Close price. To improve the readability of the graph, the hover text shows all the details: Date, Open, High, Low, Close, Volume and the name of the company. Two consecutive trading days can be compared on hover by clicking the 'Compare data on hover' icon of the candlestick graph.
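The coloring rule and the two segments of each candle can be sketched in a few lines of base R. This is an illustration only: the function name `candle_parts` is hypothetical, the prices are made up, and I assume that days where Close equals Open are drawn red, which is what a plain `Close > Open` test gives.

```r
# Split each trading day into the two segments of a candle and pick its colour:
# green when Close > Open, red otherwise (days with Close == Open count as red here).
candle_parts <- function(ohlc) {
  data.frame(
    Date      = ohlc$Date,
    colour    = ifelse(ohlc$Close > ohlc$Open, "green", "red"),
    body_low  = pmin(ohlc$Open, ohlc$Close),   # thick segment spans Open..Close
    body_high = pmax(ohlc$Open, ohlc$Close),
    wick_low  = ohlc$Low,                      # thin segment spans Low..High
    wick_high = ohlc$High,
    stringsAsFactors = FALSE
  )
}

# Made-up prices for three days, for illustration only
toy <- data.frame(
  Date  = as.Date(c("2017-01-03", "2017-01-04", "2017-01-05")),
  Open  = c(10, 12, 11),
  High  = c(13, 12.5, 11.5),
  Low   = c(9.5, 10.5, 10.8),
  Close = c(12, 11, 11)
)
candles <- candle_parts(toy)
candles
```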

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyr)

aaplStock <- filter(stockdata, stockdata$Name == "AAPL")

aaplStock$Date <- as.Date(aaplStock$Date)

hovertxt <- Map(function(x, y)paste0(x, ":", y), names(aaplStock), aaplStock)
hovertxt <- Reduce(function(x, y)paste0(x, "<br>", y), hovertxt)

plot_ly(data = aaplStock, x = ~Date, xend = ~Date, color = ~Close > Open, colors = c("red","green")) %>%
  
  add_segments(y = ~Low, yend = ~High, line = list(width = 1, color = "black")) %>%
  
  add_segments(y = ~Open, yend = ~Close, line = list(width = 3)) %>%
  
  add_markers(y = ~(Low + High)/2, hoverinfo = "text",
              text = hovertxt, marker = list(color = "transparent")) %>% 
  
  layout(title = "Basic Candlestick Chart") %>%

  layout(showlegend = FALSE, 
         yaxis = list(title = "Price", domain = c(0, 0.9)),
         annotations = list(
           list(xref = "paper", yref = "paper", 
                x = 0, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste0("<b>AAPL</b>")),
           
           list(xref = "paper", yref = "paper", 
                x = 0.75, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste(range(aaplStock$Date), collapse = " : "),
                font = list(size = 8))),
         plot_bgcolor = "#f2f2f2")

Observations:

The majority of the graph is green, implying that on most days the Close value was greater than the Open value. This hints at a bullish trend. There are three major observable bearish trends: the first around the end of 2012 and the beginning of 2013, the second from 2015 to 2016, and the third at the end of 2017. Apart from these three bearish trends, the graph projects an overall bullish trend. Of course, this describes generalized behavior over time, not individual days.

Returns:

1. Yearly Returns:

The motivation behind investing in stocks is the returns they give over time. The dataset covers twelve years of stock data, so annual returns can be calculated along with returns over shorter intervals. The typical intervals for calculating returns are yearly, quarterly, monthly, weekly and daily. The following graphs show the returns over each of these intervals, first for all companies and then for a group of up to four companies showing irregular spikes. The first graph gives the yearly returns. It is interesting to observe how the scale of the Y-axis, i.e. the returns, changes for different time intervals.
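An arithmetic period return compares the last close of a period with the last close of the previous period: r_t = C_t / C_(t-1) - 1. The code in this section delegates this to tidyquant's `tq_transmute()` with quantmod's `periodReturn`; the base-R function below is an independent sketch of the same idea, with a hypothetical name (`period_returns`) and made-up prices. How the first, possibly partial, period is handled is my assumption and may differ in detail from `periodReturn`.

```r
# Arithmetic return of each period from daily closes:
#   r_t = C_t / C_(t-1) - 1, where C_t is the last close in period t.
# Assumption: the first (possibly partial) period is measured against the first
# close in the data; quantmod's periodReturn may treat it slightly differently.
period_returns <- function(dates, close, period = c("year", "month")) {
  period <- match.arg(period)
  key <- format(dates, if (period == "year") "%Y" else "%Y-%m")
  ord <- order(dates)                                  # make sure closes are in time order
  last_close <- tapply(close[ord], key[ord], function(x) x[length(x)])
  lc <- as.numeric(last_close)
  base <- c(close[ord][1], lc[-length(lc)])            # previous period's last close
  setNames(lc / base - 1, names(last_close))
}

# Toy closes (made-up values) spanning two years
dates  <- as.Date(c("2016-01-04", "2016-06-01", "2016-12-30", "2017-03-01", "2017-12-29"))
closes <- c(100, 90, 110, 120, 99)
period_returns(dates, closes, "year")   # 2016: 110/100 - 1 = 0.1; 2017: 99/110 - 1 = -0.1
```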

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tidyquant)
## Loading required package: PerformanceAnalytics
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
## Loading required package: tidyverse
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## √ tibble  1.4.1     √ purrr   0.2.4
## √ readr   1.1.1     √ stringr 1.2.0
## √ tibble  1.4.1     √ forcats 0.2.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x dplyr::combine()         masks gridExtra::combine()
## x lubridate::date()        masks base::date()
## x plotly::filter()         masks dplyr::filter(), stats::filter()
## x xts::first()             masks dplyr::first()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x xts::last()              masks dplyr::last()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()
## 
## Attaching package: 'tidyquant'
## The following object is masked from 'package:tibble':
## 
##     as_tibble
## The following object is masked from 'package:dplyr':
## 
##     as_tibble
stockData_yearly_returns <- stockdata %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "yearly", 
                 type       = "arithmetic")
#stockData_yearly_returns

ggplot(data = stockData_yearly_returns, aes(x = Date, y = yearly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Yearly Returns") + 
  ggtitle(" Yearly Returns of all companies: 2006 - 2017")

Observation (Yearly Returns):

The graph shows a very clear dip in returns around 2009. This seems natural given the financial scenario during that period: the impact of the financial crisis of 2008 would be expected to show up thereafter. The crisis appears to have had a long-lasting impact, since after bouncing back the returns of the majority of the companies follow a fairly linear path. The prominent spikes of four companies are plotted separately.

spike_stockdata_yly_ret <- subset(stockdata, 
                                  stockdata$Name == "AABA" | stockdata$Name == "AAPL" | stockdata$Name == "AMZN" | stockdata$Name == "BA")

spike_stockdata_yly_ret <- spike_stockdata_yly_ret %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "yearly", 
                 type       = "arithmetic")
#spike_stockdata_yly_ret

ggplot(data = spike_stockdata_yly_ret, aes(x = Date, y = yearly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Yearly Returns") + 
  ggtitle(" Yearly Returns of visible Spikes: 2006 - 2017")

2. Quarterly Returns:

The second graph in the series shows quarterly returns. The period considered is 2008 - 2009. Different time periods and companies are chosen for each interval to demonstrate subsetting the data by Name and Date, in the hope of finding patterns and irregularities in those periods. These choices should be considered illustrative only.

stockData_qly_ret_sub <- subset(stockdata,stockdata$Date >= "2008-01-01" & stockdata$Date <= "2009-12-31")

stockData_qly_ret_sub <- stockData_qly_ret_sub %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "quarterly", 
                 type       = "arithmetic")
#stockData_qly_ret_sub

ggplot(data = stockData_qly_ret_sub, aes(x = Date, y = quarterly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Quarterly Returns") + 
  ggtitle(" Quarterly Returns of all companies: 2008-09")

Observation (Quarterly Returns):

The returns are fairly linear from January 2008 to October 2008. Thereafter the returns diminish until approximately February 2009. Then there is an overall increase in the returns of all companies until December 2009. The four companies with the highest spikes are shown separately below in a single combined graph.

spike_stockData_qly_ret_sub <- subset(stockdata, 
                                        (stockdata$Name == "AMZN" | stockdata$Name == "AXP" | stockdata$Name == "CAT" | stockdata$Name == "JPM") 
                                        & (stockdata$Date >= "2008-01-01" & stockdata$Date <= "2009-12-31"))

spike_stockData_qly_ret_sub <- spike_stockData_qly_ret_sub %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "quarterly", 
                 type       = "arithmetic")
#spike_stockData_qly_ret_sub

ggplot(data = spike_stockData_qly_ret_sub, aes(x = Date, y = quarterly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Quarterly Returns") + 
  ggtitle(" Quarterly Returns of visible spikes: 2008-09")

3. Monthly returns:

The following graph shows the monthly returns of all the companies for 2010, January through December.

mnthly_StockData <- subset(stockdata, stockdata$Date >= "2010-01-01" & stockdata$Date <= "2010-12-31")

mnthly_StockData <- mnthly_StockData %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "monthly", 
                 type       = "arithmetic")
#mnthly_StockData

ggplot(data = mnthly_StockData, aes(x = Date, y = monthly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Monthly Returns") +
  ggtitle("Monthly Returns : 2010")

Observation (Monthly Returns):

The graph resembles a sinusoidal wave with ups and downs. Most of the companies follow this behavior, but there are one or two instances of an observable deviation, which turn out to be temporary. Four companies, HD, CSCO, GS and MSFT, are plotted separately below.

mnthly_StockData_ret <- subset(stockdata, 
                               (stockdata$Name == "HD" | stockdata$Name == "CSCO" | stockdata$Name == "GS" | stockdata$Name == "MSFT") 
                               & (stockdata$Date >= "2010-01-01" & stockdata$Date <= "2010-12-31"))

mnthly_StockData_ret <- mnthly_StockData_ret %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "monthly", 
                 type       = "arithmetic")
#mnthly_StockData_ret

ggplot(data = mnthly_StockData_ret, aes(x = Date, y = monthly.returns)) + 
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Monthly Returns") +
  ggtitle("Monthly Returns of selected companies: 2010")

4. Weekly Returns

The fourth graph of the returns series shows returns calculated on a weekly basis. The randomly chosen period runs from June 2008 to June 2009, roughly one year (about 56 weeks).

wkly_Stock <- subset(stockdata, (stockdata$Date >= "2008-06-01" & stockdata$Date <= "2009-06-30"))

wkly_Stock <- wkly_Stock %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "weekly", 
                 type       = "arithmetic")
#wkly_Stock

ggplot(data = wkly_Stock, aes(x = Date, y = weekly.returns)) +
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Weekly Returns") +
  ggtitle("Weekly Returns : June 2008 - June 2009")

Observation (Weekly Returns) :

The returns are confined mostly within the range -0.2 to 0.2, but certain spikes go well beyond this range, and these are again plotted separately. The focus is on returns that at least touch the 0.4 mark. Three observations reach this mark, two of which go significantly beyond it.

wkly_Stock_ret <- subset(stockdata, 
                         (stockdata$Name == "JPM" | stockdata$Name == "UNH" | stockdata$Name == "GS" | stockdata$Name == "CVX") 
                         & (stockdata$Date >= "2008-06-01" & stockdata$Date <= "2009-06-30"))

wkly_Stock_ret <- wkly_Stock_ret %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "weekly", 
                 type       = "arithmetic")
#wkly_Stock_ret

ggplot(data = wkly_Stock_ret, aes(x = Date, y = weekly.returns)) +
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Weekly Returns") +
  ggtitle("Weekly Returns of visible spikes: June 2008 - June 2009")

5. Daily Returns :

The last graph of the series shows daily returns. The randomly chosen period is one year, from January 2017 to December 2017. The spikes in the plot are highlighted in the next graph.

dly_Stock <- subset(stockdata, stockdata$Date >= "2017-01-01")

dly_Stock <- dly_Stock %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "daily", 
                 type       = "arithmetic")
#dly_Stock

ggplot(data = dly_Stock, aes(x = Date, y = daily.returns)) +
  geom_line(aes(color = Name)) +
  xlab("Years") +
  ylab("Daily Returns") +
  ggtitle("Daily Returns : 2017")

Observation (Daily Returns):

The spikes in daily returns are significant for three to four companies between October and December; more precisely, they occur towards the end of October. Two graphs give a detailed view of the daily returns. The first zooms in on the last week of October, i.e. 23 to 31 October 2017, and plots the daily returns of all the companies. The second plots the observed spikes separately.

spike_dly_Stock <- subset(stockdata, (stockdata$Date >= "2017-10-23" & stockdata$Date <= "2017-10-31"))

spike_dly_Stock <- spike_dly_Stock %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "daily", 
                 type       = "arithmetic")
#spike_dly_Stock

#d1 <- 
ggplot(data = spike_dly_Stock, aes(x = Date, y = daily.returns)) +
  geom_line(aes(color = Name)) +
  xlab("Days") +
  ylab("Daily Returns") +
  ggtitle("Daily Returns for Oct 23 - 31, 2017")

spike_dly_Stock_4 <- subset(stockdata, 
                            (stockdata$Name == "CVX" | stockdata$Name == "MRK" | stockdata$Name == "INTC" | stockdata$Name == "MMM")
                            & (stockdata$Date >= "2017-10-23" & stockdata$Date <= "2017-10-31"))

spike_dly_Stock_4 <- spike_dly_Stock_4 %>%
    group_by(Name) %>%
    tq_transmute(select     = Close, 
                 mutate_fun = periodReturn, 
                 period     = "daily", 
                 type       = "arithmetic")
#spike_dly_Stock_4

#d2 <- 
ggplot(data = spike_dly_Stock_4, aes(x = Date, y = daily.returns)) +
  geom_line(aes(color = Name)) +
  xlab("Days") +
  ylab("Daily Returns") +
  ggtitle("Daily Returns of visible spikes: Oct 23 - 31, 2017")

#grid.arrange(d1, d2, ncol = 2)

NOTE :

The choice of time periods and the detailed spike observations are meant to showcase the ability to zoom in on a particular time scale and then selectively choose the data that deviates from normal behavior. This is a precursor to calling attention to such deviations, which can serve as stepping stones for further investigations attempting to explain the oddities in the behavior. Those investigations and explanations are not explored in this analysis.

TECHNICAL INDICATORS :

Financial data is extremely voluminous, and specialized jargon, in the form of technical indicators, is commonly used to make specific sense of it. These indicators have predefined definitions, and R packages implement them, including graphically. Technical indicators fall into four types, Trend, Momentum, Volatility and Volume, and each type has multiple examples.

I have plotted one graph for each of the four types of technical indicators.

1. Trend : MACD

The Moving Average Convergence Divergence (MACD) graph is plotted as an example of 'Trend'. The MACD graph for the company named DIS for the year 2011 is plotted below. The companies and years are varied across the examples so as to cover much of the dataset.

About MACD :

The MACD plot is supposed to reveal changes in the strength, direction, momentum, and duration of a trend in a stock’s price.
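A minimal sketch of the textbook MACD calculation in base R: the MACD line is EMA(12) minus EMA(26) of the closing price, the signal line is EMA(9) of the MACD line, and the histogram is their difference (the `diff` column in the DIS code). Assumptions: each EMA uses the smoothing factor 2 / (n + 1) and is seeded with the simple mean of its first n values; the names `ema` and `macd_sketch` are hypothetical. Note that the DIS code passes `maType = SMA`, so it uses simple moving averages instead, and TTR's `MACD()` reports the fast/slow difference as a percentage by default (`percent = TRUE`), so its values will differ in scale from this sketch.

```r
# Exponential moving average with smoothing factor 2 / (n + 1),
# seeded with the simple mean of the first n values (one common convention).
ema <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  alpha <- 2 / (n + 1)
  out[n] <- mean(x[1:n])
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

# MACD line, signal line and histogram (macd - signal), in price units.
macd_sketch <- function(close, nFast = 12, nSlow = 26, nSig = 9) {
  macd <- ema(close, nFast) - ema(close, nSlow)
  signal <- rep(NA_real_, length(close))
  ok <- which(!is.na(macd))            # smooth only the defined MACD values
  if (length(ok) > 0) signal[ok] <- ema(macd[ok], nSig)
  data.frame(macd = macd, signal = signal, diff = macd - signal)
}

res <- macd_sketch(1:100)              # a price rising by exactly 1 per day
tail(res)
```

For a price that rises by exactly 1 per day, each EMA settles at a constant lag of (n - 1)/2 days behind the price, so the MACD line equals (26 - 1)/2 - (12 - 1)/2 = 7 from day 26 onward and the histogram is zero, which makes a convenient sanity check.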

disStock <- subset(stockdata, Name == "DIS")

disStock_macd <- disStock %>%
    group_by(Name) %>%
    tq_mutate(select     = Close, 
              mutate_fun = MACD, 
              nFast      = 12, 
              nSlow      = 26, 
              nSig       = 9, 
              maType     = SMA) %>%
    mutate(diff = macd - signal) %>%
    select(-(Open:Volume))
#disStock_macd

disStock_macd %>%
    filter(Date >= "2011-01-01" & Date <= "2011-12-31") %>%
    ggplot(aes(x = Date)) + 
    geom_hline(yintercept = 0, color = palette_light()[[1]]) +
    geom_line(aes(y = macd, col = Name)) +
    geom_line(aes(y = signal), color = "blue", linetype = 2) +
    geom_bar(aes(y = diff), stat = "identity", color = palette_light()[[1]]) +
    labs(title = "DIS: Moving Average Convergence Divergence - 2011",
         y = "MACD", x = "", color = "") +
    theme_tq() +
    scale_color_tq()

2. Momentum: RSI

The Relative Strength Index (RSI) is used as an example of 'Momentum'. For the company WMT in 2014, the equity curves of a simple 2-period RSI strategy, long and short, are plotted below.

About RSI :

The 2-period RSI strategy is a mean-reversion trading strategy designed to buy or sell securities after a corrective period. The strategy is rather simple. Larry Connors suggests looking for buying opportunities when the 2-period RSI moves below 10, which is considered deeply oversold. Conversely, traders can look for short-selling opportunities when the 2-period RSI moves above 90. This is a rather aggressive short-term strategy designed to participate in an ongoing trend. It is not designed to identify major tops or bottoms.
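The RSI itself is 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over the last n periods. The base-R sketch below uses Wilder's smoothing, seeded with simple averages of the first n price changes, and then applies the 2-period rule described above, acting on the next day as the WMT code does with `lag()`. The function names are hypothetical. As far as I know TTR's `RSI()` also uses Wilder-style smoothing by default, so results should be close, but I have not verified that they match exactly.

```r
# RSI = 100 - 100 / (1 + RS), RS = average gain / average loss over n periods,
# smoothed with Wilder's method: avg <- (avg * (n - 1) + new) / n.
rsi_sketch <- function(close, n = 14) {
  len <- length(close)
  rsi <- rep(NA_real_, len)
  if (len < n + 1) return(rsi)
  delta <- diff(close)                 # delta[i - 1] = close[i] - close[i - 1]
  gain <- pmax(delta, 0)
  loss <- pmax(-delta, 0)
  avg_gain <- mean(gain[1:n])          # seed with simple averages of the first n changes
  avg_loss <- mean(loss[1:n])
  rsi[n + 1] <- 100 - 100 / (1 + avg_gain / avg_loss)
  if (len >= n + 2) {
    for (i in (n + 2):len) {
      avg_gain <- (avg_gain * (n - 1) + gain[i - 1]) / n
      avg_loss <- (avg_loss * (n - 1) + loss[i - 1]) / n
      rsi[i] <- 100 - 100 / (1 + avg_gain / avg_loss)
    }
  }
  rsi
}

# 2-period rule: long (1) below 10, short (-1) above 90, otherwise flat (0),
# shifted by one day so that each signal is acted on the following day.
rsi2_signal <- function(close) {
  r <- rsi_sketch(close, n = 2)
  sig <- ifelse(is.na(r), 0, ifelse(r < 10, 1, ifelse(r > 90, -1, 0)))
  c(0, sig[-length(sig)])
}

rsi_sketch(1:10, n = 2)    # only gains: RSI is 100 once defined
rsi2_signal(1:10)          # overbought from day 3, so short from day 4
```

A series with only gains makes the average loss zero, so RS is infinite and the RSI is 100; a series with only losses gives 0.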

library(TTR)

wmt_Stock <- subset(stockdata, (stockdata$Name == "WMT")
                    & (stockdata$Date >= "2014-01-01" & stockdata$Date <= "2014-12-31"))

# Calculate the RSI indicator

rsi <- RSI(Cl(wmt_Stock),2) 
sigup <- ifelse(rsi < 10, 1, 0)
sigdn <- ifelse(rsi > 90, -1, 0)


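# Act on the day after the signal. dplyr::lag() is called explicitly: it shifts a
# plain vector by one position, whereas stats::lag() would not shift its values
sigup <- dplyr::lag(sigup, 1)
sigdn <- dplyr::lag(sigdn, 1)

sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
sig <- sigup + sigdn

# Daily returns of the closing price (TTR::ROC, continuously compounded by default)
ret <- ROC(Cl(wmt_Stock))
ret[1] <- 0

# Equity curves: long-only, short-only and combined
eq_up <- cumprod(1 + ret * sigup)
eq_dn <- cumprod(1 + ret * sigdn)   # sigdn is already -1 on short days
eq_all <- cumprod(1 + ret * sig)
plot.zoo(cbind(eq_up, eq_dn), ylab = c("Long", "Short"), col = c("green", "red"),
         main = "Simple RSI(2) Strategy: WMT: 2014")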

3. Volatility : Bollinger Bands

The Bollinger Bands (BB) are plotted as an example of 'Volatility'. The BB graph for the company JNJ for the year 2015 is plotted below.

About BB :

Bollinger Bands can be used to measure the “highness” or “lowness” of the price relative to previous trades. The purpose of Bollinger Bands is to provide a relative definition of high and low. By definition, prices are high at the upper band and low at the lower band. This definition can aid in rigorous pattern recognition and is useful in comparing price action to the action of indicators to arrive at systematic trading decisions.
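The bands themselves are simple to compute: the middle band is an n-day simple moving average of the close, and the upper and lower bands sit k standard deviations above and below it. %B = (price - lower) / (upper - lower) turns the "relative definition of high and low" into a number: 1 at the upper band, 0 at the lower band. The base-R sketch below assumes the conventional n = 20 and k = 2 (which I believe are also the `addBBands()` defaults) and the population standard deviation of the window; `bbands_sketch` is a hypothetical name.

```r
# Bollinger Bands over a trailing n-day window:
#   middle = SMA(n), upper/lower = middle +/- k * standard deviation of the window.
bbands_sketch <- function(close, n = 20, k = 2) {
  len <- length(close)
  dn <- mavg <- up <- rep(NA_real_, len)
  if (len >= n) {
    for (i in n:len) {
      w <- close[(i - n + 1):i]          # trailing n-day window
      m <- mean(w)
      s <- sqrt(mean((w - m)^2))         # population standard deviation (assumption)
      mavg[i] <- m
      up[i] <- m + k * s
      dn[i] <- m - k * s
    }
  }
  # %B: 0 at the lower band, 1 at the upper band
  data.frame(dn = dn, mavg = mavg, up = up, pctB = (close - dn) / (up - dn))
}

res <- bbands_sketch(1:20)   # made-up closes 1, 2, ..., 20
res[20, ]
```

For the closes 1 to 20 the window mean is 10.5 and the population variance is (20² - 1) / 12 = 33.25, so the last close of 20 lands at a %B of about 0.91, near but below the upper band.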

jnj_Stock <- subset(stockdata, Name == "JNJ" & (Date >= "2015-01-01" & Date <= "2015-12-31"))
jnj_Stock[is.na(jnj_Stock)] <- 0

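# Date is already of class Date, so it can index the xts object directly
JNJ <- xts(jnj_Stock$Close, order.by = jnj_Stock$Date)

# The data is already limited to 2015; chartSeries' `subset` argument expects a
# date-range string (e.g. "2015-06::2015-09"), not a data frame, so it is omitted
chartSeries(JNJ, name = "JNJ")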

addBBands()

4. Volume : Rate of Change - Volume

The Rate of Change of Volume (ROCV) is plotted as an example of the 'Volume' type. The ROCV graph for the company IBM for the year 2010 is plotted below.

About ROCV :

The Rate-of-Change (ROC) indicator is a pure momentum oscillator that measures the percent change in price from one period to the next. The ROCV applies the same calculation to volume and shows whether or not a volume trend is developing in either an up or down direction. Shorter periods tend to produce a chart that is more jagged and difficult to analyze, whereas longer periods make the chart look round and smooth.
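In formula form, ROCV over n periods is 100 × (V_t - V_(t-n)) / V_(t-n). A base-R sketch with a hypothetical name, `rocv_sketch`, expressed as a percentage as in the definition above; some implementations, TTR's `ROC()` among them as far as I know, return the unscaled fraction instead.

```r
# Rate of change of volume over n periods, in percent:
#   ROCV_t = 100 * (V_t - V_(t-n)) / V_(t-n); the first n values are undefined.
rocv_sketch <- function(volume, n = 1) {
  len <- length(volume)
  out <- rep(NA_real_, len)
  if (len > n) {
    idx <- (n + 1):len
    out[idx] <- 100 * (volume[idx] - volume[idx - n]) / volume[idx - n]
  }
  out
}

rocv_sketch(c(100, 150, 75))         # NA, then +50 and -50 percent
rocv_sketch(c(100, 150, 75), n = 2)  # NA, NA, then -25 percent
```

Because the summary of the dataset shows a minimum Volume of 0, any zero-volume day used as the base V_(t-n) would make ROCV infinite or undefined, so such days are worth checking before plotting.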

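ibm_Stock <- subset(stockdata, Name == "IBM" & (Date >= "2010-01-01" & Date <= "2010-12-31"))
ibm_Stock[is.na(ibm_Stock)] <- 0

# ROCV is computed on Volume, so the xts series holds Volume (not Close),
# indexed by the trading Date (already of class Date)
IBM_Volume <- xts(ibm_Stock$Volume, order.by = ibm_Stock$Date)

chartSeries(IBM_Volume, name = "IBM Volume: 2010")

# addROC() adds the rate of change of the charted series, i.e. of Volume here
addROC()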

Simple Moving Average : SMA

The Simple Moving Average (SMA) for MCD for the year 2017 is plotted below.

About SMA :

Moving Averages are an important analytical tool used to identify current price trends and the potential for a change in an established trend. The simplest form of using a simple moving average in analysis is using it to quickly identify if a security is in an uptrend or downtrend.
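A crossover, like the one observed for MCD at the beginning of October, is a day on which a shorter SMA moves from below a longer one to above it, or the reverse. The base-R sketch below computes trailing SMAs and flags the days where the sign of (fast - slow) changes; the helper names `sma` and `crossovers` are hypothetical, and the prices are made up. `stats::filter` is named explicitly because dplyr and plotly both mask `filter()` in this session.

```r
# Simple moving average over the trailing n values (NA until n values exist).
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# Days on which the fast SMA crosses the slow SMA (the sign of fast - slow changes).
crossovers <- function(fast, slow) {
  which(diff(sign(fast - slow)) != 0) + 1
}

x <- c(5, 4, 3, 2, 1, 2, 3, 4, 5, 6)   # made-up closes: a dip, then a recovery
fast <- sma(x, 2)
slow <- sma(x, 4)
crossovers(fast, slow)                  # day 7: the fast SMA moves above the slow one
```

If the two averages are exactly equal on some day, the sign passes through 0 and the sketch reports two changes around that day; a production rule would need to decide how to treat ties.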

mcdStock <- subset(stockdata, Name == "MCD" & (Date >= "2017-01-01" & Date <= "2017-12-31"))
mcdStock[is.na(mcdStock)] <- 0

mcdStock$Date <- strptime(mcdStock$Date, format = "%Y-%m-%d")
mcdStock <- xts(mcdStock$Close, order.by = mcdStock$Date)

candleChart(mcdStock, up.col = "black", dn.col = "red", theme = "white")

addSMA(n = c(20,50,100))

Observation:

The SMAs are computed over 20, 50 and 100 trading days. There is one crossover at the beginning of October. MCD stock is clearly in an uptrend in 2017.

Data Manipulation:

The dataset covers 31 companies over a period of 12 years, so the data can be nested by Name. The following chunk of code does exactly that and creates a new dataset with 31 rows, one per company. It has two columns: the Name of the company and a list-column in which each entry is a tibble holding that company's entire data, namely the Date, OHLC and Volume values for the 12 years.

by_company <- stockdata %>%
  group_by(Name) %>%
  nest()

by_company
## # A tibble: 31 x 2
##    Name  data                
##    <fct> <list>              
##  1 MMM   <tibble [3,020 x 6]>
##  2 AXP   <tibble [3,020 x 6]>
##  3 AAPL  <tibble [3,019 x 6]>
##  4 BA    <tibble [3,020 x 6]>
##  5 CAT   <tibble [3,020 x 6]>
##  6 CVX   <tibble [3,020 x 6]>
##  7 CSCO  <tibble [3,019 x 6]>
##  8 KO    <tibble [3,020 x 6]>
##  9 DIS   <tibble [3,020 x 6]>
## 10 XOM   <tibble [3,020 x 6]>
## # ... with 21 more rows
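The same per-company grouping can be sketched in base R with `split()`, which returns a named list with one data frame per company. This is an illustrative alternative, not the author's code; the toy data and the tickers "AAA" and "BBB" are hypothetical.

```r
# Made-up stand-in for stockdata: two hypothetical tickers, a few days each
toy <- data.frame(
  Date  = as.Date(c("2006-01-03", "2006-01-04", "2006-01-05", "2006-01-03", "2006-01-04")),
  Close = c(10, 11, 12, 50, 49),
  Name  = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  stringsAsFactors = FALSE
)

# One data frame per company, keyed by Name (the base-R analogue of nest())
by_company_list <- split(toy[, c("Date", "Close")], toy$Name)

# Rows per company
rows <- sapply(by_company_list, nrow)
rows
```

On the real data, `sapply(split(stockdata, stockdata$Name), nrow)` would list the row count of every company, which is a quick way to find the companies with fewer than 3020 rows, such as AAPL and CSCO in the output above.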

FINAL PLOTS AND SUMMARY

1. Candlestick Chart

The candlestick chart describes the price movement of a stock. Each candlestick represents one trading day and combines a line (the thin wick, from Low to High) with a bar (the thick body, from Open to Close), so each candle carries all four key prices for that day: the Open, the Close, the High and the Low. I have used conventional coloring for the candles: green signifies that the Close price was greater than the Open price for that day, and red signifies that the Open price was greater than the Close price. To improve the readability of the graph, the hover text shows all the details: Date, Open, High, Low, Close, Volume and the name of the company. Two consecutive trading days can be compared on hover by clicking the 'Compare data on hover' icon of the candlestick graph.

#library(plotly)
#library(tidyr)

aaplStock <- filter(stockdata, stockdata$Name == "AAPL")

aaplStock$Date <- as.Date(aaplStock$Date)

hovertxt <- Map(function(x, y)paste0(x, ":", y), names(aaplStock), aaplStock)
hovertxt <- Reduce(function(x, y)paste0(x, "<br>", y), hovertxt)

plot_ly(data = aaplStock, x = ~Date, xend = ~Date, color = ~Close > Open, colors = c("red","green")) %>%
  
  add_segments(y = ~Low, yend = ~High, line = list(width = 1, color = "black")) %>%
  
  add_segments(y = ~Open, yend = ~Close, line = list(width = 3)) %>%
  
  add_markers(y = ~(Low + High)/2, hoverinfo = "text",
              text = hovertxt, marker = list(color = "transparent")) %>% 
  
  layout(title = "Basic Candlestick Chart") %>%

  layout(showlegend = FALSE, 
         yaxis = list(title = "Price", domain = c(0, 0.9)),
         annotations = list(
           list(xref = "paper", yref = "paper", 
                x = 0, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste0("<b>AAPL</b>")),
           
           list(xref = "paper", yref = "paper", 
                x = 0.75, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste(range(aaplStock$Date), collapse = " : "),
                font = list(size = 8))),
         plot_bgcolor = "#f2f2f2")

Observation :

The majority of the graph is green, implying that on most days the Close value was greater than the Open value. This hints at a bullish trend. There are three major observable bearish trends: the first around the end of 2012 and the beginning of 2013, the second from 2015 to 2016, and the third at the end of 2017. Apart from these three bearish trends, the graph projects an overall bullish trend. Of course, this describes generalized behavior over time, not individual days.

2. Bollinger Bands

The Bollinger Bands (BB) are plotted as an example of 'Volatility'. The BB graph for the company JNJ for the year 2015 is plotted below. Bollinger Bands can be used to measure the “highness” or “lowness” of the price relative to previous trades. The purpose of Bollinger Bands is to provide a relative definition of high and low. By definition, prices are high at the upper band and low at the lower band. This definition can aid in rigorous pattern recognition and is useful in comparing price action to the action of indicators to arrive at systematic trading decisions.

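library(quantmod)  # loads xts and the chartSeries()/addBBands() charting functions

# JNJ prices for 2015; as.Date() makes the comparison work whether Date is a factor or text
jnj_Stock <- subset(stockdata, Name == "JNJ" &
                      as.Date(Date) >= as.Date("2015-01-01") &
                      as.Date(Date) <= as.Date("2015-12-31"))
# Drop days with missing values instead of setting prices to 0, which would distort the bands
jnj_Stock <- na.omit(jnj_Stock)

# Daily OHLC series indexed by date
JNJ <- xts(jnj_Stock[, c("Open", "High", "Low", "Close")], order.by = as.Date(jnj_Stock$Date))

# Line chart of the close; the data is already limited to 2015, so no subset argument is needed
chartSeries(JNJ, type = "line")

# Overlay Bollinger Bands (quantmod defaults: 20-day SMA, bands at +/- 2 standard deviations)
addBBands()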
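To make the band definition concrete, here is a minimal base-R sketch of the calculation, assuming the usual defaults of a 20-day simple moving average with bands two standard deviations away. The function `bollinger()` is hypothetical and is not quantmod's implementation, which may differ in details such as which price it averages or how it computes the standard deviation.

```r
# Hypothetical sketch of Bollinger Bands for a numeric price vector.
# n = window length, k = number of standard deviations for the bands.
bollinger <- function(price, n = 20, k = 2) {
  len <- length(price)
  mid <- up <- dn <- rep(NA_real_, len)
  for (i in seq_len(len)) {
    if (i >= n) {
      window <- price[(i - n + 1):i]
      m <- mean(window)
      # Population standard deviation of the window (divisor n);
      # some implementations use the sample version (divisor n - 1)
      s <- sqrt(mean((window - m)^2))
      mid[i] <- m
      up[i]  <- m + k * s
      dn[i]  <- m - k * s
    }
  }
  # %b places the price relative to the bands: 1 at the upper band, 0 at the lower
  pct_b <- (price - dn) / (up - dn)
  data.frame(price = price, lower = dn, middle = mid, upper = up, pct_b = pct_b)
}
```

For example, `bollinger(jnj_Stock$Close)` returns one row per trading day; a `pct_b` above 1 or below 0 marks a close outside the bands, which is the "high" and "low" of the definition above.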

Observation :

The price appears to trend downward and reaches quite low values between August and September. From October onward, the price rises sharply and recovers to approximately its level at the beginning of the year.

3. Simple Moving Averages

The Simple Moving Average (SMA) for MCD for the year 2017 is plotted below. An n-day SMA is the mean of the last n closing prices. Moving averages are an important analytical tool for identifying current price trends and the potential for a change in an established trend. The simplest use of an SMA is to identify quickly whether a security is in an uptrend or a downtrend.

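# MCD prices for 2017 (as.Date() handles Date stored as a factor or as text)
mcdStock <- subset(stockdata, Name == "MCD" &
                     as.Date(Date) >= as.Date("2017-01-01") &
                     as.Date(Date) <= as.Date("2017-12-31"))
# Drop days with missing values rather than setting prices to 0
mcdStock <- na.omit(mcdStock)

# Candlesticks are drawn from Open, High, Low and Close, so pass all four columns
MCD <- xts(mcdStock[, c("Open", "High", "Low", "Close")], order.by = as.Date(mcdStock$Date))

candleChart(MCD, up.col = "black", dn.col = "red", theme = "white")

# Overlay 20-, 50- and 100-day simple moving averages of the close
addSMA(n = c(20, 50, 100))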

Observation :

The SMAs are computed over 20, 50 and 100 trading days. There is one crossover, at the beginning of October. MCD stock was clearly in an uptrend in 2017.
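Crossovers such as this one can also be located programmatically. The sketch below is an illustration under stated assumptions: `sma()` and `crossovers()` are hypothetical helpers, the input is a numeric vector of closing prices in date order with no missing values, and a crossover is counted whenever the sign of (fast SMA - slow SMA) changes.

```r
# Hypothetical helper: n-day simple moving average (NA until n values exist).
# Assumes x has no missing values.
sma <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= n) {
    cs <- cumsum(c(0, x))
    idx <- n:length(x)
    # cs[i + 1] - cs[i - n + 1] is the sum of x[(i - n + 1):i]
    out[idx] <- (cs[idx + 1] - cs[idx - n + 1]) / n
  }
  out
}

# Hypothetical helper: positions where the fast SMA crosses the slow SMA.
# direction = 1: fast crosses above slow; direction = -1: fast crosses below.
# A day on which the two averages are exactly equal also registers a sign change.
crossovers <- function(x, fast = 20, slow = 50) {
  d <- sign(sma(x, fast) - sma(x, slow))
  change <- c(NA, diff(d))
  i <- which(!is.na(change) & change != 0)
  data.frame(index = i, direction = sign(change[i]))
}
```

Calling `crossovers(close_2017, fast = 20, slow = 50)`, where `close_2017` stands for MCD's 2017 closing prices, would list each crossing of the 20-day and 50-day averages together with its direction.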

REFLECTIONS

The stock dataset contains 93612 observations of 31 companies. The dataset has 7 columns: Date, Open, High, Low, Close, Volume and Name. The data spans January 2006 to December 2017. I began by summarizing the table. The next step was to understand how the stocks traded in terms of volume and prices, so I plotted them against time: volume first, and then Open, High, Low and Close individually. These plots show the general behavior of the stocks along with their details and oddities. To get a closer view, I zoomed in on particular companies.

The candlestick chart, one of the most basic and important charts in financial analysis, was plotted for one company to understand the behavior of its stock over time. The conventional colors were used for easier reading. I ran into errors with the Date column because its data type was not compatible with the plotting functions; converting it to a date format resolved them.

Investors in the stock market are interested in the returns on their investments, so the next step was to calculate returns from the given data. I calculated returns over five periods: yearly, quarterly, monthly, weekly and daily. This required subsetting the data for a particular company and then for a specific time period. The returns for these five periods are shown in graphs, which are then zoomed in to understand the behavior of individual companies. The return plots are useful for finding trends for the companies as a group or for an individual company. The trends are discussed briefly after each graph. Calculating the returns of all the companies over the entire time period was not feasible, since this is a beginner's analysis.
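The period returns described above can be sketched in base R as follows. This is an illustration under assumptions rather than the method used for the graphs: the hypothetical `period_returns()` defines a period's simple return as its last close divided by the previous period's last close, minus 1, so the first period has no return. Weekly periods are left out to keep the sketch short; quantmod's `periodReturn()` and wrappers such as `monthlyReturn()` are a ready-made alternative.

```r
# Hypothetical helper: simple return per period from daily closing prices.
period_returns <- function(dates, close,
                           period = c("year", "quarter", "month", "day")) {
  period <- match.arg(period)
  ok <- !is.na(close)                      # ignore days without a close
  d <- as.Date(dates[ok])
  close <- close[ok]
  o <- order(d)                            # make sure days are in date order
  d <- d[o]
  close <- close[o]
  key <- switch(period,
    year    = format(d, "%Y"),
    quarter = paste0(format(d, "%Y"), "-Q",
                     (as.integer(format(d, "%m")) - 1) %/% 3 + 1),
    month   = format(d, "%Y-%m"),
    day     = format(d, "%Y-%m-%d"))
  # Last close in each period; these keys sort chronologically as strings
  last_close <- tapply(close, key, function(v) v[length(v)])
  lc <- as.numeric(last_close)
  # This period's last close relative to the previous period's last close
  ret <- lc / c(NA, head(lc, -1)) - 1
  names(ret) <- names(last_close)
  ret
}
```

For instance, `period_returns(aapl$Date, aapl$Close, "month")`, for a hypothetical `aapl` subset of `stockdata`, would give one return per month.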

As an analyst, it is important to understand and use the terminology of the financial domain. I explored the types of indicators used in financial analysis for trend, momentum, volatility and volume. To represent the data in financial terms, I plotted one example of each type of indicator: the Moving Average Convergence Divergence (MACD) for trend, the Relative Strength Index (RSI) for momentum, Bollinger Bands for volatility and the Rate of Change in Volume (ROCV) for volume. The Simple Moving Average, which shows whether the overall trend is up or down, is plotted as well. All of these graphs are plotted by first subsetting the data for one company and then, where required, for a particular interval.

The analysis presented here is introductory, and there are several areas for further exploration, for example computing returns in more detail, computing moving averages for individual companies, and predicting stock values. This analysis is descriptive in nature; future work could extend it with predictive and prescriptive analysis of the dataset.