This document was created to demonstrate the ability of R to scrape HTML data from websites. For this assignment, I chose to scrape the 6-month historical stock price data on Yahoo Finance for Citrix Systems. This is extremely interesting data to analyze because of the impacts COVID-19 has had on the markets in recent months.
Citrix Systems (CTXS) is an enterprise software company that helps businesses of all sizes with cloud-based digitalization and remote workplace transitions. As a result, it is one of the few companies that has experienced immense growth and prosperity during the outbreak of the international pandemic. With this data, I will be answering questions regarding its performance during this period, as well as before it.
The following packages are needed in order to run our analysis:
Allows R to connect and communicate with the internet to transfer information between the two.
Gives us a bunch of analysis tools combined into one seamless package. Overall, this package allows us to manipulate, wrangle, and visualize data for its intended purpose.
This package gives us a lot of data wrangling capabilities in order to narrow down observations to create in-depth analyses.
A package for working with HTTP requests and responses, including authentication
Provides tools for managing XML and HTML through R
Allows the user to work extensively with dates and helps with managing them
library(XML)
library(tidyverse)
library(dplyr)
library(httr)
library(rvest)
library(lubridate)
I tried all three of the methods shown in the module for importing the data from Yahoo Finance. Similar to the examples, the first two didn’t work, so I had to download the page content and extract my table from that. I then had to pull the historical stock prices data frame from the test tables that were created. There were 100 rows of data, listing Citrix’s stock price data for each day the market was open in the past 6 months.
citrix <- "https://finance.yahoo.com/quote/CTXS/history?period1=1576195200&period2=1589328000&interval=1d&filter=history&frequency=1d"
readHTMLTable(citrix, stringsAsFactors = FALSE)
## named list()
# Doesn't work...
citrix_http <- "https://finance.yahoo.com/quote/CTXS/history?period1=1576195200&period2=1589328000&interval=1d&filter=history&frequency=1d"
readHTMLTable(citrix_http, stringsAsFactors = FALSE)
## named list()
# Also doesn't work...
rm(citrix_http)
url_doc <-
  GET(citrix)$content %>%
  rawToChar() %>%
  htmlParse()
test_tables <- readHTMLTable(url_doc, stringsAsFactors = FALSE)
citrix_table <- data.frame(test_tables[["NULL"]])
# Ah... Finally
My data had to undergo several cleaning steps before I could work with it. One of the rows was completely occupied by a line declaring that a dividend had been issued on a certain date, leaving the rest of the columns NA. I had to omit it from my data.
colSums(is.na(citrix_table))
## Date Open High Low Close. Adj.Close..
## 0 0 1 1 1 1
## Volume
## 1
citrix_table <- na.omit(citrix_table)
Also, I wanted to change the name of some of the columns to make them look cleaner. “Close.” had a period after it that I wanted to remove. “Adj.Close..” had even more periods that I wanted gone.
citrix_table <- citrix_table %>%
  rename(AdjClose = Adj.Close..,
         Close = Close.)
The date column loaded as text in the format of “May 12, 2020”, which R does not recognize as a date. I used the lubridate package to change it to the default date format in R.
citrix_table$Date <- mdy(citrix_table$Date)
Finally, none of the numbers that were loaded into columns were formatted as numeric, so I needed to change them so I could run my analysis. I also wanted to remove commas in the Volume column for the numbers that were in the millions.
citrix_table <- citrix_table %>%
  mutate(Open = as.numeric(as.character(Open)),
         High = as.numeric(as.character(High)),
         Low = as.numeric(as.character(Low)),
         Close = as.numeric(as.character(Close)),
         AdjClose = as.numeric(as.character(AdjClose)),
         Volume = as.numeric(gsub(",", "", as.character(Volume))))
First, I wanted to explore the stock’s performance before the United States started to become gravely concerned about the virus. I defined this as the period before the first confirmed death in the country. This way, we can develop an understanding of how the stock normally behaved prior to the bear market brought on by the outbreak. In order to do so, I filtered the data for only dates in the file before February 29th, 2020.
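The filtering step described above might look something like the following minimal sketch. The `citrix_demo` data frame and its values are made up for illustration and only stand in for the cleaned `citrix_table`; the names `cutoff`, `pre_covid`, and `pre_plot` are likewise hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the cleaned citrix_table (values are made up)
citrix_demo <- data.frame(
  Date = as.Date(c("2020-02-26", "2020-02-27", "2020-02-28",
                   "2020-03-02", "2020-03-03")),
  AdjClose = c(110, 105, 103, 108, 112)
)

# Keep only trading days before the cutoff date
cutoff <- as.Date("2020-02-29")
pre_covid <- citrix_demo %>%
  filter(Date < cutoff)

# Line chart of the adjusted close over time
pre_plot <- ggplot(pre_covid, aes(x = Date, y = AdjClose)) +
  geom_line() +
  labs(x = "Date", y = "Adjusted close (USD)")
```

Flipping the comparison to `Date > cutoff` would select the later period instead.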
Citrix’s stock moved in line with the S&P 500 and its sector for most of December and the first half of January. Around mid-January, it saw a surge, perhaps as some of the keener investors hopped aboard. A sell-off followed, however, as less informed investors dumped their shares in response to mounting fears surrounding the broader market decline during this period. This analysis could be further explored by overlaying a column chart indicating the percent change for each one-week period shown on the x-axis.
Next, it is pertinent to explore the stock’s performance following the first reported death in the country. Since it is the nature of Citrix’s business to help companies digitalize their workplaces and improve cloud-based applications, it is reasonable to assume that smart investors would see the panicked reaction and companies moving remote as an opportunity to buy. For this, I filtered the data to only account for trading days occurring after February 29th, 2020.
As anticipated, the stock’s performance skyrockets after this date. In fact, it became one of the best-performing tickers among the companies in the S&P 500. Despite experiencing the misinformed sell-off in its shares, Citrix saw significant investment as one of the most important companies to continue operating during this time. Similar to question 1, this analysis could be further explored by overlaying a column chart indicating the percent change for each one-week period shown on the x-axis.
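The weekly percent change suggested for both periods could be computed along these lines. This is only a sketch under stated assumptions: the `prices_demo` values are made up, and the weekly change is assumed to be measured from the first Open of the week to the last Close of the week, which is one of several reasonable definitions.

```r
library(dplyr)
library(lubridate)

# Hypothetical prices (made-up values) spanning two calendar weeks
prices_demo <- data.frame(
  Date = as.Date(c("2020-03-02", "2020-03-03", "2020-03-06",
                   "2020-03-09", "2020-03-13")),
  Open = c(100, 102, 104, 110, 108),
  Close = c(102, 103, 110, 107, 99)
)

weekly_change <- prices_demo %>%
  arrange(Date) %>%
  # Label each row with the Monday that starts its week
  mutate(Week = floor_date(Date, unit = "week", week_start = 1)) %>%
  group_by(Week) %>%
  summarise(
    WeekOpen = first(Open),    # first Open of the week
    WeekClose = last(Close),   # last Close of the week
    PctChange = (WeekClose - WeekOpen) / WeekOpen
  )
```

The resulting `PctChange` column could then be drawn with `geom_col()` on top of the price chart.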
So what kind of returns did these opportunistic investors get from this amazing stock pick? I explored this question by finding the total percent return on investment if the investor had purchased the stock on the first trading day after the first confirmed COVID-19 death in the United States and held on until the end of April. I had to filter exclusively for this time period and pull the Open price for March 2nd, 2020 and the Adjusted Close price for April 30th, 2020.
## [1] 0.393523
If you had invested in Citrix during this time period, you would have witnessed a 39.35% return! To put that in perspective, if you invested 100,000 dollars, in two months you would have gained nearly 40,000 dollars for doing absolutely nothing. So yes, those investors were extremely savvy. Now if only I had 1 million dollars and a time travel device… This analysis could be broken down further and improved by analyzing the returns week by week, to show just how volatile the stock price was. You could also get a sense of the emotions experienced by the investors and how tempted many of them must have been to sell out early.
Speaking of volatility, it is one of the many challenges you have to navigate as an investor. Resisting the urge to sell off when your holdings take positive or negative turns is all a part of the process. Investors who have had positions in Citrix, among other stocks, have certainly seen their fair share of intraday losses and gains. This is especially true considering COVID-19 headlines can tend to dominate the markets in times like these and cause dramatic swings in prices. In order to explore this effect, I looked into the difference between the high and low of the stock price on any given day and wanted to pick out the top 3 instances.
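The return calculation described above can be sketched as follows. The `return_demo` prices are made up, so the result differs from the figure reported for Citrix; `buy_price`, `sell_price`, and `total_return` are hypothetical names.

```r
library(dplyr)

# Hypothetical prices (made-up values) for the holding period
return_demo <- data.frame(
  Date = as.Date(c("2020-03-02", "2020-03-03", "2020-04-29", "2020-04-30")),
  Open = c(100, 101, 118, 119),
  AdjClose = c(101, 103, 119, 125)
)

# Open price on the purchase date
buy_price <- return_demo %>%
  filter(Date == as.Date("2020-03-02")) %>%
  pull(Open)

# Adjusted close on the sale date
sell_price <- return_demo %>%
  filter(Date == as.Date("2020-04-30")) %>%
  pull(AdjClose)

# Total return as a fraction of the purchase price
total_return <- (sell_price - buy_price) / buy_price
```

Multiplying `total_return` by the amount invested gives the gain in dollars.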
## Date Flux PercentFlux
## 1 2020-05-12 8.27 0.05636587
## 2 2020-05-11 3.86 0.02552235
## 3 2020-05-08 3.65 0.02467216
The most recent trading day from the data pulled was actually the most volatile in terms of intraday value change. The price fluctuated by 8.27 dollars per share, a swing of about 5.64%. When you get almost a 40% return in 2 months, the risk of losing or gaining 5.64% in a single day can definitely test your intuition. Further analysis could be conducted by comparing the intraday change to the actual open-to-close change.
Finally, I decided to dig into the trading volume statistics. Trading volume measures the number of shares that were bought and sold. By exploring the average trading volume per month for the historical 6-month period, I can determine just how much activity and attention Citrix’s stock had been experiencing. This could provide insight into its popularity before and after the COVID-19 outbreak, regardless of its price.
As expected, the trading volume just about doubled from February to March, in line with the massive jump in stock prices. So far in May, however, the average has returned to levels closer to those of January, before the massive COVID-19 outbreak in the United States. This analysis could be improved by overlaying the price change graph to see the direct correlation between stock price and trading volume.
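One way to rank trading days by intraday fluctuation is sketched below. The `flux_demo` values are made up, and dividing the range by the Open price to get `PercentFlux` is an assumption; the Low or the previous Close would be equally defensible denominators.

```r
library(dplyr)

# Hypothetical daily prices (made-up values)
flux_demo <- data.frame(
  Date = as.Date(c("2020-03-16", "2020-03-17", "2020-03-18", "2020-03-19")),
  Open = c(100, 100, 100, 100),
  High = c(104, 112, 103, 108),
  Low = c(98, 100, 101, 99)
)

top_flux <- flux_demo %>%
  mutate(Flux = High - Low,            # intraday range in dollars
         PercentFlux = Flux / Open) %>% # range relative to the Open (assumed)
  arrange(desc(Flux)) %>%               # largest range first
  select(Date, Flux, PercentFlux) %>%
  head(3)
```

Sorting with `arrange(desc(Flux))` before `head(3)` matters here: without it, `head(3)` simply returns the first three rows in whatever order the table is stored.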
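The monthly averaging described above could be done roughly like this. It is a minimal sketch with made-up `volume_demo` values standing in for the cleaned table; `monthly_volume` is a hypothetical name.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily volumes (made-up values) across two months
volume_demo <- data.frame(
  Date = as.Date(c("2020-02-03", "2020-02-04", "2020-03-02", "2020-03-03")),
  Volume = c(1000000, 2000000, 3000000, 5000000)
)

monthly_volume <- volume_demo %>%
  # Collapse each date to the first day of its month
  mutate(Month = floor_date(Date, unit = "month")) %>%
  group_by(Month) %>%
  summarise(AvgVolume = mean(Volume))
```

Grouping on `floor_date()` rather than `month()` keeps the year attached, which matters when the data spans December and January.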