Problem 1

For a data set of your choosing, make a faceted plot using the trelliscopejs package. You may make any type of plot (scatter plot, histogram, etc.), but, as mentioned in the discussion below, you must explain why you chose this plot and what you are investigating about the variable you are graphing.

The trelliscope plot must include one cognostic measure of your own. Include a description of what it is and what information this measure gives.


# Declare libraries
library(ggplot2)
library(tidyverse)
library(trelliscopejs)
library(plotly)

# Import data
nsdq_symb <- readRDS("C:/Users/Admin/Documents/RMU/02) Data Visualization/Week 4/Homework/nsdq_symb.rds")
nsdq2016 <- readRDS("C:/Users/Admin/Documents/RMU/02) Data Visualization/Week 4/Homework/nsdq2016.rds")

# Data cleaning

# Join the datasets on symbol, first checking how many unique symbols nsdq_symb contains
length(unique(nsdq_symb$symbol))
## [1] 500
nsdq_symb_2 <- nsdq_symb %>% 
                ungroup() %>% 
                arrange(symbol) %>% 
                group_by(symbol)

nsdq2016_2 <- nsdq2016 %>% 
              ungroup() %>% 
              arrange(symbol, date) %>% 
              group_by(symbol)

stocks <- left_join(nsdq2016_2, nsdq_symb_2, by = "symbol")

# Derive return features for the graphs; columns that are constant within a symbol will be picked up as cognostics

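stocks2 <- stocks %>% 
            group_by(symbol) %>% 
            arrange(date) %>% 
            mutate( 
              # Measures that change by day
              daily_return = (close - lag(close))/lag(close),
              log_return = log(close/lag(close)),
              ytd_return = close / first(close),  # value of $1 invested at the first close
              # Measures that are constant within each symbol
              yearly_return = last(close)/first(close),
              avg_daily_return = mean(daily_return, na.rm = TRUE),
              # Volatility measure referenced in the description (sort on daily_return_var)
              daily_return_var = var(daily_return, na.rm = TRUE)
              )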
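As a quick check on the definitions above, here is a minimal base-R sketch on a made-up three-day price series. The `close` values are hypothetical and not taken from the data set, and `c(NA, head(close, -1))` stands in for dplyr's `lag()`.

```r
# Hypothetical closing prices for one stock over three trading days
close <- c(100, 110, 99)

# Previous day's close; base-R stand-in for dplyr::lag()
prev_close <- c(NA, head(close, -1))

daily_return     <- (close - prev_close) / prev_close  # NA, then +10%, then -10%
log_return       <- log(close / prev_close)            # NA, log(1.1), log(0.9)
ytd_return       <- close / close[1]                   # value of $1 invested on day 1
yearly_return    <- close[length(close)] / close[1]    # final value of that $1
avg_daily_return <- mean(daily_return, na.rm = TRUE)   # mean of the non-missing daily returns
```

In this toy series the average daily return is roughly zero even though the stock ends 1% below where it started, so sorting by avg_daily_return and sorting by yearly_return need not give the same order.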
# Create exploratory visualizations

stocks2 %>% 
  distinct(symbol, yearly_return) %>% 
  ggplot(aes(x = yearly_return)) +
  geom_histogram()

stocks2 %>% 
  ggplot(aes(x = date, y = ytd_return, group = symbol)) +
  geom_line()

# Looks like there are some outliers

# filter to outliers
filter(stocks2, ytd_return > 5)
stocks2[stocks2$symbol == "TSG",]
# It's just one symbol (TSG), so exclude it. This could be a data issue, as it looks like everything is shifted by a decimal point, but I can't find external data to back that up.

stocks2 <- stocks2[stocks2$symbol != "TSG",]

# Graph again without the outlier. This looks like a good graph to facet by company in trelliscope, and sector info could feed into cognostics.
# Because ytd_return starts at 1 for every stock, all panels can share one y-axis, which suits a trelliscope view.
stocks2 %>% 
  ggplot(aes(x = date, y = ytd_return, group = symbol)) +
  geom_line(aes(color = sector)) +
  facet_wrap(~sector)

# The lines show the returns; now add a measure of daily trading volume relative to each stock's yearly average

avg_volume <- stocks2 %>% 
              group_by(symbol) %>% 
              summarize(avg_volume = mean(volume, na.rm = TRUE))

stocks2 <- left_join(stocks2, avg_volume, by = "symbol")

stocks2$volume_scaled <- stocks2$volume/stocks2$avg_volume
# trelliscope

stocks2 %>%
  ggplot(aes(x = date, y = ytd_return)) +
  geom_line() +
  labs(x = "Date",
       y = "Cumulative Year to Date Value") +
  theme_bw() +
  theme(legend.position = "none") +  # applied after theme_bw(), which would otherwise reset it
  facet_trelliscope(~symbol,
                    name = "Nasdaq Return 2016",
                    desc = "Cumulative returns by stock for 2016",
                    nrow = 2, ncol = 3,
                    scales = c("same","same"),
                    path = ".",
                    self_contained = TRUE
                    )

Description 2-3 paragraphs.

Describe the data set. Explain the variable you are graphing in your plots and the reason you are investigating with it. Discuss the reason/motivation you chose the variable to facet on, and what insight or trend you are attempting to investigate. Discuss any challenges you had in making the graphs and how you dealt with these challenges. Name at least one cognostic measure (this can include the cognostic you created or be different) the reader could investigate, and explain any insight they might gain from it.

Description

The data were sourced from the Datacamp Trelliscope course and come in two datasets. The first, “nsdq2016”, contains daily stock data for 500 companies for the year 2016: the stock symbol, trade date, volume, and price values (open, high, low, close, and adjusted close). The second, “nsdq_symb”, contains descriptive information about each stock: company name, market cap, IPO year, sector, and industry. By left joining nsdq_symb onto nsdq2016 by symbol, the descriptive values are mapped onto the daily detail table so they can be used for categorical filtering in trelliscope.

The trelliscope plot that I produced shows one line graph per stock of the year-to-date (YTD) return, defined as each day's closing price divided by the first closing price of the year. Because this variable is normalized to start at 1 for every stock, all panels share the same y-axis scale, which makes side-by-side viewing easier to interpret. I wanted to use this variable to investigate which stocks performed best in terms of percentage gains for the year. With this visualization, other cognostics can be used for labels, filtering, or sorting to gain insight into volatility or other performance or descriptive measures. Additional values were derived from the dataset to be used in visualizations and cognostics. Because I wanted to facet by stock symbol, I created values describing the return of the stock that change by day (daily return, log return, year-to-date return) as well as values that are constant for each stock symbol across the year (yearly return and average daily return). Combining these measures with the descriptive labels lets you investigate many questions around stock returns, such as:

Q: Which stocks had the highest or lowest yearly return? (Sort by yearly_return, ascending or descending.)
A: The lowest return was MYGN, which ended the year at 0.3892 per dollar invested (a loss of about 61%). Of note, 7 of the 10 worst performing stocks were in the Health Care sector. The highest return was AMD, which ended the year at 4.09 per dollar invested (a gain of about 309%). Of note, 4 of the 10 best performing stocks were in Technology.

Q: Which stock per sector had the highest average daily return? (Sort by avg_daily_return descending, apply the sector label, and select sectors as needed.)
A: AMD again has the highest average daily return, but the next handful of stocks are in a different order than when sorted by yearly return, indicating differences in volatility. Drill into each sector in the filter to see details.

Q: Which stocks had the most volatility within the year? (Sort descending on daily_return_var.)
A: SRPT, ORIG, and GWPH had the highest variance in daily returns. Of note, 7 of the top 10 were in the Health Care sector.
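The volatility ranking in the last question comes down to one variance of daily returns per symbol, sorted in descending order. A minimal base-R sketch on made-up data follows; the symbols `AAA` and `BBB` and their returns are hypothetical, and `tapply()` stands in for the grouped dplyr workflow used in the analysis.

```r
# Hypothetical daily returns for two made-up symbols
toy <- data.frame(
  symbol       = rep(c("AAA", "BBB"), each = 4),
  daily_return = c(NA, 0.01, -0.01, 0.01,   # AAA: small daily moves
                   NA, 0.10, -0.08, 0.05)   # BBB: large daily moves
)

# One variance per symbol; na.rm = TRUE drops the leading NA produced by lag()
ret_var <- tapply(toy$daily_return, toy$symbol, var, na.rm = TRUE)

# Sort descending, mirroring a descending sort on daily_return_var in the display
ranked <- sort(ret_var, decreasing = TRUE)
print(ranked)
```

The symbol with the larger swings ranks first regardless of whether it finished the year up or down, which is why this measure answers a different question than yearly_return.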

I attempted a few cuts of the data that didn’t make it to the end. One was histograms faceted by industry or sector, but this view ultimately answered questions about the distribution of price or volatility measures at an aggregated level instead of for individual stocks. I felt that this aggregated the data too much and didn’t let trelliscope shine in its ability to glean insights from big data. I also tried to add detail to the plot with a bar graph of trading volume relative to the yearly average, hoping that this would show periods of heavy trading volume centered around large price changes. However, adding it made the scales confusing and the plot congested and harder to interpret, so it was ultimately abandoned as a graphing feature and instead left as a cognostic.
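One possible way to keep the relative-volume idea as a sortable cognostic is to collapse the daily ratio to a single number per stock. The sketch below uses made-up volumes and a hypothetical summary, `max_volume_scaled` (the largest single-day volume relative to the stock's own average). Neither that name nor the choice of `max()` comes from the original analysis; it is only an illustration in base R.

```r
# Hypothetical daily volumes for two made-up symbols
toy <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 4),
  volume = c(100, 100, 100, 100,   # AAA: flat trading volume
              50,  50,  50, 250)   # BBB: one heavy trading day
)

# Each day's volume relative to that symbol's average (the volume_scaled idea);
# ave() returns the per-symbol mean repeated on every row
toy$volume_scaled <- toy$volume / ave(toy$volume, toy$symbol)

# Hypothetical per-symbol summary: the largest relative-volume day
max_volume_scaled <- tapply(toy$volume_scaled, toy$symbol, max)
print(max_volume_scaled)
```

Sorting panels on a value like this would surface stocks with an unusually heavy trading day, which could then be compared by eye against the price line in the same panel.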