In this exercise you will learn to plot data using the ggplot2 package. To answer the questions below, use 4.3 Categorical vs. Quantitative from Data Visualization with R.
# Load packages
library(tidyquant)
library(tidyverse)
library(lubridate) #for year()
# Pick stocks
stocks <- c("AAPL", "MSFT", "IBM")
# Import stock prices
stock_prices <- stocks %>%
  tq_get(get = "stock.prices",
         from = "1990-01-01",
         to = "2019-05-31") %>%
  group_by(symbol)
stock_prices
## # A tibble: 22,230 x 8
## # Groups: symbol [3]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 1990-01-02 1.26 1.34 1.25 1.33 45799600 1.08
## 2 AAPL 1990-01-03 1.36 1.36 1.34 1.34 51998800 1.09
## 3 AAPL 1990-01-04 1.37 1.38 1.33 1.34 55378400 1.10
## 4 AAPL 1990-01-05 1.35 1.37 1.32 1.35 30828000 1.10
## 5 AAPL 1990-01-08 1.34 1.36 1.32 1.36 25393200 1.11
## 6 AAPL 1990-01-09 1.36 1.36 1.32 1.34 21534800 1.10
## 7 AAPL 1990-01-10 1.34 1.34 1.28 1.29 49929600 1.05
## 8 AAPL 1990-01-11 1.29 1.29 1.23 1.23 52763200 1.00
## 9 AAPL 1990-01-12 1.22 1.24 1.21 1.23 42974400 1.00
## 10 AAPL 1990-01-15 1.23 1.28 1.22 1.22 40434800 0.997
## # … with 22,220 more rows
# Process stock_prices and save it under stock_returns
stock_returns <-
  stock_prices %>%
  # Calculate yearly returns
  tq_transmute(select = adjusted, mutate_fun = periodReturn, period = "yearly") %>%
  # Create a new variable, year
  mutate(year = year(date)) %>%
  # Drop date
  select(-date)
stock_returns
## # A tibble: 90 x 3
## # Groups: symbol [3]
## symbol yearly.returns year
## <chr> <dbl> <dbl>
## 1 AAPL 0.169 1990
## 2 AAPL 0.323 1991
## 3 AAPL 0.0691 1992
## 4 AAPL -0.504 1993
## 5 AAPL 0.352 1994
## 6 AAPL -0.173 1995
## 7 AAPL -0.345 1996
## 8 AAPL -0.371 1997
## 9 AAPL 2.12 1998
## 10 AAPL 1.51 1999
## # … with 80 more rows
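As a rough illustration of what the yearly-return step computes, the sketch below works out simple (arithmetic) returns by hand. The prices are hypothetical and chosen only to keep the arithmetic easy; they are not taken from the downloaded data, and the variable names are made up for this example.

```r
# Hypothetical year-end adjusted prices (illustrative values only)
adjusted <- c(`2016` = 100, `2017` = 120, `2018` = 90)

# Simple yearly return: this year's closing price divided by
# last year's closing price, minus 1
yearly_returns <- adjusted[-1] / adjusted[-length(adjusted)] - 1

# 2017: 120 / 100 - 1 = 0.20, 2018: 90 / 120 - 1 = -0.25
yearly_returns
```

A value of 0.20 means a 20% gain over the year, which is why the returns in stock_returns should be read as proportions rather than dollar amounts.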
Hint: See the code in 4.3.1 Bar chart (on summary statistics).
library(dplyr)
plotdata <- stock_returns %>%
  group_by(symbol) %>%
  summarize(mean_returns = mean(yearly.returns))
plotdata
## # A tibble: 3 x 2
## symbol mean_returns
## <chr> <dbl>
## 1 AAPL 0.366
## 2 IBM 0.116
## 3 MSFT 0.283
Hint: See the code in 4.3.1 Bar chart (on summary statistics).
ggplot(plotdata,
       aes(x = symbol,
           y = mean_returns)) +
  geom_bar(stat = "identity")
Hint: See the code in 4.3.1 Bar chart (on summary statistics).
library(scales)
ggplot(plotdata,
       aes(x = factor(symbol,
                      levels = c("AAPL", "IBM", "MSFT"),
                      labels = c("Apple", "IBM", "Microsoft")),
           y = mean_returns)) +
  geom_bar(stat = "identity",
           fill = "cornflowerblue") +
  # Returns are proportions, so label them as percentages rather than dollars
  geom_text(aes(label = percent(mean_returns, accuracy = 0.1)),
            vjust = -0.25) +
  scale_y_continuous(breaks = seq(0, 0.4, 0.1),
                     labels = percent) +
  labs(title = "Mean Yearly Returns",
       subtitle = "Of the three companies listed",
       x = "",
       y = "")
Hint: See the code in 4.3.2 Grouped kernel density plots.
ggplot(stock_returns,
       aes(x = yearly.returns,
           fill = symbol)) +
  geom_density(alpha = 0.4) +
  labs(title = "Yearly return distribution by stock")
Hint: Google how to interpret density plots.
Of the three stocks, IBM's density curve has the highest peak, which means its yearly returns are the most tightly clustered. That leaves IBM with the least exposure to a big loss and the most room for error, but also a substantially smaller chance of large growth.
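One way to check a reading of a density plot is to compute the share of years that fall below some loss threshold, since that share approximates the area under the curve to the left of the threshold. The sketch below uses only the first ten AAPL yearly returns shown in the stock_returns printout above. The -20% cutoff and the variable names are assumptions made for illustration.

```r
# First ten AAPL yearly returns, copied from the stock_returns printout (1990-1999)
aapl <- c(0.169, 0.323, 0.0691, -0.504, 0.352,
          -0.173, -0.345, -0.371, 2.12, 1.51)

# Hypothetical cutoff for a "big loss": a yearly return below -20%
big_loss <- -0.20

# Share of years below the cutoff (3 of the 10 years here)
share_big_loss <- mean(aapl < big_loss)
share_big_loss

# Standard deviation as a rough measure of how wide the density curve is;
# a wider curve means more extreme years in both directions
spread <- sd(aapl)
spread
```

Running the same two summaries for each symbol on the full stock_returns data would give numbers to set beside the visual impression from the plot.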
Hint: See the code in 4.3.3 Box plots.
ggplot(stock_returns,
       aes(x = symbol,
           y = yearly.returns)) +
  geom_boxplot() +
  labs(title = "Yearly Return Distribution")
I would say Microsoft, because at each of its peaks it was the second highest of the three and it is now substantially lower, which leads me to believe that it is due for a large spike in growth.
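The numbers behind a box plot can also be inspected directly, which may help when comparing medians, boxes, and outlier points across stocks. This is a minimal sketch using base R's boxplot.stats() on the first ten AAPL yearly returns from the printout above. It is not the full series, so the values will differ from the plotted boxes.

```r
# First ten AAPL yearly returns, copied from the stock_returns printout (1990-1999)
aapl <- c(0.169, 0.323, 0.0691, -0.504, 0.352,
          -0.173, -0.345, -0.371, 2.12, 1.51)

# boxplot.stats() returns the five values that define the box and whiskers
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# plus any points beyond the whiskers
stats <- boxplot.stats(aapl)

stats$stats  # whisker ends, hinges, and median
stats$out    # years drawn as individual outlier points
```

In this sample the two very strong years (returns of 1.51 and 2.12) fall beyond the upper whisker, so a box plot would draw them as separate points.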
Hint: Use message, echo and results in the global chunk options. Refer to the RMarkdown Reference Guide.
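A global setup chunk along the following lines is one way to apply the hint. The particular values are an assumption about what the exercise wants (hide package messages while still showing the code and its results), so adjust them to match the actual question.

```r
# Assumed goal: hide messages, but show the code and its printed results
knitr::opts_chunk$set(
  message = FALSE,     # suppress messages such as package start-up notes
  echo    = TRUE,      # display the source code of each chunk
  results = "markup"   # display printed results (use "hide" to suppress them)
)
```

Placing this in the first chunk of the document makes the options the default for every later chunk. Individual chunks can still override them in their own headers.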