We’re going to showcase some features of various packages from the Tidyverse!
Here we have Dogecoin/USD exchange rate data from Yahoo Finance, covering 2014 to roughly the present. Using magrittr, dplyr, lubridate, and ggplot2, cleaning and plotting it should be a snap. Let’s get started.
rates <- read.csv("https://raw.githubusercontent.com/TheWerefriend/data607/master/tidyverseAssignment/DOGE-USD.csv")
colnames(rates)
## [1] "Date" "Open" "High" "Low" "Close" "Adj.Close"
## [7] "Volume"
tests <- ifelse(rates$Close==rates$Adj.Close, "Identical", "Distinct")
"Distinct" %in% tests
## [1] FALSE
This result makes sense: an adjusted close exists to account for corporate actions such as stock splits, and a cryptocurrency has none, so Adj.Close always matches Close. The column is redundant, and we can drop it.
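The same comparison can be written a little more directly in base R. This is a minimal sketch on a made-up stand-in data frame (`toy` and its values are assumptions, not the Yahoo Finance data), showing `identical()` and `all()` as alternatives to the `ifelse()` check.

```r
# Toy data standing in for the downloaded file (values are made up).
toy <- data.frame(
  Close     = c(0.0003, 0.0004, NA),
  Adj.Close = c(0.0003, 0.0004, NA)
)

# identical() compares the two vectors as a whole, including NA positions.
same_columns <- identical(toy$Close, toy$Adj.Close)
same_columns

# all() with na.rm = TRUE ignores rows where the comparison is NA.
same_where_known <- all(toy$Close == toy$Adj.Close, na.rm = TRUE)
same_where_known
```

Either form returns a single logical value, so there is no intermediate vector of labels to search.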
library(magrittr)
library(dplyr)
rates <- rates %>%
  select(!Adj.Close)
colnames(rates)
## [1] "Date" "Open" "High" "Low" "Close" "Volume"
Inside dplyr's select(), the ! operator takes the complement of a selection, so select(!Adj.Close) keeps every column except Adj.Close.
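As a small illustration of complement selection, here is a sketch on a one-row toy data frame (the data frame and its values are assumptions for the example). It negates a single name and then a vector of names.

```r
library(dplyr)

# Toy data frame with made-up values.
toy <- data.frame(Date = "2014-09-17", Close = 0.0003, Adj.Close = 0.0003)

# Drop a single column by negating its name.
without_adj <- select(toy, !Adj.Close)
names(without_adj)

# Drop several columns by negating a vector of names.
date_only <- select(toy, !c(Close, Adj.Close))
names(date_only)
```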
library(lubridate)
rates$Date <- ymd(rates$Date)
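To see what ymd() gives us, here is a short sketch on two hand-written date strings (the strings are assumptions chosen for the example). Once the column holds real Date values, the usual date accessors and date arithmetic are available.

```r
library(lubridate)

# Two made-up dates in year-month-day order.
raw_dates <- c("2014-09-17", "2014-09-18")
parsed <- ymd(raw_dates)

class(parsed)               # the result is a Date vector
year(parsed[1])             # extract the year component
days_apart <- as.numeric(parsed[2] - parsed[1])
days_apart                  # difference between the two dates, in days
```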
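Now, all the other values are stored as factors, because read.csv() converted string columns to factors by default before R 4.0.0 (newer versions read them as character). Something strange happens when we try to convert them directly to numeric values: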
rates$Open[[1]]
## [1] 0.000293
## 1413 Levels: 0.000087 0.000089 0.000090 0.000091 0.000092 0.000093 ... null
as.numeric(rates$Open[[1]])
## [1] 190
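Apparently, 0.000293 is the 190th factor level in the column! Calling as.numeric() on a factor returns the integer level code rather than the printed value, so we have to convert to character first. Since there are some null values in the data (days where major changes happened with the DOGE network, I assume), we must also throw those observations out.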
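The pitfall is easy to reproduce on a tiny factor. The sketch below uses three made-up entries (an assumption for illustration) to show the difference between level codes and the values the labels represent.

```r
# A small factor with made-up entries, including a "null" string.
prices <- factor(c("0.000293", "0.000087", "null"))

# as.numeric() on a factor returns the integer level codes, not the labels.
codes <- as.numeric(prices)
codes

# Converting to character first recovers the printed values.
# "null" cannot be parsed as a number, so it becomes NA (normally with a warning).
values <- suppressWarnings(as.numeric(as.character(prices)))
values
```

The levels are sorted as strings, which is why the code for a value has nothing to do with its magnitude.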
rates[, 2:6] <- rates %>%
  select(!Date) %>%
  # factor -> character -> numeric; "null" entries become NA (with a warning)
  sapply(function(x) as.numeric(as.character(x)))
rates <- na.omit(rates)
anyNA(rates)
## [1] FALSE
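The same conversion could also be written with dplyr's mutate(). Note that transmute() works like mutate(), except that it keeps only the columns named in the call and drops all the others, whereas mutate() keeps the existing columns and adds or overwrites the named ones.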
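A sketch of the difference, on a two-row toy data frame (the data and the helper name `to_number` are assumptions for the example). It also shows across() with a complement selection as one possible way to convert every column except Date.

```r
library(dplyr)

# Toy data with made-up values; Open is deliberately stored as a factor.
toy <- data.frame(
  Date = c("2014-09-17", "2014-09-18"),
  Open = c("0.000293", "null"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: factor -> character -> numeric, silencing the NA warning.
to_number <- function(x) suppressWarnings(as.numeric(as.character(x)))

# mutate() keeps every column and overwrites the selected ones in place.
kept <- toy %>%
  mutate(across(!Date, to_number))

# transmute() returns only the columns named in the call.
only_new <- toy %>%
  transmute(Open = to_number(Open))

names(kept)
names(only_new)
```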
library(ggplot2)
# Daily open, high, low, and close prices (bare column names inside aes())
a <- ggplot(rates, aes(x = Date)) +
  geom_line(aes(y = Open), color = "steelblue") +
  geom_line(aes(y = High), color = "green") +
  geom_line(aes(y = Low), color = "red") +
  geom_line(aes(y = Close), color = "purple")
# Daily trading volume
b <- ggplot(rates, aes(x = Date)) +
  geom_line(aes(y = Volume), color = "black")
a + labs(x = "Time", y = "Price in USD")
b + labs(x = "Time", y = "Volume")
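The price plot draws several lines in different colors but has no legend saying which is which. One possible way to get one, sketched here on a three-row toy data frame (the values are made up), is to map a label to color inside aes() and supply the colors with scale_color_manual().

```r
library(ggplot2)

# Toy data with made-up prices.
toy <- data.frame(
  Date  = as.Date("2021-01-01") + 0:2,
  Open  = c(0.005, 0.006, 0.010),
  Close = c(0.006, 0.010, 0.009)
)

# Mapping a constant label to color inside aes() makes ggplot2 build a legend.
p <- ggplot(toy, aes(x = Date)) +
  geom_line(aes(y = Open, color = "Open")) +
  geom_line(aes(y = Close, color = "Close")) +
  scale_color_manual(values = c(Open = "steelblue", Close = "purple")) +
  labs(x = "Time", y = "Price in USD", color = "Series")

p
```

The same pattern extends to the High and Low series by adding one geom_line() and one named color for each.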