The assignment: using one or more TidyVerse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with the selected dataset.
In light of the current situation across the world, I chose the Bob Ross elements-by-episode dataset.
Load the Bob Ross dataset from fivethirtyeight into the RStudio environment:
library(tidyverse)  # attaches readr, tidyr, dplyr and ggplot2
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/bob-ross/elements-by-episode.csv"
BobRoss <- read_csv(url)
## Parsed with column specification:
## cols(
## .default = col_double(),
## EPISODE = col_character(),
## TITLE = col_character()
## )
## See spec(...) for full column specifications.
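Use the ‘melt’ function from the reshape package to transform the dataset from a wide format to a long format:
# One row per (episode, element) pair; 'variable' holds the former column name
Bulkbobr <- reshape::melt(as.data.frame(BobRoss), id = c("EPISODE", "TITLE"))
# Keep only the elements that actually appear in each episode
Bob <- unique(subset(Bulkbobr, value == 1, select = c("EPISODE", "TITLE", "variable")))
Bob <- Bob[order(Bob$EPISODE), ]
# reshape::rename takes a named vector of the form c(old = "new")
Bob <- reshape::rename(Bob, c(variable = "object"))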
head(Bob, 10)
## EPISODE TITLE object
## 2822 S01E01 "A WALK IN THE WOODS" BUSHES
## 6449 S01E01 "A WALK IN THE WOODS" DECIDUOUS
## 10882 S01E01 "A WALK IN THE WOODS" GRASS
## 19345 S01E01 "A WALK IN THE WOODS" RIVER
## 23375 S01E01 "A WALK IN THE WOODS" TREE
## 23778 S01E01 "A WALK IN THE WOODS" TREES
## 3226 S01E02 "MT. MCKINLEY" CABIN
## 5241 S01E02 "MT. MCKINLEY" CLOUDS
## 5644 S01E02 "MT. MCKINLEY" CONIFER
## 14913 S01E02 "MT. MCKINLEY" MOUNTAIN
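To make the wide-to-long step easier to follow, here is a minimal sketch of the same melt-and-filter idea on a tiny made-up table. The `toy` data frame, its episode codes, and its titles are hypothetical and are not part of the Bob Ross dataset; only the column layout (two id columns followed by 0/1 indicator columns) mirrors it.

```r
# Hypothetical toy data: two id columns plus two 0/1 indicator columns
toy <- data.frame(
  EPISODE = c("E1", "E2", "E3"),
  TITLE   = c("A", "B", "C"),
  TREE    = c(1, 0, 1),
  CLOUDS  = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (episode, indicator) pair.
# 'variable' holds the old column name, 'value' holds the 0/1 flag.
toy_long <- reshape::melt(toy, id.vars = c("EPISODE", "TITLE"))

# Keep only the elements that are present (flag equal to 1)
toy_present <- subset(toy_long, value == 1,
                      select = c("EPISODE", "TITLE", "variable"))
print(toy_present)
```

Three episodes with two indicator columns give six long rows, of which only the rows flagged 1 survive the `subset()` call.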
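# Count how often each object appears (assumed step: Bob2 is taken to hold one
# row per object with a 'freq' column, sorted in ascending order of frequency)
Bob2 <- as.data.frame(table(object = Bob$object), responseName = "freq")
Bob2 <- Bob2[order(Bob2$freq), ]
# With ascending order, the last 10 rows are the 10 most frequent objects
top10 <- tail(Bob2, n = 10)
ggplot(data = top10, ggplot2::aes(x = reorder(object, -freq), y = freq)) +
  ggplot2::geom_bar(stat = "identity", fill = "steelblue") +
  ggplot2::labs(x = "Objects", y = "Frequency") +
  ggplot2::ggtitle("Most used objects in Bob Ross Paintings") +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
Now use tidyr’s ‘pivot_longer()’ and dplyr functions to replicate the same reshaping of the data and reproduce the same figure as above: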
Bob3 <- BobRoss %>%
  # Every column except the two id columns becomes a (variable, value) pair
  pivot_longer(cols = -c(EPISODE, TITLE), names_to = "variable", values_to = "value") %>%
  filter(value == 1) %>%
  select(-value) %>%
  dplyr::rename(object = variable) %>%
  group_by(object) %>%
  dplyr::summarise(count = n()) %>%
  dplyr::arrange(desc(count)) %>%
  # Name the weighting column explicitly instead of relying on the default
  dplyr::top_n(10, count) %>%
  ggplot(ggplot2::aes(x = reorder(object, -count), y = count)) +
  ggplot2::geom_bar(stat = "identity", fill = "steelblue") +
  ggplot2::labs(x = "Objects", y = "Frequency") +
  ggplot2::ggtitle("Most used objects in Bob Ross Paintings") +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
# Bob3 holds the finished ggplot object; printing it draws the figure
Bob3
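For readers who want to check the reshaping logic without downloading the data, the sketch below runs the same pivot, filter, and count steps on a small made-up tibble. The `toy` table and its values are hypothetical; `count(object, name = "count")` is used here as a shorthand for the `group_by()` plus `summarise(n())` pair, which is a substitution rather than the code used in the pipeline.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same layout as the Bob Ross table
toy <- tibble(
  EPISODE = c("E1", "E2", "E3"),
  TITLE   = c("A", "B", "C"),
  TREE    = c(1, 1, 1),
  CLOUDS  = c(0, 1, 1)
)

toy_counts <- toy %>%
  # Select the indicator columns by excluding the id columns by name
  pivot_longer(cols = -c(EPISODE, TITLE),
               names_to = "object", values_to = "value") %>%
  # Keep only the elements that are present in an episode
  filter(value == 1) %>%
  # One row per object with the number of episodes it appears in
  count(object, name = "count") %>%
  arrange(desc(count))

print(toy_counts)
```

Excluding the id columns by name keeps the pivot working if indicator columns are added or reordered, whereas a positional range would have to be updated by hand.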