Introduction

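I used the Bob Ross dataset for this tidying project. The dataset is in a wide format: each row is an episode, and each element that can appear in a painting has its own 0/1 indicator column.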

Code

I started by loading the required libraries and the dataset.

# load required libraries
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# load the bob ross csv
br <- read.csv("bob_ross.csv", header = TRUE)

Next, I tidied the data using pivot_longer, filter, and separate.

# tidy the bob ross csv (pivot_longer, filter, separate, select)
br <- br %>%
  pivot_longer(APPLE_FRAME:WOOD_FRAMED,
               names_to = "element",
               values_to = "present") %>%
  filter(present == 1) %>%
  separate("EPISODE", into = c("season", "episode"), sep = "E") %>%
  separate("season", into = c("S", "season"), sep = "S") %>%
  select(-S)

br$season <- as.integer(br$season)
br$episode <- as.integer(br$episode)
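
To show what these tidying steps do, here is a minimal sketch on a made-up two-episode table. The toy data frame and its values are hypothetical and only mimic the layout of the real file; the season prefix is stripped with sub() here instead of a second separate() call, which is a simplification and not the approach used above.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data in the same wide layout (values are made up)
toy <- tibble(
  EPISODE = c("S01E01", "S01E02"),
  TITLE   = c("FIRST", "SECOND"),
  BUSHES  = c(1L, 0L),
  TREE    = c(1L, 1L)
)

toy_long <- toy %>%
  # one row per episode/element pair
  pivot_longer(BUSHES:TREE,
               names_to = "element",
               values_to = "present") %>%
  # keep only the elements that actually appear in an episode
  filter(present == 1) %>%
  # "S01E01" becomes season "S01" and episode "01"
  separate(EPISODE, into = c("season", "episode"), sep = "E") %>%
  # drop the leading "S" and convert both columns to integers
  mutate(season = as.integer(sub("S", "", season)),
         episode = as.integer(episode))

print(toy_long)
```

The result has one row for each element that appears in an episode, so the first episode contributes two rows and the second contributes one.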

Finally, I created a new data frame containing the most common elements in the dataset (those appearing more than 50 times) and plotted them.

# total the appearances of each element (one row per element)
# and keep only the elements that appear more than 50 times
most_common <- br %>%
  group_by(element) %>%
  summarise(total = sum(present)) %>%
  filter(total > 50)

# plot the most popular elements
ggplot(most_common, aes(x = reorder(element, total), y = total)) +
  geom_col() +
  labs(x = "Element",
       y = "Total Appearances") +
  coord_flip()
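
Because every remaining row has present equal to 1 after the filter step, the totals above are just row counts per element. As an alternative sketch, the same kind of table can be built with count(). The toy data and the cutoff of 1 are hypothetical and chosen only to keep the example small.

```r
library(dplyr)

# hypothetical long-format data (values are made up)
toy_long <- tibble(
  element = c("TREE", "TREE", "TREE", "BUSHES", "CABIN"),
  present = c(1L, 1L, 1L, 1L, 1L)
)

toy_counts <- toy_long %>%
  # count rows per element, name the count column "total", sort descending
  count(element, name = "total", sort = TRUE) %>%
  # keep only elements above the (hypothetical) cutoff
  filter(total > 1)

print(toy_counts)
```

Only TREE passes the cutoff in this toy data, since it is the only element listed more than once.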

Conclusion

Based on my analysis, Bob Ross painted a lot of paintings with trees. TREE and TREES were the two most common elements in the dataset.