Let’s load the packages I need:
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
Let’s read in the data from GitHub and then sum each element column to see which objects Bob paints most often.
bob <- read.csv("https://raw.githubusercontent.com/jhumms/DATA607/main/bobross/elements-by-episode.csv",header = TRUE, sep = ",")
# Drop the two ID columns (EPISODE, TITLE), then total every element column
bob %>% select(-c(EPISODE, TITLE)) %>% summarise(across(everything(), sum))
##   APPLE_FRAME AURORA_BOREALIS BARN BEACH BOAT BRIDGE BUILDING BUSHES CABIN
## 1           1               2   17    27    2      7        1    120    69
##   CACTUS CIRCLE_FRAME CIRRUS CLIFF CLOUDS CONIFER CUMULUS DECIDUOUS DIANE_ANDRE
## 1      4            2     28     8    179     212      86       227           1
##   DOCK DOUBLE_OVAL_FRAME FARM FENCE FIRE FLORIDA_FRAME FLOWERS FOG FRAMED GRASS
## 1    1                 1    1    24    1             1      12  23     53   142
##   GUEST HALF_CIRCLE_FRAME HALF_OVAL_FRAME HILLS LAKE LAKES LIGHTHOUSE MILL MOON
## 1    22                 1               1    18  143     0          1    2    3
##   MOUNTAIN MOUNTAINS NIGHT OCEAN OVAL_FRAME PALM_TREES PATH PERSON PORTRAIT
## 1      160        99    11    36         38          9   49      1        3
##   RECTANGLE_3D_FRAME RECTANGULAR_FRAME RIVER ROCKS SEASHELL_FRAME SNOW
## 1                  1                 1   126    77              1   75
##   SNOWY_MOUNTAIN SPLIT_FRAME STEVE_ROSS STRUCTURE SUN TOMB_FRAME TREE TREES
## 1            109           1         11        85  40          1  361   337
##   TRIPLE_FRAME WATERFALL WAVES WINDMILL WINDOW_FRAME WINTER WOOD_FRAMED
## 1            1        39    34        1            1     69           1
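The most common elements are TREE (361), TREES (337), DECIDUOUS (227), CONIFER (212), CLOUDS (179), and MOUNTAIN (160). Would you have guessed trees, clouds, and mountains?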
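Picking the winners out of that wide printout by eye is error-prone. Here is a minimal sketch that ranks the column totals directly with base R's `colSums()` and `sort()`. The three-row `toy` data frame is made up and only mimics the assumed layout of `bob`: two ID columns followed by 0/1 element flags. On the real data, the same call would be `sort(colSums(bob[, -(1:2)]), decreasing = TRUE)`.

```r
# Made-up stand-in for `bob`: two ID columns, then assumed 0/1 element flags
toy <- data.frame(
  EPISODE  = c("S01E01", "S01E02", "S01E03"),
  TITLE    = c("A", "B", "C"),
  OCEAN    = c(0, 0, 0),
  MOUNTAIN = c(1, 0, 0),
  TREE     = c(1, 1, 1),
  CLOUDS   = c(1, 1, 0)
)

# Total each element column (dropping the two ID columns), largest first
element_totals <- sort(colSums(toy[, -(1:2)]), decreasing = TRUE)
element_totals

# The three most frequent elements
top3 <- head(element_totals, 3)
top3
```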
Let’s check out the column names for the dataset.
colnames(bob)
## [1] "EPISODE" "TITLE" "APPLE_FRAME"
## [4] "AURORA_BOREALIS" "BARN" "BEACH"
## [7] "BOAT" "BRIDGE" "BUILDING"
## [10] "BUSHES" "CABIN" "CACTUS"
## [13] "CIRCLE_FRAME" "CIRRUS" "CLIFF"
## [16] "CLOUDS" "CONIFER" "CUMULUS"
## [19] "DECIDUOUS" "DIANE_ANDRE" "DOCK"
## [22] "DOUBLE_OVAL_FRAME" "FARM" "FENCE"
## [25] "FIRE" "FLORIDA_FRAME" "FLOWERS"
## [28] "FOG" "FRAMED" "GRASS"
## [31] "GUEST" "HALF_CIRCLE_FRAME" "HALF_OVAL_FRAME"
## [34] "HILLS" "LAKE" "LAKES"
## [37] "LIGHTHOUSE" "MILL" "MOON"
## [40] "MOUNTAIN" "MOUNTAINS" "NIGHT"
## [43] "OCEAN" "OVAL_FRAME" "PALM_TREES"
## [46] "PATH" "PERSON" "PORTRAIT"
## [49] "RECTANGLE_3D_FRAME" "RECTANGULAR_FRAME" "RIVER"
## [52] "ROCKS" "SEASHELL_FRAME" "SNOW"
## [55] "SNOWY_MOUNTAIN" "SPLIT_FRAME" "STEVE_ROSS"
## [58] "STRUCTURE" "SUN" "TOMB_FRAME"
## [61] "TREE" "TREES" "TRIPLE_FRAME"
## [64] "WATERFALL" "WAVES" "WINDMILL"
## [67] "WINDOW_FRAME" "WINTER" "WOOD_FRAMED"
I like all of the columns and there is nothing I want to remove, so rather than dropping any, I’ll just reassign the column names. To make sure I’m doing more than that, I’ll also make them all lowercase:
colnames(bob) <- tolower(colnames(bob))
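Some features could be collapsed into broader groups, such as the many frame types or the nature elements. To keep things simple, let’s instead add a column recording whether each painting is set on land (including rivers and lakes) or by the water, where “water” means the painting contains a beach, boat, lighthouse, ocean, or palm trees.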
bob$land_water <- ifelse(bob$beach > 0, "water",
                  ifelse(bob$boat > 0, "water",
                  ifelse(bob$lighthouse > 0, "water",
                  ifelse(bob$ocean > 0, "water",
                  ifelse(bob$palm_trees > 0, "water",
                         "land")))))
# Now let's check that the column was created
head(bob$land_water)
## [1] "land" "land" "land" "land" "land" "land"
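The nested `ifelse()` works, but it gains another level for every new marker. A flatter sketch checks all of the markers at once with `rowSums()` and then tallies the result with `table()` and `prop.table()`. It assumes the five marker columns hold non-negative counts (0/1 flags), so that a row sum above zero means at least one marker is present. The `toy` rows below are made up for illustration. With the real data, you would use `bob` in their place.

```r
# Made-up rows standing in for `bob` (lowercase names, assumed 0/1 flags)
toy <- data.frame(
  title      = c("A", "B", "C", "D"),
  beach      = c(0, 1, 0, 0),
  boat       = c(0, 0, 0, 0),
  lighthouse = c(0, 0, 0, 0),
  ocean      = c(0, 1, 0, 0),
  palm_trees = c(0, 0, 0, 0),
  tree       = c(1, 1, 1, 0)
)

# The same five "water" markers used in the nested ifelse() version
water_cols <- c("beach", "boat", "lighthouse", "ocean", "palm_trees")

# A painting is "water" if any marker column is non-zero, otherwise "land"
toy$land_water <- ifelse(rowSums(toy[water_cols]) > 0, "water", "land")

# How many paintings fall in each setting, and what share of the total
counts <- table(toy$land_water)
shares <- prop.table(counts)
counts
shares
```

Keeping the marker names in a single vector also makes it easy to experiment with the definition of “water”, for example by adding `waves`.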
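Now let’s make a quick pie chart (a single stacked bar wrapped into polar coordinates) to see whether Bob Ross favors land or water.
ggplot(bob, aes(x = "", fill = land_water)) +
  geom_bar(width = 1) +          # default stat = "count": one bar stacked by setting
  coord_polar("y", start = 0) +  # wrap the stacked bar into a pie
  labs(x = NULL, y = NULL, fill = "Setting")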
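And there we have it: land-based paintings dominate. Even OCEAN, the most common water element, appears in only 36 paintings, which helps explain why Bob Ross is synonymous with landscapes.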
If I were to keep exploring Bob’s data, I would add a few columns: the number of times his squirrel appeared on the show, and how many of his trees were ‘happy’. To get these, I would probably go through transcripts of The Joy of Painting and extract the counts from them.
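Here is a rough sketch of that idea. It assumes the transcripts can be loaded as one character string per episode, which would reduce counting mentions to regular-expression matching with base R’s `gregexpr()`. The two transcript snippets and the helper `count_matches()` are hypothetical. The patterns are only a first guess at how “squirrel” and “happy tree” might be phrased.

```r
# Two made-up transcript snippets; real ones would need to be sourced separately
transcripts <- c(
  ep1 = "Let's give him a happy little tree. Maybe another happy tree right here.",
  ep2 = "Look, our squirrel friend is back! A happy little cloud and a happy tree."
)

# Hypothetical helper: count non-overlapping, case-insensitive matches of
# `pattern` in each string (gregexpr() returns -1 when there is no match)
count_matches <- function(text, pattern) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

mentions <- data.frame(
  episode     = names(transcripts),
  squirrel    = count_matches(transcripts, "squirrel"),
  happy_trees = count_matches(transcripts, "happy (little )?trees?")
)
mentions
```

Because this matches spoken text, it counts mentions rather than on-screen appearances, so the squirrel column would be a proxy at best.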