R Pubs Link
Github Repository

Overview

This report conducts an exploratory data analysis of FiveThirtyEight’s ‘elements-by-episode’ dataset. The dataset tracks 67 different elements found in the Bob Ross paintings featured in the TV show “The Joy of Painting”. For more information about this dataset and a statistical analysis of the work of Bob Ross, please see this article.

Summary of the data

To get a sense of the dataset's size, we can run the following, which returns a vector with the number of rows and columns in our data frame, data:

dim(data)
## [1] 403  69

To view summary statistics for the first six columns of the dataframe data, we run the following:

summary(data[1:6])
##    EPISODE             TITLE            APPLE_FRAME       AURORA_BOREALIS   
##  Length:403         Length:403         Min.   :0.000000   Min.   :0.000000  
##  Class :character   Class :character   1st Qu.:0.000000   1st Qu.:0.000000  
##  Mode  :character   Mode  :character   Median :0.000000   Median :0.000000  
##                                        Mean   :0.002481   Mean   :0.004963  
##                                        3rd Qu.:0.000000   3rd Qu.:0.000000  
##                                        Max.   :1.000000   Max.   :1.000000  
##       BARN             BEACH      
##  Min.   :0.00000   Min.   :0.000  
##  1st Qu.:0.00000   1st Qu.:0.000  
##  Median :0.00000   Median :0.000  
##  Mean   :0.04218   Mean   :0.067  
##  3rd Qu.:0.00000   3rd Qu.:0.000  
##  Max.   :1.00000   Max.   :1.000

For the columns that describe the paintings, the data are binary: the value is 1 if an element is present in the painting and 0 if it is not. Since there are 69 columns (two identifiers, EPISODE and TITLE, plus 67 elements), it may be better to focus on a particular element, such as whether the paintings are framed.
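Because each element column holds only 0s and 1s, a column's mean is the share of episodes that include that element, which is how the Mean row in the summary output can be read. The following is a minimal sketch on a small made-up data frame; the `toy` object and its values are illustrative assumptions, not rows from the real dataset:

```r
# Hypothetical toy data: 4 episodes, 2 binary element columns
toy <- data.frame(
  EPISODE = c("S01E01", "S01E02", "S01E03", "S01E04"),
  BARN    = c(0, 1, 0, 0),
  BEACH   = c(1, 1, 0, 0)
)

# Keep only the numeric (0/1) columns
is_element <- vapply(toy, is.numeric, logical(1))

# For 0/1 data, the column mean is the proportion of episodes with that element
element_share <- colMeans(toy[, is_element, drop = FALSE])
element_share
```

Multiplying a share by the number of rows recovers the count of episodes featuring that element.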

Basic Exploratory Data Analysis

Currently, the data are at the episode level, meaning each row depicts the elements of a painting present in a given episode. For this report, we are exploring which paintings are framed and will add a column that records which season the painting was featured in:

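# mutate() and select() come from the dplyr package
library(dplyr)

# creating a new column titled 'SEASON' from characters 2-3 of EPISODE
data_v2 <- data |>
  mutate(SEASON = substr(EPISODE, 2, 3))

# selecting columns whose names contain the term FRAME (this includes FRAMED)
data_v3 <- data_v2 |>
  select(EPISODE, TITLE, SEASON, contains("FRAME"))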
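To make the season extraction concrete, here is a small sketch of what `substr()` returns, assuming the episode codes follow an "SxxEyy" pattern such as "S01E01". The codes below are made up for illustration. Note that `substr()` returns zero-padded character strings, so converting to integer may be convenient for numeric comparisons:

```r
# Hypothetical episode codes, assumed to follow the "SxxEyy" pattern
episode <- c("S01E01", "S04E10", "S31E13")

# Characters 2-3 hold the season number as a zero-padded string
season_chr <- substr(episode, 2, 3)

# Optional: convert to integer for numeric sorting and comparisons
season_num <- as.integer(season_chr)

season_chr
season_num
```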

To calculate the percentage of episodes that feature an unframed painting and the percentage that feature a framed painting, we run the following:

# percentage of episodes with an unframed painting (FRAMED == 0)
mean(data_v3$FRAMED == 0) * 100
## [1] 86.84864
# percentage of episodes with a framed painting (FRAMED == 1)
mean(data_v3$FRAMED == 1) * 100
## [1] 13.15136
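A note on R operator precedence that matters when building a denominator for percentages like these: `x == 0 | 1` parses as `(x == 0) | 1`, which is TRUE for every element, so its sum is simply the length of `x` rather than a genuine test for "0 or 1". The sketch below uses a made-up vector (an assumption for illustration) to show this, along with the more explicit `%in%` test and the `mean()` shortcut for a logical vector:

```r
# Hypothetical 0/1 vector standing in for a FRAMED column
framed <- c(0, 0, 1, 0, 1, 0, 0, 0)

# `==` binds tighter than `|`, so this is (framed == 0) | 1: always TRUE
always_true <- framed == 0 | 1
sum(always_true)            # same as length(framed)

# An explicit membership test for "0 or 1"
n_valid <- sum(framed %in% c(0, 1))

# The mean of a logical vector is the proportion of TRUE values
pct_unframed <- mean(framed == 0) * 100
pct_framed   <- mean(framed == 1) * 100
```

On fully binary data both denominators agree, but only the `%in%` version would exclude an unexpected value such as 2.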

Findings

Bob Ross did not typically frame his paintings: frames appeared in only about 13% of episodes. He did not feature a framed painting until Season 4, when he used a circle frame. When he did incorporate a frame, he most commonly used the oval frame.
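The report does not show the code behind the last two findings. One way they could be checked is sketched below in base R on a small made-up data frame. The values are invented, and the column names `CIRCLE_FRAME` and `OVAL_FRAME` are assumed to follow the same naming pattern as `APPLE_FRAME`:

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  SEASON       = c("01", "02", "04", "05", "05", "06"),
  CIRCLE_FRAME = c(0, 0, 1, 0, 0, 0),
  OVAL_FRAME   = c(0, 0, 0, 1, 1, 0),
  FRAMED       = c(0, 0, 1, 1, 1, 0)
)

# Earliest season that contains a framed painting
first_framed_season <- min(as.integer(toy$SEASON[toy$FRAMED == 1]))

# Count episodes per frame type; "_FRAME$" matches the individual frame
# columns and excludes the overall FRAMED indicator
frame_cols   <- grep("_FRAME$", names(toy), value = TRUE)
frame_counts <- sort(colSums(toy[, frame_cols, drop = FALSE]), decreasing = TRUE)

first_framed_season
frame_counts
most_common_frame <- names(frame_counts)[1]
```

Applied to `data_v3`, the same steps would report the first season with a frame and the frame type with the highest count.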