General Updates

  • If you haven’t submitted HW3 already, it needs to be done before Wednesday’s class.
  • HW 4 will be available tonight or tomorrow.
    • It will be similar to the lab. Please try to be here Wednesday for the lab.
  • HW 4 is due Friday midnight, instead of noon. You will be reusing the data from HW2 and joining the data in R.
    • I have my second all-day comprehensive exam that Friday (Nov. 5th), so I won’t start grading until Saturday morning.
  • Reminder to look for a meme/image/chart/news article that you want to use for the final project.

Quarantine Time

I am not expecting you to understand everything that is going on. We are mostly having a little fun before diving into more code examples.

#install.packages("tidyverse")
#install.packages("cowplot")
#install.packages("RColorBrewer")
#install.packages("readxl")


library("tidyverse")
library("cowplot")
library("RColorBrewer")
library("readxl")

I created an object called exp_activities and stored my 6 different activities in it.

exp_activities <- c("Book Reviews",
                    "Get my life together", 
                    "Research",
                    "Sew or something else crafty",
                    "Something fun",
                    "Workout")
                    
exp_activities # look at object to check work
## [1] "Book Reviews"                 "Get my life together"        
## [3] "Research"                     "Sew or something else crafty"
## [5] "Something fun"                "Workout"

I created another object to store my five activities that actually happened.

reality_activities <- c( "Book Reviews",
                         "Crochet a blanket",
                         "Lesson Plan",
                         "Stare at the Wall",
                         "Unintentional Naps")
                         
reality_activities 
## [1] "Book Reviews"       "Crochet a blanket"  "Lesson Plan"       
## [4] "Stare at the Wall"  "Unintentional Naps"

Now, I want to assign a time value to both my expected and my actual activities.

# numbers indicate a percent of my total "free" time                        
exp_timespent <- c(20, 
                   25,
                   35,
                   15,
                   5,
                   5)
class(exp_timespent)
## [1] "numeric"
exp_timespent
## [1] 20 25 35 15  5  5
reality_timespent <- c(10,
                       15,
                       35,
                       5,
                       35)
                        
reality_timespent
## [1] 10 15 35  5 35
lockdown_exp <- data.frame(exp_activities, exp_timespent )

lockdown_exp
##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5
lockdown_reality <- data.frame( reality_activities, reality_timespent)

lockdown_reality
##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35
expectations <- 
ggplot( lockdown_exp, 
        aes( x = "", y = exp_timespent, fill = exp_activities)) + 
        geom_bar( stat = "identity", # Makes a stacked bar graph
                  color = "white", size = 4) +
        coord_polar( "y", start = 0 ) + # But then puts it on a circle
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical",
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
        scale_fill_brewer(palette="RdPu", direction = -1 ) +
        ggtitle("Expectations")
    
expectations

reality <- 
ggplot( lockdown_reality, 
        aes( x = "", y = reality_timespent, fill = reality_activities)) + 
        geom_bar( stat = "identity", color = "white", size = 4) +
        coord_polar( "y", start = 0 ) +
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical", 
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
       scale_fill_brewer(palette="BuPu", direction = -1) +
        ggtitle("Reality")

reality

full_plot <- plot_grid( expectations, reality)

full_plot


Replicating this graph

Step 1

First of all, we need to install a few packages. A package is a bundle of functions that extends what you can do in R.

You can install packages in the following way:

# Package names must be written in quotes inside install.packages()

# This is the main package we are using in this class
install.packages("tidyverse")

# This package provides you with color combinations for graphs
install.packages("RColorBrewer")

# This package allows you to arrange multiple plots in one figure
install.packages("cowplot")

Once you have the packages installed, you can use them by loading them from your library:

library(tidyverse)
library(cowplot)
library(RColorBrewer)

Note

  • Always start your script by listing the packages that are needed to reproduce it.

  • You might install packages as you work on a dataset depending on your needs, but make sure to list them at the very top of your script.

  • This ensures that anyone can reproduce your code.

  • We will discuss packages more as we go.
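
One way to follow this advice is to start a script with a small check that reports which required packages are not installed yet. This is only a sketch: missing_packages() is a hypothetical helper written for this example, not a function from any package.

```r
# Packages this script depends on, listed at the very top
needed <- c("tidyverse", "cowplot", "RColorBrewer")

# Hypothetical helper: returns the names of packages that are NOT installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Please install: ", paste(to_install, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

The check only prints a message. It leaves the decision to run install.packages() to whoever is running the script.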

Step 2

Now that we have our packages, we start by creating a small set of activities that we were expecting to do during the lockdown.

exp_activities <- c("Book Reviews",
                    "Get my life together", 
                    "Research",
                    "Sew or something else crafty",
                    "Something fun",
                    "Workout")

To see what we are doing, type:

exp_activities

Step 3

We now create a list of activities that we actually performed during the lockdown.

  • Note what changes: we change the object name (reality_activities) and the list of activities that will be stored in it.

  • Note what doesn’t change: this part of the code <- c() stays the same

reality_activities <- c( "Book Reviews",
                         "Crochet a blanket",
                         "Lesson Plan",
                         "Stare at the Wall",
                         "Unintentional Naps")
                         
reality_activities
## [1] "Book Reviews"       "Crochet a blanket"  "Lesson Plan"       
## [4] "Stare at the Wall"  "Unintentional Naps"
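
Once both objects exist, a few base R functions let you compare them. The sketch below is an optional extra rather than part of the graph; the names both and never_happened are made up for this example.

```r
exp_activities <- c("Book Reviews",
                    "Get my life together",
                    "Research",
                    "Sew or something else crafty",
                    "Something fun",
                    "Workout")

reality_activities <- c("Book Reviews",
                        "Crochet a blanket",
                        "Lesson Plan",
                        "Stare at the Wall",
                        "Unintentional Naps")

# How many activities are stored in each object?
length(exp_activities)      # 6
length(reality_activities)  # 5

# Activities that appear in both objects
both <- intersect(exp_activities, reality_activities)
both  # "Book Reviews"

# Expected activities that never happened
never_happened <- setdiff(exp_activities, reality_activities)
never_happened
```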

Step 4

Now that we have the two lists of activities, we need to report how much time we spent on each of them.

Numbers represent a percent of my total “free time” that I spent doing the activity.

exp_timespent <- c(20, 
                   25,
                   35,
                   15,
                   5,
                   5)
                    
exp_timespent
## [1] 20 25 35 15  5  5

Step 5

We do the same for the “reality” activities.

reality_timespent <- c(10,
                       15,
                       35,
                       5,
                       35)

Note that you can also write the code in one line as long as you respect the “spacing rules”. Start noticing them!

reality_timespent <- c(10, 15, 35, 5, 35)

Let’s check our output:

reality_timespent
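
Because the numbers are meant to be percentages, a quick check is to add them up. With the values used here, the expected times sum to 105 rather than 100, while the reality times sum to exactly 100. The pie charts still work, because each slice is drawn relative to the total, but if exact percentages matter you could rescale. A minimal sketch (exp_share is a name made up for this example):

```r
exp_timespent <- c(20, 25, 35, 15, 5, 5)
reality_timespent <- c(10, 15, 35, 5, 35)

# Do the values add up to 100?
sum(exp_timespent)      # 105
sum(reality_timespent)  # 100

# Rescale the expected values so that they add up to 100
exp_share <- exp_timespent / sum(exp_timespent) * 100
round(exp_share, 1)
sum(exp_share)
```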

Step 6

Now that we have all the information, we combine it into two datasets.

One dataset about expectations…

lockdown_exp <- data.frame(exp_activities, exp_timespent )

lockdown_exp
##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5

…and one about reality

lockdown_reality <- data.frame(reality_activities, reality_timespent)

lockdown_reality
##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35

We can see the datasets that we just created

lockdown_exp
##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5
lockdown_reality
##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35
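
Printing is one way to inspect a dataset. A few other base R functions are also handy, as sketched below. Note that data.frame() needs the two objects to have the same number of elements, which is why six activities are paired with six time values.

```r
exp_activities <- c("Book Reviews", "Get my life together", "Research",
                    "Sew or something else crafty", "Something fun", "Workout")
exp_timespent <- c(20, 25, 35, 15, 5, 5)

lockdown_exp <- data.frame(exp_activities, exp_timespent)

nrow(lockdown_exp)   # number of rows: one per activity
names(lockdown_exp)  # column names, taken from the object names

# Pull out a single column with $
lockdown_exp$exp_timespent

# Compact overview of the column types
str(lockdown_exp)
```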

Step 7

We first create the “Expectations” plot.

library(ggplot2) # ggplot2 provides the ggplot() function used for graphing;
# it is loaded automatically with tidyverse

expectations <- 
ggplot( lockdown_exp, 
        aes( x = "", y = exp_timespent, fill = exp_activities)) + 
        geom_bar( stat = "identity", # Makes a stacked bar graph
                  color = "white", size = 4) +
        coord_polar( "y", start = 0 ) + # But then puts it on a circle
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical",
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
        scale_fill_brewer(palette="RdPu", direction = -1 ) +
        ggtitle("Expectations")
    
expectations
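
scale_fill_brewer() takes its colors from the ColorBrewer palettes, which the RColorBrewer package also lets you look at directly. As a sketch, assuming RColorBrewer is installed, you can print the six "RdPu" colors and their reversed order, which is roughly what direction = -1 asks for. The names rdpu and rdpu_reversed are made up for this example.

```r
library(RColorBrewer)

# Six colors from the "RdPu" (red-purple) palette, as hex codes
rdpu <- brewer.pal(6, "RdPu")
rdpu

# The same colors in reversed order
rdpu_reversed <- rev(rdpu)
rdpu_reversed

# Uncomment to preview all available palettes in the Plots pane
# display.brewer.all()
```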


Step 8

Now, we create the “reality” plot.

Look at the code…anything familiar? What changes from Step 7? What are some ‘intuitive’ steps?

# Create a pie chart graph representing the activities done in reality.
reality <- 
  ggplot(lockdown_reality,   
         # Call the dataset that I want to use "lockdown_reality"
         aes(x = "", y = reality_timespent,     # y represents amount for each category
             fill = reality_activities)) +   # fill represents the names of the categories
  geom_bar( stat = "identity", color = "white", size = 2) + # Creates stacked bar chart
  coord_polar( "y", start = 0 ) + # puts bar chart on a circle axis (becomes pie chart)
  theme_void() +               # gets rid of extra grid stuff
  theme(legend.position = "bottom",          # Create a legend and put it at the bottom
        legend.title = element_blank(),       # Eliminates legend title
        legend.direction = "vertical",       # legend is vertical list
        plot.title = element_text(hjust = 0.5, size=22, face="bold")) +
  scale_fill_brewer(palette="BuPu", direction = -1) + # flips order of Color theme
  ggtitle("Reality")
reality # view graph

Your code could also be spaced like the chunk below, with blank lines between the steps:

reality <- 
  
  ggplot( lockdown_reality, 
          
          aes( x = "", y = reality_timespent, fill = reality_activities)) + 
  
  geom_bar( stat = "identity", color = "white", size = 4) +
  
  coord_polar( "y", start = 0 ) +
  
  theme_void() +
  
  theme( legend.position = "bottom", 
         legend.title = element_blank(), 
         legend.direction = "vertical", 
         plot.title = element_text(hjust = 0.5, size=15, face="bold")) +
  
  scale_fill_brewer(palette="BuPu", direction = -1) +
  
  ggtitle("Reality")


# View graph
reality
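
To see what coord_polar() contributes, it can help to build the plot in two stages: first the plain stacked bar, then the same bar wrapped around a circle. This is a stripped-down sketch without the theme and color settings; stacked_bar and pie are names made up for this example.

```r
library(ggplot2)

lockdown_reality <- data.frame(
  reality_activities = c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                         "Stare at the Wall", "Unintentional Naps"),
  reality_timespent = c(10, 15, 35, 5, 35)
)

# Stage 1: one stacked bar (x = "" puts every activity in the same column)
stacked_bar <-
  ggplot(lockdown_reality,
         aes(x = "", y = reality_timespent, fill = reality_activities)) +
  geom_bar(stat = "identity", color = "white")

stacked_bar

# Stage 2: wrap the y axis around a circle, which turns the bar into a pie
pie <- stacked_bar + coord_polar("y", start = 0)

pie
```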

Step 9

We can conclude by putting the two plots together.

The plot_grid() function comes from the cowplot package.

full_plot <- plot_grid(expectations, reality) # combine two graphs into one image, side by side

full_plot # View plot
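
plot_grid() also accepts arguments that control the layout. The sketch below uses two small placeholder plots (p1 and p2, built from made-up data) instead of the real ones, and shows the ncol and labels arguments.

```r
library(ggplot2)
library(cowplot)

# Made-up data and two placeholder plots standing in for the real ones
toy <- data.frame(activity = c("A", "B"), time = c(60, 40))

p1 <- ggplot(toy, aes(x = "", y = time, fill = activity)) +
  geom_bar(stat = "identity") +
  coord_polar("y", start = 0) +
  ggtitle("Plot 1")

p2 <- p1 + ggtitle("Plot 2")

# One column: the plots are placed on top of each other
stacked <- plot_grid(p1, p2, ncol = 1)
stacked

# Side by side (the default for two plots), with a label on each panel
labeled <- plot_grid(p1, p2, labels = c("A", "B"))
labeled
```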


Tidyverse and Pipes

The Tidyverse includes core functions for modifying data:

  • select() allows us to select particular variables. Returns a subset of COLUMNS.
  • filter() allows us to select particular observations, much like Excel’s filter tool. Returns a subset of ROWS.
  • arrange() allows us to sort observations, much like Excel’s sort tool.
  • mutate() allows us to add or change variables.
  • group_by() allows us to group by a category within a variable.
  • summarize() aggregates measures; it works with group_by().

These functions follow a common syntax that is designed to work with a convenient Tidyverse tool called the “pipe” operator.
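
As a rough sketch of what each function does, here they are applied one at a time to the lockdown_reality dataset from earlier. In every call the dataset is the first argument. The work_related flag is a hypothetical grouping invented for this example, and the code assumes the dplyr package (part of the Tidyverse) is installed.

```r
library(dplyr)

lockdown_reality <- data.frame(
  reality_activities = c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                         "Stare at the Wall", "Unintentional Naps"),
  reality_timespent = c(10, 15, 35, 5, 35)
)

# select(): keep only the time column
times_only <- select(lockdown_reality, reality_timespent)

# filter(): keep only the rows with at least 15 percent of the time
big_chunks <- filter(lockdown_reality, reality_timespent >= 15)

# arrange(): sort from most to least time
sorted <- arrange(lockdown_reality, desc(reality_timespent))

# mutate(): add a new variable (hypothetical flag for work-related activities)
flagged <- mutate(lockdown_reality,
                  work_related = reality_activities %in%
                    c("Book Reviews", "Lesson Plan"))

# group_by() + summarize(): total time within each group
totals <- summarize(group_by(flagged, work_related),
                    total_time = sum(reality_timespent))
totals
```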

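The pipe operator is part of the Tidyverse and is written %>%. Recall that an operator is just a symbol like + or * that performs some function on whatever comes before it and whatever comes after it. The pipe takes the result of whatever comes before it and hands it to the function that comes after it, as that function’s first argument.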
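
A small sketch of the idea, assuming dplyr is installed: the same two steps written first as nested function calls and then with the pipe. The names nested and piped are made up for this example.

```r
library(dplyr)

lockdown_reality <- data.frame(
  reality_activities = c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                         "Stare at the Wall", "Unintentional Naps"),
  reality_timespent = c(10, 15, 35, 5, 35)
)

# Without the pipe: read from the inside out
nested <- arrange(filter(lockdown_reality, reality_timespent >= 15),
                  desc(reality_timespent))

# With the pipe: read from top to bottom, one step per line
piped <- lockdown_reality %>%
  filter(reality_timespent >= 15) %>%
  arrange(desc(reality_timespent))

# Both versions should give the same result
all.equal(nested, piped)
```

The piped version reads in the order the steps happen: take the dataset, then filter it, then sort it.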

Now open the Krauth example!