General Updates

If you haven’t submitted HW3 already, it needs to be done before Wednesday’s class.
HW 4 will be available tonight or tomorrow.
- It will be similar to the lab. Please try to be here Wednesday for the lab.
HW 4 is due Friday midnight, instead of noon. You will be reusing the data from HW2 and joining the data in R.
- I have my second all-day comprehensive exam that Friday (Nov. 5th), so I won’t start grading until Saturday morning.
Reminder to look for a meme/image/chart/news article that you want to use for the final project.

Quarantine Time

I am not expecting you to understand everything that is going on. We are mostly having a little fun before diving into more code examples.

#install.packages("tidyverse")
#install.packages("cowplot")
#install.packages("RColorBrewer")
#install.packages("readxl")


library("tidyverse")
library("cowplot")
library("RColorBrewer")
library("readxl")

I created an object called exp_activities and stored my 6 different activities in it.

exp_activities <- c("Book Reviews",
                    "Get my life together", 
                    "Research",
                    "Sew or something else crafty",
                    "Something fun",
                    "Workout")
                    
exp_activities # look at object to check work

## [1] "Book Reviews"                 "Get my life together"        
## [3] "Research"                     "Sew or something else crafty"
## [5] "Something fun"                "Workout"

I created another object to store my five activities that actually happened.

reality_activities <- c( "Book Reviews",
                         "Crochet a blanket",
                         "Lesson Plan",
                         "Stare at the Wall",
                         "Unintentional Naps")
                         
reality_activities

## [1] "Book Reviews"       "Crochet a blanket"  "Lesson Plan"       
## [4] "Stare at the Wall"  "Unintentional Naps"

Now, I want to assign a time value to both my expected and my actual activities.

# numbers indicate a percent of my total "free" time                        
exp_timespent <- c(20, 
                   25,
                   35,
                   15,
                   5,
                   5)
class(exp_timespent)

## [1] "numeric"

exp_timespent

## [1] 20 25 35 15  5  5

reality_timespent <- c(10,
                       15,
                       35,
                       5,
                       35)
                        
reality_timespent

## [1] 10 15 35  5 35

lockdown_exp <- data.frame(exp_activities, exp_timespent )

lockdown_exp

##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5

lockdown_reality <- data.frame( reality_activities, reality_timespent)

lockdown_reality

##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35

expectations <- 
ggplot( lockdown_exp, 
        aes( x = "", y = exp_timespent, fill = exp_activities)) + 
        geom_bar( stat = "identity", # Makes a stacked bar graph
                  color = "white", size = 4) +
        coord_polar( "y", start = 0 ) + # But then puts it on a circle
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical",
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
        scale_fill_brewer(palette="RdPu", direction = -1 ) +
        ggtitle("Expectations")
    
expectations

reality <- 
ggplot( lockdown_reality, 
        aes( x = "", y = reality_timespent, fill = reality_activities)) + 
        geom_bar( stat = "identity", color = "white", size = 4) +
        coord_polar( "y", start = 0 ) +
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical", 
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
       scale_fill_brewer(palette="BuPu", direction = -1) +
        ggtitle("Reality")

reality

full_plot <- plot_grid( expectations, reality)

full_plot

Replicating this graph

Step 1

First of all, we need to install a few packages. A package is a bundle of functions (and sometimes data) that adds new capabilities to R.

You can install packages in the following way:

# This is the main package we are using in this class
install.packages("tidyverse")

# This package provides you with color combinations for graphs
install.packages("RColorBrewer")

# This package allows you to organize multiple plots
install.packages("cowplot")

Once the packages are installed, you load them from your library with library() before using them:

library(tidyverse)
library(cowplot)
library(RColorBrewer)

Note

Always start your script by listing the packages that are needed to reproduce it.
You might install more packages as you work on a dataset, depending on your needs, but make sure to add them to the list at the very top.
This ensures that anyone can reproduce your code.
We will discuss packages more as we go.
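
As a minimal sketch of what such a script header could look like, the lines below check which of this lesson's packages are not yet installed. The object names needed_packages, is_installed, and not_installed are my own choices for illustration and are not part of the original script.

```r
# Packages this script depends on, listed at the very top
needed_packages <- c("tidyverse", "cowplot", "RColorBrewer")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(needed_packages, requireNamespace,
                       FUN.VALUE = logical(1), quietly = TRUE)

# Names of the packages that still need install.packages()
not_installed <- needed_packages[!is_installed]
not_installed

# Uncomment to install whatever is missing:
# install.packages(not_installed)
```

If not_installed comes back empty (character(0)), everything is already installed and the library() calls will work.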

Step 2

Now that we have our packages, we start by creating a small set of activities that we expected to do during the lockdown.

exp_activities <- c("Book Reviews",
                    "Get my life together", 
                    "Research",
                    "Sew or something else crafty",
                    "Something fun",
                    "Workout")

To see what we are doing, type:

exp_activities

Step 3

We now create a list of activities that we actually performed during the lockdown.

Note what changes: we change the object name (reality_activities) and the list of activities that will be stored in it.
Note what doesn’t change: this part of the code <- c() stays the same

reality_activities <- c( "Book Reviews",
                         "Crochet a blanket",
                         "Lesson Plan",
                         "Stare at the Wall",
                         "Unintentional Naps")
                         
reality_activities

## [1] "Book Reviews"       "Crochet a blanket"  "Lesson Plan"       
## [4] "Stare at the Wall"  "Unintentional Naps"
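
Once both objects exist, a few base R functions help confirm that they hold what we think they hold. This is only a sketch of some checks you could run; none of them are required for the graph.

```r
exp_activities <- c("Book Reviews", "Get my life together", "Research",
                    "Sew or something else crafty", "Something fun", "Workout")
reality_activities <- c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                        "Stare at the Wall", "Unintentional Naps")

length(reality_activities)   # how many activities are stored: 5
class(reality_activities)    # what kind of values they are: "character"
reality_activities[2]        # pull out the second activity: "Crochet a blanket"

# Which activities appear in both the expected and the actual list?
intersect(exp_activities, reality_activities)   # "Book Reviews"
```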

Step 4

Now that we have the two lists of activities, we need to report how much time we spent on each of them.

Numbers represent a percent of my total “free time” that I spent doing the activity.

exp_timespent <- c(20, 
                   25,
                   35,
                   15,
                   5,
                   5)
                    
exp_timespent

## [1] 20 25 35 15  5  5

Step 5

We do the same for the “reality” activities

reality_timespent <- c(10,
                       15,
                       35,
                       5,
                       35)

Note that you can also write the code on one line. R ignores line breaks inside the parentheses, as long as the values are separated by commas. Start noticing these “spacing rules”!

reality_timespent <- c(10, 15, 35, 5, 35)

Let’s check our output:

reality_timespent
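
Since the numbers are meant to be percentages, sum() is a quick way to check whether they add up to 100. The sketch below also shows one possible way to rescale a vector so that it does; the name exp_share is my own and is not used elsewhere in the lesson.

```r
exp_timespent <- c(20, 25, 35, 15, 5, 5)
reality_timespent <- c(10, 15, 35, 5, 35)

sum(exp_timespent)       # 105, slightly more than 100
sum(reality_timespent)   # 100

# Rescale so the expected values add up to exactly 100
exp_share <- exp_timespent / sum(exp_timespent) * 100
round(exp_share, 1)
```

The pie chart still draws correctly with the original numbers, because each slice is shown as its share of the total.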

Step 6

Now that we have all the information, we combine it into two datasets.

One dataset about expectations…

lockdown_exp <- data.frame(exp_activities, exp_timespent )

lockdown_exp

##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5

…and one about reality

lockdown_reality <- data.frame(reality_activities, reality_timespent)

lockdown_reality

##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35

We can see the datasets that we just created

lockdown_exp

##                 exp_activities exp_timespent
## 1                 Book Reviews            20
## 2         Get my life together            25
## 3                     Research            35
## 4 Sew or something else crafty            15
## 5                Something fun             5
## 6                      Workout             5

lockdown_reality

##   reality_activities reality_timespent
## 1       Book Reviews                10
## 2  Crochet a blanket                15
## 3        Lesson Plan                35
## 4  Stare at the Wall                 5
## 5 Unintentional Naps                35
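
Printing the whole dataset works for small tables like these. For larger ones, a few base R functions give a quicker summary. The sketch below uses only functions that come with R; note that data.frame() worked here because both vectors have the same number of elements.

```r
exp_activities <- c("Book Reviews", "Get my life together", "Research",
                    "Sew or something else crafty", "Something fun", "Workout")
exp_timespent <- c(20, 25, 35, 15, 5, 5)

lockdown_exp <- data.frame(exp_activities, exp_timespent)

nrow(lockdown_exp)    # number of rows (activities): 6
ncol(lockdown_exp)    # number of columns (variables): 2
names(lockdown_exp)   # column names, taken from the object names
str(lockdown_exp)     # compact overview of each column and its type

# The $ sign pulls one column back out as a vector
lockdown_exp$exp_timespent
```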

Step 7

We first create the “Expectations” plot.

library(ggplot2) # ggplot2 provides the ggplot() commands for graphing
# it is loaded automatically by library(tidyverse)

expectations <- 
ggplot( lockdown_exp, 
        aes( x = "", y = exp_timespent, fill = exp_activities)) + 
        geom_bar( stat = "identity", # Makes a stacked bar graph
                  color = "white", size = 4) +
        coord_polar( "y", start = 0 ) + # But then puts it on a circle
        theme_void() +
        theme( legend.position = "bottom", legend.title = element_blank(), legend.direction = "vertical",
        plot.title = element_text(hjust = 0.5, size=15, face="bold"))+
        scale_fill_brewer(palette="RdPu", direction = -1 ) +
        ggtitle("Expectations")
    
expectations
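
To see why the comments say "stacked bar graph" and "puts it on a circle", it can help to build the plot in two stages. The sketch below is a stripped-down version of the plot above (no theme, colors, or title); the names stacked_bar and pie are my own.

```r
library(ggplot2)

lockdown_exp <- data.frame(
  exp_activities = c("Book Reviews", "Get my life together", "Research",
                     "Sew or something else crafty", "Something fun", "Workout"),
  exp_timespent = c(20, 25, 35, 15, 5, 5)
)

# Stage 1: a single bar, with one colored segment per activity
stacked_bar <- ggplot(lockdown_exp,
                      aes(x = "", y = exp_timespent, fill = exp_activities)) +
  geom_bar(stat = "identity", color = "white")
stacked_bar

# Stage 2: the same bar wrapped around a circle, which turns it into a pie
pie <- stacked_bar + coord_polar("y", start = 0)
pie
```

Everything else in the full code (theme_void(), theme(), scale_fill_brewer(), ggtitle()) only changes how the pie looks.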

Step 8

Now, we create the “reality” plot.

Look at the code…anything familiar? What changes from Step 7? What are some ‘intuitive’ steps?

# Create a pie chart graph representing the activities done in reality.
reality <- 
  ggplot(lockdown_reality,   
         # Call the dataset that I want to use "lockdown_reality"
         aes(x = "", y = reality_timespent,     # y represents amount for each category
             fill = reality_activities)) +   # fill represents the names of the categories
  geom_bar( stat = "identity", color = "white", size = 2) + # Creates stacked bar chart
  coord_polar( "y", start = 0 ) + # puts bar chart on a circle axis (becomes pie chart)
  theme_void() +               # gets rid of extra grid stuff
  theme(legend.position = "bottom",          # Create a legend and put it at the bottom
        legend.title = element_blank(),       # Eliminates legend title
        legend.direction = "vertical",       # legend is vertical list
        plot.title = element_text(hjust = 0.5, size=22, face="bold")) +
  scale_fill_brewer(palette="BuPu", direction = -1) + # flips order of Color theme
  ggtitle("Reality")
reality # view graph

Your code could also be spaced like the chunk below, with blank lines between the layers (comments could go on those lines):

reality <- 
  
  ggplot( lockdown_reality, 
          
          aes( x = "", y = reality_timespent, fill = reality_activities)) + 
  
  geom_bar( stat = "identity", color = "white", size = 4) +
  
  coord_polar( "y", start = 0 ) +
  
  theme_void() +
  
  theme( legend.position = "bottom", 
         legend.title = element_blank(), 
         legend.direction = "vertical", 
         plot.title = element_text(hjust = 0.5, size=15, face="bold")) +
  
  scale_fill_brewer(palette="BuPu", direction = -1) +
  
  ggtitle("Reality")


# View graph
reality
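
Comparing Step 7 and Step 8, only four things change: the dataset, the column names, the color palette, and the title. One possible way to make that explicit is to wrap the shared code in a function. The helper make_pie() and its argument names below are hypothetical and not part of the lesson; this is a sketch of the idea, and the border size is left at its default.

```r
library(ggplot2)

# Hypothetical helper: everything that stays the same lives inside the function,
# everything that changes is passed in as an argument
make_pie <- function(data, value_col, label_col, palette, title) {
  ggplot(data, aes(x = "", y = .data[[value_col]], fill = .data[[label_col]])) +
    geom_bar(stat = "identity", color = "white") +
    coord_polar("y", start = 0) +
    theme_void() +
    theme(legend.position = "bottom",
          legend.title = element_blank(),
          legend.direction = "vertical",
          plot.title = element_text(hjust = 0.5, size = 15, face = "bold")) +
    scale_fill_brewer(palette = palette, direction = -1) +
    ggtitle(title)
}

lockdown_reality <- data.frame(
  reality_activities = c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                         "Stare at the Wall", "Unintentional Naps"),
  reality_timespent = c(10, 15, 35, 5, 35)
)

reality_pie <- make_pie(lockdown_reality, "reality_timespent",
                        "reality_activities", "BuPu", "Reality")
reality_pie
```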

Step 9

We can conclude by putting the two plots together.

The plot_grid() function comes from the cowplot package.

full_plot <- plot_grid(expectations, reality) # combine two graphs into one image, side by side

full_plot # View plot
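
plot_grid() also accepts arguments that control the layout. The sketch below uses two small stand-in plots built from made-up data (the data frame toy and its values are placeholders, not the lockdown data) to show the ncol and labels arguments.

```r
library(ggplot2)
library(cowplot)

# Made-up stand-in data, only used to get two plots to arrange
toy <- data.frame(activity = c("A", "B"), share = c(60, 40))

p1 <- ggplot(toy, aes(x = "", y = share, fill = activity)) +
  geom_bar(stat = "identity") +
  coord_polar("y", start = 0) +
  ggtitle("Expectations")

p2 <- p1 + ggtitle("Reality")

# Side by side (two columns)
side_by_side <- plot_grid(p1, p2, ncol = 2)
side_by_side

# Stacked on top of each other (one column), with panel labels
stacked <- plot_grid(p1, p2, ncol = 1, labels = c("A", "B"))
stacked
```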

Tidyverse and Pipes

The Tidyverse includes core functions for modifying data:

select() allows us to select particular variables. Returns a subset of COLUMNS.
filter() allows us to select particular observations, much like Excel’s filter tool. Returns a subset of ROWS.
arrange() allows us to sort observations, much like Excel’s sort tool.
mutate() allows us to add or change variables.
group_by() allows us to group by a category within a variable.
summarize() aggregates measures, works with group_by().

These functions follow a common syntax that is designed to work with a convenient Tidyverse tool called the “pipe” operator.

The pipe operator is part of the Tidyverse and is written %>%. Recall that an operator is just a symbol like + or * that performs some function on whatever comes before it and whatever comes after it.
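
As a small sketch of the pipe in action, the code below applies a few of the functions listed above to the lockdown_reality dataset from earlier. The cutoff of 15 and the names nested_version, piped_version, and total_time are arbitrary choices for illustration. You can read %>% as "and then".

```r
library(dplyr)

lockdown_reality <- data.frame(
  reality_activities = c("Book Reviews", "Crochet a blanket", "Lesson Plan",
                         "Stare at the Wall", "Unintentional Naps"),
  reality_timespent = c(10, 15, 35, 5, 35)
)

# Without the pipe: functions are nested and read from the inside out
nested_version <- arrange(filter(lockdown_reality, reality_timespent >= 15),
                          desc(reality_timespent))

# With the pipe: take the data, and then filter, and then sort
piped_version <- lockdown_reality %>%
  filter(reality_timespent >= 15) %>%    # keep ROWS with at least 15 percent
  arrange(desc(reality_timespent))       # sort from largest to smallest

piped_version

# summarize() collapses the whole dataset into one row
total_time <- lockdown_reality %>%
  summarize(total = sum(reality_timespent))

total_time
```

Both versions give the same three rows; the piped one lists the steps in the order they happen.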

Now open the Krauth example!

Week 10: October 25

Alea Wilbur

10/20/2021