For this walkthrough, we’re going to build on the same example we used in the slides for Lab #9 (Spring 22)!

“Specialized High Schools = Well Rounded Education?”

An education researcher is interested in the role of specialized high schools in education. The researcher hypothesizes that specialized high schools prioritize only the subjects they specialize in, to the detriment of other subjects. She also wants to know whether or not funding might affect these patterns of results, so she compares two types of specialized high schools, Arts High Schools and Science High Schools, in two different cities, Boulder and Rifle, CO. For each school type in each city, she gets students’ average Science GPA and students’ average Humanities GPA for 8 schools.

Creating the hs Dataset

library(ggplot2)
library(psych)
library(tidyverse)

set.seed(123) # don't worry about this, it's just for creating the data below
hs <- data.frame(
  School = 1:32, # one ID per school: 2 cities x 2 school types x 8 schools
  city = rep(c("Boulder", "Rifle"), each = 16),
  Sch.Type = rep(rep(c("Science","Arts"), each = 8), times = 2),
  humGPA = round(c(rnorm(n = 8, mean = 3.6, sd = .2),rnorm(n = 8, mean = 3.8, sd = .1), rnorm(n = 8, mean = 3.2, sd = .3), rnorm(n = 8, mean = 3.5, sd = .2)),2), # creating random data for the GPAs
  sciGPA = round(c(rnorm(n = 8, mean = 3.8, sd = .05),rnorm(n = 8, mean = 3.6, sd = .3), rnorm(n = 8, mean = 3.5, sd = .2), rnorm(n = 8, mean = 3.4, sd = .4)),2) # and again here
)

head(hs) # double check that it looks right
##   School    city Sch.Type humGPA sciGPA
## 1      1 Boulder  Science   3.49   3.84
## 2      2 Boulder  Science   3.55   3.84
## 3      3 Boulder  Science   3.91   3.84
## 4      4 Boulder  Science   3.61   3.83
## 5      5 Boulder  Science   3.63   3.83
## 6      6 Boulder  Science   3.94   3.80

Looks good! Each school has one city (Boulder or Rifle), one school type (Arts or Science), and two measures of GPA (humanities, or humGPA, and science, or sciGPA). We also have one column, School, that holds each school’s ID (just an assigned number). That won’t be needed in the graphing process, but we will need it to run linear mixed-effects models going forward. So just keep that under your hat.

Also, notice from the head(hs) output that this dataset is currently in wide format. To properly graph your data using ggplot, we need to change it to long format. I’ll do this using the pivot_longer() command we used in class, but there are plenty of ways to do this.

From wide => long format

hs.long <- hs %>%
  pivot_longer(cols = c(humGPA, sciGPA), names_to = "GPA.Type", values_to = "GPA")

head(hs.long) # double checking it worked
## # A tibble: 6 × 5
##   School city    Sch.Type GPA.Type   GPA
##    <int> <chr>   <chr>    <chr>    <dbl>
## 1      1 Boulder Science  humGPA    3.49
## 2      1 Boulder Science  sciGPA    3.84
## 3      2 Boulder Science  humGPA    3.55
## 4      2 Boulder Science  sciGPA    3.84
## 5      3 Boulder Science  humGPA    3.91
## 6      3 Boulder Science  sciGPA    3.84

Looks good again! Each school is now represented twice in the dataset, once with its science GPA and once with its humanities GPA. That’s what we need for graphing!
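
If you want more than an eyeball check, here is a minimal sketch of two sanity checks you could run after pivoting. It uses a tiny made-up data frame (toy, with hypothetical values for 3 schools) so it runs on its own; toy, toy.long, and rows.per.school are illustration names, not part of the lab data.

```r
library(tidyr)
library(dplyr)

# Tiny stand-in for a wide dataset (hypothetical values, 3 schools only)
toy <- data.frame(
  School = 1:3,
  humGPA = c(3.5, 3.6, 3.9),
  sciGPA = c(3.8, 3.8, 3.7)
)

toy.long <- toy %>%
  pivot_longer(cols = c(humGPA, sciGPA), names_to = "GPA.Type", values_to = "GPA")

# Check 1: two GPA columns were stacked, so long format has twice as many rows
nrow(toy.long) == 2 * nrow(toy)

# Check 2: every school ID shows up once per GPA type
rows.per.school <- count(toy.long, School)
all(rows.per.school$n == 2)
```

The same two checks can be pointed at hs and hs.long by swapping in those names.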

Creating the Graph

People create their graphs in a lot of different ways, but I’ll show you how I do it.

First, I create an object, which I usually call “descriptive name.plot”. In this object, I include which dataset to use (that is, your long-form dataset), and the basic “aesthetics” (that is, what the x and y axis variables should be, as well as the factor I want to “fill” in the graph with).

Step One: Creating graph object, defining the aesthetics of the graph

gpa.plot <- ggplot(hs.long,                # dataset name goes first
                   aes(x = city,           # x axis variable here
                       y = GPA,            # y axis variable here
                       fill = Sch.Type))   # "fill" variable here

Next, I will tell ggplot what type of graph I want to make. I usually use barplots for graphing multiple categorical variables, so that’s what I’ll do. ALSO, these steps will build on each other!

Step Two: Defining the graph type

gpa.plot <- ggplot(hs.long,                
                   aes(x = city,           
                       y = GPA,            
                       fill = Sch.Type)) +  # add a '+' to the above code
  geom_bar(stat = 'summary',             # geom_bar specifies this will be a bargraph
           fun = 'mean',                 # stat = 'summary' and fun = 'mean' tells R to have the bar graphs be means
           position = position_dodge(.9))  # position_dodge tells R to put the bars side-by-side; I always use .9 in there but you can change that if you'd like!

Let’s look at the graph so far!

gpa.plot
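
If you’d like to see the numbers behind those bars, one option is to compute the means yourself and hand them to geom_col(), which draws bars at exactly the values you give it. This is only a sketch of what stat = 'summary' with fun = 'mean' does for you behind the scenes; the small toy.long data frame below uses made-up values, and cell.means and means.plot are illustration names.

```r
library(dplyr)
library(ggplot2)

# Small stand-in for the long data (hypothetical values)
toy.long <- data.frame(
  city     = rep(c("Boulder", "Rifle"), each = 4),
  Sch.Type = rep(rep(c("Science", "Arts"), each = 2), times = 2),
  GPA      = c(3.6, 3.8, 3.7, 3.9, 3.2, 3.4, 3.5, 3.3)
)

# Step 1: compute one mean per city x school type combination
cell.means <- toy.long %>%
  group_by(city, Sch.Type) %>%
  summarise(mean.GPA = mean(GPA), .groups = "drop")

cell.means

# Step 2: plot the pre-computed means; no summary function needed here
means.plot <- ggplot(cell.means, aes(x = city, y = mean.GPA, fill = Sch.Type)) +
  geom_col(position = position_dodge(.9))

means.plot
```

Letting geom_bar() do the summarizing is less typing, but having the table of means around is handy when you want to report the exact values.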

So far so good. Now let’s work on the graph appearance, like colors and the name of the legend.

Step Three: Graph appearance

Since we have the basic graph defined above in the object “gpa.plot”, we can just add the appearance-related arguments and functions to that object.

There are a ton of options here, so I’m going to do my generic approach, and you can pick and choose what you want!

gpa.plot +
  theme_classic() + # theme_classic() gets rid of the grid in the background
  scale_fill_manual("School Type", # scale_fill_manual lets you choose the fill colors; the first argument ("School Type") renames the legend
                    values = c('steelblue2','forestgreen')) + # these values are your chosen colors!
  coord_cartesian(ylim = c(0,4)) + # this line sets the visible y-axis range (0 to 4) without dropping any data
  xlab("City") # and xlab() and ylab() allow you to rename your axis labels! I think GPA is a good ylab so I'm leaving it alone.

Graph all together in one spot! Plus error bars.

gpa.plot <- ggplot(hs.long,                
                   aes(x = city,           
                       y = GPA,            
                       fill = Sch.Type)) +
  geom_bar(stat = 'summary',
           fun = 'mean',
           position = position_dodge(.9)) + 
  stat_summary(fun.data = mean_se, # mean_se returns the mean plus or minus one standard error, which become the error bars!
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3) # x, y, and fill are inherited from ggplot() above, so no extra aes() is needed

gpa.plot +
  theme_classic() +
  scale_fill_manual("School Type",
                    values = c('steelblue2','forestgreen')) + 
  coord_cartesian(ylim = c(0,4)) +
  xlab("City")
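
To make the error bars less of a black box, here is a small sketch that computes the same three numbers by hand and compares them with what mean_se() returns. The vector gpa just reuses the six humGPA values printed by head(hs) above as an illustration; gpa, m, se, and by.hand are illustration names.

```r
library(ggplot2)

# Six GPA values to summarize (taken from the head(hs) printout)
gpa <- c(3.49, 3.55, 3.91, 3.61, 3.63, 3.94)

m  <- mean(gpa)                    # bar height
se <- sd(gpa) / sqrt(length(gpa))  # standard error of the mean

# Lower and upper ends of the error bar: mean minus / plus one SE
by.hand <- data.frame(y = m, ymin = m - se, ymax = m + se)

by.hand
mean_se(gpa)  # should show the same three numbers
```

If you want wider bars, stat_summary() accepts a fun.args list, so something like fun.args = list(mult = 2) would draw plus or minus two standard errors instead of one.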

More Options??

I highly recommend playing around, especially with the second “appearance” part:

gpa.plot +
  theme_classic() +
  scale_fill_brewer("School Type",        # scale_fill_brewer lets you pick from pre-set "palettes"
                    palette = 'Paired') + 
  coord_cartesian(ylim = c(0,4)) +
  xlab("City")

gpa.plot +
  theme_dark() +
  scale_fill_manual("School Type",
                    values = c("cornsilk1","deepskyblue1"),
                    labels = c("Art School","Science School")) + 
  coord_cartesian(ylim = c(0,4)) +
  ggtitle("Graph with Dark Background") +
  xlab("City")

gpa.plot + 
  theme_classic() +
  scale_fill_manual("School Type",
                    values = c('steelblue2','forestgreen')) +
  ggtitle("Graph!") +
  theme(plot.title = element_text(color="red", size=14, face="bold.italic"),
        axis.title.x = element_text(color="blue", size=14, face="bold"),
        axis.title.y = element_text(color="#993333", size=14, face="bold")
) 

But also try messing with the first part too! The following two graphs come only from swapping out the geom_XXX() layers!

gpa.plot <- ggplot(hs.long,                
       aes(x = city,
           y = GPA,
           fill = Sch.Type)) +
  geom_violin(position = position_dodge(.9)) +
  geom_point(stat = 'summary', fun = 'mean', position = position_dodge(.9)) +
  stat_summary(fun.data = mean_se, 
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3)
gpa.plot

gpa.plot <- ggplot(hs.long,                
       aes(x = city,
           y = GPA,
           fill = Sch.Type)) +
  geom_point(stat = 'summary',
           fun = 'mean',
           position = position_dodge(.9),
           aes(col = Sch.Type)) +
  stat_summary(fun.data = mean_se, 
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3)
gpa.plot
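
One more option worth knowing about: all of the graphs above average over the two GPA types. If you want humanities and science GPA shown separately, one possible approach (not something from the lab slides) is to add facet_wrap() so each GPA type gets its own panel. The sketch below uses a small made-up stand-in for hs.long so it runs by itself; toy.long and facet.plot are illustration names.

```r
library(ggplot2)

# Small stand-in for hs.long (hypothetical values), including the GPA.Type column
toy.long <- data.frame(
  city     = rep(c("Boulder", "Rifle"), each = 8),
  Sch.Type = rep(rep(c("Science", "Arts"), each = 4), times = 2),
  GPA.Type = rep(c("humGPA", "sciGPA"), times = 8),
  GPA      = c(3.5, 3.8, 3.6, 3.9, 3.8, 3.6, 3.9, 3.5,
               3.2, 3.5, 3.3, 3.6, 3.5, 3.4, 3.6, 3.3)
)

facet.plot <- ggplot(toy.long, aes(x = city, y = GPA, fill = Sch.Type)) +
  geom_bar(stat = 'summary', fun = 'mean', position = position_dodge(.9)) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = position_dodge(.9), width = .3) +
  facet_wrap(~ GPA.Type) +   # one panel per GPA type
  theme_classic()

facet.plot
```

With the real data, swapping toy.long for hs.long should give one panel for humGPA and one for sciGPA, each with the same city by school type bars as before.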

Too many options here, honestly. But this should be enough to get you started!