For this example, we’re going to build on the same example we used in the slides for Lab #9 (Spring 22)!
“Specialized High Schools = Well Rounded Education?”
An education researcher is interested in the role of specialized high schools in education. The researcher hypothesizes that specialized high schools prioritize only the subjects they specialize in, to the detriment of other subjects. She also wants to know whether funding might affect these patterns of results. She compares two types of specialized high schools, Arts High Schools and Science High Schools, in two different cities, Boulder and Rifle, CO. For 8 schools of each type in each city, she measures students’ average Science GPA and average Humanities GPA.
Creating the hs Dataset
library(ggplot2)
library(psych)
library(tidyverse)
set.seed(123) # don't worry about this, it's just for creating the data below
hs <- data.frame(
  School = 1:32, # one unique ID per row (8 schools x 2 types x 2 cities); 1:16 would be recycled and reuse IDs across cities
  city = rep(c("Boulder", "Rifle"), each = 16),
  Sch.Type = rep(rep(c("Science","Arts"), each = 8), times = 2),
  humGPA = round(c(rnorm(n = 8, mean = 3.6, sd = .2), rnorm(n = 8, mean = 3.8, sd = .1), rnorm(n = 8, mean = 3.2, sd = .3), rnorm(n = 8, mean = 3.5, sd = .2)), 2), # creating random data for the GPAs
  sciGPA = round(c(rnorm(n = 8, mean = 3.8, sd = .05), rnorm(n = 8, mean = 3.6, sd = .3), rnorm(n = 8, mean = 3.5, sd = .2), rnorm(n = 8, mean = 3.4, sd = .4)), 2) # and again here
)
head(hs) # double check that it looks right
##   School    city Sch.Type humGPA sciGPA
## 1      1 Boulder  Science   3.49   3.84
## 2      2 Boulder  Science   3.55   3.84
## 3      3 Boulder  Science   3.91   3.84
## 4      4 Boulder  Science   3.61   3.83
## 5      5 Boulder  Science   3.63   3.83
## 6      6 Boulder  Science   3.94   3.80
Looks good! Each school has one city (Boulder or Rifle), one school type (Arts or Science), and two measures of GPA (humanities, or humGPA, and science, or sciGPA). We also have one column that describes the School’s ID, or an assigned number. That won’t be needed in the graphing process, but we will need it to conduct linear mixed effect models going forward. So just keep that under your hat.
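If you want to double check the design itself, here is a minimal sketch. It assumes a stand-in data frame called design (a hypothetical name, not part of the lab code) that copies only the city and Sch.Type layout of hs, and it counts the rows in each city-by-type cell.

```r
# Hypothetical stand-in with the same city / school type layout as hs
design <- data.frame(
  city     = rep(c("Boulder", "Rifle"), each = 16),
  Sch.Type = rep(rep(c("Science", "Arts"), each = 8), times = 2)
)

# Cross-tabulate city by school type; every cell should hold 8 schools
cell.counts <- table(design$city, design$Sch.Type)
print(cell.counts)
```

With the real data, table(hs$city, hs$Sch.Type) is the same check.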
Also, notice from the head(hs) output that this dataset is currently in wide format. To properly graph your data using ggplot, we need to change it to long format. I’ll do this using the pivot_longer() command we used in class, but there are plenty of ways to do this.
From wide => long format
hs.long <- hs %>%
  pivot_longer(cols = c(humGPA, sciGPA), names_to = "GPA.Type", values_to = "GPA")
head(hs.long) # double checking it worked
## # A tibble: 6 × 5
##   School city    Sch.Type GPA.Type   GPA
##    <int> <chr>   <chr>    <chr>    <dbl>
## 1      1 Boulder Science  humGPA    3.49
## 2      1 Boulder Science  sciGPA    3.84
## 3      2 Boulder Science  humGPA    3.55
## 4      2 Boulder Science  sciGPA    3.84
## 5      3 Boulder Science  humGPA    3.91
## 6      3 Boulder Science  sciGPA    3.84
Looks good again! Each school is now represented twice in the dataset, once with its science GPA and once with its humanities GPA. That’s what we need for graphing!
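Beyond eyeballing head(), a couple of quick checks can confirm the reshape did what we wanted. The sketch below uses a tiny made-up data frame (wide, with three hypothetical schools and invented GPA values) rather than hs, just to keep it self-contained.

```r
library(tidyr)

# Hypothetical three-school stand-in for hs (GPA values are made up)
wide <- data.frame(
  School = 1:3,
  humGPA = c(3.5, 3.6, 3.9),
  sciGPA = c(3.8, 3.7, 3.6)
)

long <- pivot_longer(wide, cols = c(humGPA, sciGPA),
                     names_to = "GPA.Type", values_to = "GPA")

# Long format should have one row per school per GPA type
stopifnot(nrow(long) == 2 * nrow(wide))

# Pivoting back to wide should recover the original GPA columns
back <- pivot_wider(long, names_from = GPA.Type, values_from = GPA)
stopifnot(identical(back$humGPA, wide$humGPA),
          identical(back$sciGPA, wide$sciGPA))
```

The same two checks should work on hs and hs.long if you swap in those names.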
Creating the Graph
People create their graphs in a lot of different ways, but I’ll show you how I do it.
First, I create an object, which I usually call “descriptive name.plot”. In this object, I include which dataset to use (that is, your long-form dataset), and the basic “aesthetics” (that is, what the x and y axis variables should be, as well as the factor I want to “fill” in the graph with).
Step One: Creating graph object, defining the aesthetics of the graph
gpa.plot <- ggplot(hs.long, # dataset name goes first
                   aes(x = city, # x axis variable here
                       y = GPA, # y axis variable here
                       fill = Sch.Type)) # "fill" variable here
Next, I will tell ggplot what type of graph I want to make. I usually use barplots for graphing multiple categorical variables, so that’s what I’ll do. ALSO, these steps will build on each other!
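To see what "building on each other" means in practice, here is a small sketch with made-up data (toy, base.plot, and bar.plot are hypothetical names). It assumes a recent ggplot2 where the layers of a plot object can be inspected with $layers.

```r
library(ggplot2)

# Hypothetical toy data, just to have something to plot
toy <- data.frame(group = c("a", "a", "b", "b"),
                  score = c(1, 3, 2, 4))

# Aesthetics only: this object knows the data and axes but has nothing to draw yet
base.plot <- ggplot(toy, aes(x = group, y = score))
length(base.plot$layers)

# Adding a geom with '+' returns a new object with one more layer
bar.plot <- base.plot + geom_bar(stat = "summary", fun = "mean")
length(bar.plot$layers)

# The original object is left as it was
length(base.plot$layers)
```

That is why each step can either retype the full call or simply add onto a saved object.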
Step Two: Defining the graph type
gpa.plot <- ggplot(hs.long,
                   aes(x = city,
                       y = GPA,
                       fill = Sch.Type)) + # add a '+' to the above code
  geom_bar(stat = 'summary', # geom_bar specifies this will be a bar graph
           fun = 'mean', # stat = 'summary' and fun = 'mean' tell R to draw each bar at the group mean
           position = position_dodge(.9)) # position_dodge tells R to put the bars side-by-side; I always use .9 in there but you can change that if you'd like!
Let’s look at the graph so far!
gpa.plot
So far so good. Now let’s work on the graph appearance, like colors and the name of the legend.
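As a side check on Step Two, the bar heights drawn by stat = 'summary' with fun = 'mean' should just be the group means. The sketch below computes them by hand with dplyr on a made-up stand-in for hs.long (toy.long and its GPA values are hypothetical) and pulls the heights ggplot computed with layer_data().

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for hs.long with made-up GPA values
toy.long <- data.frame(
  city     = rep(c("Boulder", "Rifle"), each = 4),
  Sch.Type = rep(rep(c("Science", "Arts"), each = 2), times = 2),
  GPA      = c(3.6, 3.8, 3.4, 3.6, 3.2, 3.4, 3.0, 3.4)
)

# Group means by hand: one row per city x school type
bar.heights <- toy.long %>%
  group_by(city, Sch.Type) %>%
  summarise(mean.GPA = mean(GPA), .groups = "drop")
bar.heights

# The same plot recipe as above, applied to the toy data
toy.plot <- ggplot(toy.long, aes(x = city, y = GPA, fill = Sch.Type)) +
  geom_bar(stat = "summary", fun = "mean", position = position_dodge(.9))

# The y values ggplot computed for the bars
plotted <- layer_data(toy.plot)$y
all.equal(sort(plotted), sort(bar.heights$mean.GPA))
```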
Step Three: Graph appearance
Since we have the basic graph defined above in the object “gpa.plot”, we can just add the appearance-related arguments and functions to that object.
There are a ton of options here, so I’m going to do my generic approach, and you can pick and choose what you want!
gpa.plot +
  theme_classic() + # theme_classic() gets rid of the grid in the background
  scale_fill_manual("School Type", # scale_fill_manual allows you to specify which colors you want your graph to be, and
                    # the "" at the beginning allows you to rename your legend
                    values = c('steelblue2','forestgreen')) + # these values are your chosen colors!
  coord_cartesian(ylim = c(0,4)) + # this line lets you extend the axes beyond the data
  xlab("City") # and xlab() and ylab() allow you to rename your axis labels! I think GPA is a good ylab so I'm leaving it alone.
Graph all together in one spot! Plus error bars.
gpa.plot <- ggplot(hs.long,
                   aes(x = city,
                       y = GPA,
                       fill = Sch.Type)) +
  geom_bar(stat = 'summary',
           fun = 'mean',
           position = position_dodge(.9)) +
  stat_summary(fun.data = mean_se, # this argument creates error bars using the standard error of the raw means!
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3,
               aes(x = city, y = GPA, fill = Sch.Type))
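If you are curious what mean_se is doing for each bar, here is a rough sketch on made-up numbers (gpa and by.hand are hypothetical names). It computes the mean plus and minus one standard error by hand and compares that with what ggplot2's mean_se() helper returns.

```r
library(ggplot2)

# Made-up GPA values standing in for the schools behind one bar
gpa <- c(3.6, 3.8, 3.4, 3.6)

m  <- mean(gpa)
se <- sd(gpa) / sqrt(length(gpa)) # standard error of the mean

# Error bar limits by hand: mean minus and plus one standard error
by.hand <- data.frame(y = m, ymin = m - se, ymax = m + se)
by.hand

# ggplot2's helper, which stat_summary calls once per bar
from.ggplot <- mean_se(gpa)
all.equal(unlist(from.ggplot), unlist(by.hand))
```

So each error bar spans one standard error above and below the bar's mean.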
gpa.plot +
  theme_classic() +
  scale_fill_manual("School Type",
                    values = c('steelblue2','forestgreen')) +
  coord_cartesian(ylim = c(0,4)) +
  xlab("City")
More Options??
I highly recommend playing around, especially with the second “appearance” part:
gpa.plot +
  theme_classic() +
  scale_fill_brewer("School Type", # scale_fill_brewer lets you pick from pre-set "palettes"
                    palette = 'Paired') +
  coord_cartesian(ylim = c(0,4)) +
  xlab("City")
gpa.plot +
  theme_dark() +
  scale_fill_manual("School Type",
                    values = c("cornsilk1","deepskyblue1"),
                    labels = c("Art School","Science School")) +
  coord_cartesian(ylim = c(0,4)) +
  ggtitle("Graph with Dark Background") +
  xlab("City")
gpa.plot +
  theme_classic() +
  scale_fill_manual("School Type",
                    values = c('steelblue2','forestgreen')) +
  ggtitle("Graph!") +
  theme(plot.title = element_text(color="red", size=14, face="bold.italic"),
        axis.title.x = element_text(color="blue", size=14, face="bold"),
        axis.title.y = element_text(color="#993333", size=14, face="bold")
  )
But also try messing with the first part too! The following two graphs come only from changing the geom_XXX layers!
gpa.plot <- ggplot(hs.long,
                   aes(x = city,
                       y = GPA,
                       fill = Sch.Type)) +
  geom_violin(position = position_dodge(.9)) +
  geom_point(stat = 'summary', fun = 'mean', position = position_dodge(.9)) +
  stat_summary(fun.data = mean_se,
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3,
               aes(x = city, y = GPA, fill = Sch.Type))
gpa.plot
gpa.plot <- ggplot(hs.long,
                   aes(x = city,
                       y = GPA,
                       fill = Sch.Type)) +
  geom_point(stat = 'summary',
             fun = 'mean',
             position = position_dodge(.9),
             aes(col = Sch.Type)) +
  stat_summary(fun.data = mean_se,
               geom = "errorbar",
               position = position_dodge(.9),
               width=.3,
               aes(x = city, y = GPA, fill = Sch.Type))
gpa.plot
Too many options here, honestly. But this should be enough to get you started!