There are a number of different packages and functions that allow you to make graphs in R. Each has its own pros and cons (balancing flexibility and ease of use).
The key takeaway is that pretty much any graph that you want to make, whether it is simply for exploratory reasons or you want to present a well formatted and nice looking graph in your manuscript, will be possible in R.
Data visualization is unfortunately underused in psychology. It is often easier to demonstrate a pattern with a graph than with statistics or descriptive language. Graphs can also make it easier to spot errors or outliers in your data.
ggplot2 is a package within the tidyverse collection, so if you have already loaded the tidyverse, you don’t need to load ggplot2 separately. If not, let’s do this now:
library(tidyverse)
# Let's also load the psych package, we have just a couple of psych functions
library(psych)
ggplot2 is also a system for creating graphics, based on The Grammar of Graphics (gg). You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use (geoms), and it takes care of the details.
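ggplot2 relies on a layer system.
Basic layers: the data, the aesthetics (which variables map to which visual properties), and the geoms (the shapes used to draw the data).
And, we can add more, such as facets, coordinate systems, and themes.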
ggplot is an incredibly flexible function that allows you to make pretty much any graph you can imagine.
What sets ggplot apart from other plotting functions is that the syntax used to generate the plots follows a slightly different logic.
The “gg” before plot stands for Grammar of Graphics so it uses a system where you can add layers to your graph by adding them to a general type of graph.
ggplot2 is also part of the tidyverse set of packages so it works very well with the other data manipulation functions in a pipeline.
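As a minimal sketch of what "works well in a pipeline" means, the example below filters the built-in mtcars data with dplyr and hands the result directly to ggplot(). The object name four_cyl_plot is just a placeholder chosen for this illustration.

```r
library(dplyr)
library(ggplot2)

# Keep only the four-cylinder cars, then pass the result straight to ggplot().
# Note the switch from %>% (between data steps) to + (between plot layers).
four_cyl_plot <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point()

four_cyl_plot
```

Because the data arrive through the pipe, the data = argument is left out of ggplot().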
Here is how you can think about “layering” a plot with ggplot:
Provide the name of the dataset and make it known that you want to create a plot with ggplot
All plots (regardless of what they are) start with:
ggplot(data = name_of_dataset)
For example:
ggplot(data = mtcars) # this will not give you much, because we are missing the aesthetics and geoms! We should have at least the 3 basic layers!
Tell the function which variables will be involved from your dataset.
ggplot(data = name_of_data, aes(x = X_var, y = Y_var))
For example:
ggplot(data = mtcars, aes(x = wt, y = mpg))
Look above: the mapping is there, but we’re missing the geoms! That’s why we can’t see any actual data, because R doesn’t know how we want to visualize it (e.g., lines, dots, triangles, etc.)
The x variable will be presented on the x axis of the graph (horizontal axis) and the y variable will be presented on the y axis (vertical axis)
Tell the function about the characteristics for the plot you want to make
Here’s what the code would look like to make a scatterplot (i.e., using points/circles to represent the data):
ggplot(data = name_of_data, aes(x = X_var, y = Y_var)) + geom_point()
For example:
ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()
Add additional features to change the appearance of your graph
You can add in features to change colours, point size or shape, change the scale of axes, add titles, etc.
E.g.,
# template only: fill in values for size and shape before running
ggplot(data = name_of_data, aes(x = X_var, y = Y_var)) + geom_point(size = , shape = ) +
  ggtitle("My graph") + theme_bw()
For example:
ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point(size = 5, shape = 1, colour = "light blue") +
  ggtitle("Relationship between Miles per Gallon and Car Weight") + theme_dark()
# Try to play around with the shape, size, and colours!
This code adds a title to the top of the graph and switches to a dark-background theme (you can also change the colour of points or lines if you want). There are many different ways that you can customize your graph and change its appearance.
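The other kinds of changes mentioned above (axis labels and axis scales) can be sketched in the same way. This is one possible combination, not the only way to do it; the specific colour, shape, limits, and the object name custom_plot are arbitrary choices made for illustration.

```r
library(ggplot2)

custom_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, shape = 17, colour = "darkorange") +  # shape 17 = filled triangle
  labs(
    title = "Fuel Efficiency by Car Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  ) +
  scale_y_continuous(limits = c(0, 40)) +  # fix the range of the y axis
  theme_minimal()

custom_plot
```

Here labs() sets the title and both axis labels in one call, which can be tidier than stacking ggtitle() with separate label functions.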
If you’re interested in learning more about plotting in R, I would recommend checking out the free online textbook R Graphics Cookbook at https://r-graphics.org/index.html. There are chapters that show how to make common graphs and a chapter devoted to customizing the appearance of ggplot graphs (Chapter 9).
Chapter 3 in the R for Data Science book (https://r4ds.had.co.nz/data-visualisation.html) is also a good resource for getting started with graphics using the ggplot2 package.
To demonstrate the different graphs let’s use a fun dataset that is available in R: msleep
This dataset provides information about the sleep habits of various mammals!
# msleep
msleep$name
Histograms are useful for examining univariate distributions (i.e., the distribution of a single variable). These can be used descriptively or for diagnostic purposes to examine extreme univariate scores.
Here’s the most basic histogram of the distribution of total sleep time using a few different functions:
# base R - no package loading required
hist(msleep$sleep_total)
# psych package
multi.hist(msleep$sleep_total)
# ggplot2 (loaded as part of the tidyverse)
ggplot(data = msleep, aes(x = sleep_total)) + geom_histogram(bins = 10, fill = "purple",
  colour = "black") + theme_bw()
You may have noticed that these plots all looked slightly different despite the same data being plotted. One thing that you may want to change is the number of bars that are shown in a histogram. These are called breaks or bins. Fewer breaks make chunkier bars and more breaks make narrower bars.
You can also add titles, change the colour of the bars, add names to the x and y axes, etc. You can always examine help files to find out how much more can be customized or check out the resources described above.
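As a small sketch of two related details: in ggplot2 you can set the width of each bar with binwidth instead of the number of bars with bins, and in base R a single number passed to breaks is treated only as a suggestion. The binwidth of 2 hours and the names hist_plot and h are arbitrary choices for illustration.

```r
library(ggplot2)

# binwidth is in the units of the x variable (hours of sleep here)
hist_plot <- ggplot(data = msleep, aes(x = sleep_total)) +
  geom_histogram(binwidth = 2, boundary = 0, fill = "purple", colour = "black") +
  theme_bw()
hist_plot

# In base R, a single number for breaks is only a suggestion;
# hist() may choose a different number of bins to get round cut points
h <- hist(msleep$sleep_total, breaks = 4, plot = FALSE)
h$breaks  # the cut points that were actually used
```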
# example with a small number of breaks
hist(msleep$sleep_total, breaks = 4, main = "Histogram with few breaks")
# example with a large number of breaks
hist(msleep$sleep_total, breaks = 40, main = "Histogram with many breaks")
Boxplots are another graphical technique to examine a univariate distribution. If there are no outliers, the thick line in the middle is the median, the lower and upper edges of the box are the 25th and 75th percentiles, and the “whiskers” extend to the minimum and maximum. With outliers, the box is interpreted the same way, but each whisker stops at the most extreme value within 1.5 * the interquartile range of the box edge, and anything beyond that is drawn as a dot or asterisk, representing an outlier.
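The 1.5 * interquartile range rule can be worked through by hand. The sketch below uses quantile() with its default settings, which may differ slightly from the hinges some boxplot functions use; the names lower_fence and upper_fence are labels chosen here, not ggplot2 terms.

```r
library(ggplot2)  # provides the msleep dataset

x <- msleep$sleep_total

q1 <- quantile(x, 0.25)  # lower edge of the box
q3 <- quantile(x, 0.75)  # upper edge of the box
iqr <- q3 - q1           # same value as IQR(x)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values beyond the fences would be drawn as individual points
x[x < lower_fence | x > upper_fence]
```

If the last line returns no values, the whiskers simply run to the minimum and maximum.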
# ggplot
ggplot(data = msleep, aes(y = sleep_total)) + geom_boxplot()
Some customization examples:
ggplot(data = msleep, aes(y = sleep_total)) + geom_boxplot(colour = "midnightblue",
  fill = "light yellow") + labs(y = "Total Daily Hours of Sleep") + theme_light()
You can also examine boxplots by a grouping variable to compare distribution shapes
# notice that ggplot provides the NAs as a category too (which we may or may not want)
ggplot(data=msleep, aes(x=vore, y=sleep_total))+
  geom_boxplot()
# to remove NAs we need to filter first. ggplot2 is part of the tidyverse so it works with dplyr functions. This means we can use the pipeline (notice we have to remove the data= argument in ggplot though)
msleep %>%
  filter(!is.na(vore)) %>% # removing the NA "category" in vore
  ggplot(aes(x=vore, y=sleep_total))+
  geom_boxplot()
# sideways boxplot
ggplot(data=msleep, aes(x=vore, y=sleep_total))+
  geom_boxplot() + coord_flip()
Scatterplots are probably the most useful graph in psychology because we tend to deal with multivariate data or want to examine the pattern of how two variables vary together. Any correlation- or regression-based analysis would benefit from examining scatterplots (for effects of interest and model diagnostics). We will be using these a fair amount in Stat II.
In our dataset we can investigate the relationship between the animal’s total sleep time and its REM sleep.
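Before plotting, it can help to check the two variables numerically. The sketch below counts missing REM values (geom_point() drops rows with missing values and prints a warning) and computes the correlation from the complete cases only; the names n_missing_rem and r_rem_total are placeholders chosen for this example.

```r
library(ggplot2)  # provides the msleep dataset

# How many animals are missing a REM sleep value?
n_missing_rem <- sum(is.na(msleep$sleep_rem))
n_missing_rem

# Correlation using only the animals with both values recorded
r_rem_total <- cor(msleep$sleep_rem, msleep$sleep_total, use = "complete.obs")
r_rem_total
```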
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total)) + geom_point()
# Colours reflect different groups
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, colour = vore)) + geom_point(size = 3) +
  ggtitle("Total Sleep and REM by Vore")
# Shape = different group
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, shape = vore)) + geom_point()
# getting a nonparametric curve for the full dataset
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total)) + geom_point(aes(colour = vore)) +
  geom_smooth(se = FALSE) # se=TRUE gives confidence bands
# if you left colour in the first line of the code it would give smoothed lines
# for each group, e.g.,
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, colour = vore)) + geom_point() +
  geom_smooth(se = FALSE)
# separate mini plots per group
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, colour = vore)) + geom_point() +
  facet_wrap(~vore)
Line graphs are commonly used when you want to see how things change over conditions or longitudinally
For line graphs, we will use the ToothGrowth dataset we were working with.
tg <- ToothGrowth
Here is a case where having the data in long format is helpful because the graphs need a single variable denoting time or condition for use in the plot
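To illustrate what "long format" means, here is a minimal sketch with a small made-up wide data frame (the values and the column names id, time1, time2, time3, and score are hypothetical). tidyr's pivot_longer() stacks the time columns into a single variable that can then be mapped to the x axis.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one row per person, one column per time point
wide <- data.frame(
  id = 1:3,
  time1 = c(10, 12, 9),
  time2 = c(11, 14, 9),
  time3 = c(13, 15, 12)
)

# Long format: one row per person per time point
long <- pivot_longer(wide, cols = time1:time3, names_to = "time", values_to = "score")

long_plot <- ggplot(long, aes(x = time, y = score, group = id)) +
  geom_line() +
  geom_point()
long_plot
```

ToothGrowth is already in long format, so no reshaping is needed for the plots that follow.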
tg$dose <- as.factor(tg$dose)
# Combining dplyr data wrangling with ggplot2
tg %>%
group_by(dose) %>%
summarise(mean_len = mean(len)) %>%
  ggplot(aes(x = dose, y = mean_len, group = 1)) + geom_line() + geom_point()
# with points added and different colour lines by supp
tg %>%
group_by(dose, supp) %>%
summarise(mean_len = mean(len)) %>%
  ggplot(aes(x = dose, y = mean_len, group = supp)) + geom_line(aes(color = supp)) +
  geom_point(aes(color = supp))
# with points added and different colour lines by supp and different plots
tg %>%
group_by(dose, supp) %>%
summarise(mean_len = mean(len)) %>%
  ggplot(aes(x = dose, y = mean_len, group = supp)) + geom_line(aes(color = supp)) +
  geom_point(aes(color = supp)) + facet_wrap(~supp)
plotly
promotations <- read.csv("https://raw.githubusercontent.com/udialter/intro-to-plotly/main/promotation%20data%20clean.csv")
library(plotly)
head(promotations)
  X year promoted     sex
1 1 1998      147 Females
2 2 1999      146 Females
3 3 2000      250 Females
4 4 2001      484 Females
5 5 2002      476 Females
6 6 2003      426 Females
p <- ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, colour = vore)) +
  geom_point(size = 3) + ggtitle("Total Sleep and REM by Vore")
p
ggplotly(p)
# first, creating a ggplot2 object. If issues arise with the Roboto font or an
# error message mentioning polygon (usually on Macs), you can comment out (#) or delete the
# hrbrthemes::theme_ft_rc() + line
p <- ggplot(promotations, aes(x = year, y = promoted, group = sex)) + geom_line(aes(color = sex),
size = 1.5) + geom_point(aes(color = sex), size = 3) + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1))
p <- p + scale_color_brewer(palette = "Set2") + ggtitle("Canadian Armed Forces Officer Promotions by Sex") +
labs(x = "Year", y = "Number of personnel promoted", size = 3)
ggsave("CAF.png", plot = p, dpi = 700)
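When saving for a manuscript, it is often useful to set the image dimensions explicitly rather than relying on the size of the current plotting window. A minimal sketch follows; the 6 x 4 inch size, the 300 dpi resolution, and the names example_plot and out_file are arbitrary choices for illustration.

```r
library(ggplot2)

example_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Write to a temporary file so the sketch does not clutter the working directory
out_file <- file.path(tempdir(), "example_plot.png")

ggsave(out_file, plot = example_plot, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file)
```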
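#' # ggplotly
(intplt <- ggplotly(p, tooltip = c("x", "y", "group")))
#' ## saving as html object
# saveWidget() comes from the htmlwidgets package, so we call it with the htmlwidgets:: prefix
htmlwidgets::saveWidget(intplt, "ggplotlyeg.html", selfcontained = FALSE, libdir = "lib/")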
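Converting a ggplot with ggplotly() is one route to an interactive graph; plotly can also build one directly with plot_ly(). The sketch below uses a small made-up data frame (toy, with hypothetical year, count, and group columns) rather than the promotions data, so it runs without a download.

```r
library(plotly)

# Made-up numbers for illustration only; these are not the promotions data
toy <- data.frame(
  year  = rep(2001:2003, times = 2),
  count = c(5, 8, 12, 7, 9, 10),
  group = rep(c("A", "B"), each = 3)
)

# The ~ tells plotly to look the variable up inside the data frame
fig <- plot_ly(
  data = toy, x = ~year, y = ~count, color = ~group,
  type = "scatter", mode = "lines+markers"
)
fig
```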