R training: How to make graphs

IMPACT Initiatives - Iraq (Apr 2021)

Preface

In this training we will discuss how to plot data using a variety of packages.

If you need inspiration for different data visualization options, check this link. Most of the examples shown there can be reproduced with an existing R package.

1. Plots in base R

R allows you to quickly plot data using base R code only, which can be quite handy for preliminary analysis or when you are just trying to make sense of your dataset. A cheatsheet with relevant base R functions can be found here.

1.1 plot()

The quickest way to plot data in R is by using the plot() function. Simply pass a column of your dataset to the function:

# slice_head() comes from the dplyr package
library(dplyr)

# to make the graphs less crowded we pick the first 100 entries from the dataset
data2 <- slice_head(data, n = 100)

# now we plot the sliced data
plot(data2$calc_total_income)

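When you create a plot like this, it appears in the “Plots” pane on the bottom right of RStudio. Every circle stands for one observation. Because only one vector is passed, the x-axis shows the position of the observation in the vector and the y-axis shows its value.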

You can also plot two variables against each other to study the relationship between them. The following example shows that income and expenditure are somewhat positively correlated.

plot(data2$calc_total_income, data2$calc_total_expenditure)
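To back up a visual impression like this one, you can add a fitted line and compute the correlation coefficient. The following is a minimal sketch that uses simulated toy vectors (income and expenditure are made-up stand-ins, not the survey columns):

```r
# Toy data standing in for the survey columns (simulated, hypothetical values)
set.seed(42)
income <- runif(100, min = 100, max = 1000)
expenditure <- 0.6 * income + rnorm(100, sd = 80)

# Scatter plot with axis labels and a title
plot(income, expenditure,
     xlab = "Total income", ylab = "Total expenditure",
     main = "Income vs. expenditure")

# Add a fitted regression line to make the trend visible
abline(lm(expenditure ~ income), col = "red")

# Quantify the relationship with the Pearson correlation coefficient
r <- cor(income, expenditure)
r
```

With real survey data that contains missing values, cor() would likely need the argument use = "complete.obs".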

1.2 barplot()

A simple bar plot with frequencies of responses can be made like this:

# first, create an object displaying the frequencies of answer options
freq <- table(data$nationality)

# then plot the data
barplot(freq)
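The bars are easier to read when they are sorted and labelled. A small sketch with a made-up answer vector (the values are hypothetical and only illustrate the idea):

```r
# Hypothetical answers standing in for data$nationality
nationality <- c(rep("syria", 5), rep("iran", 3), rep("turkey", 2))

# Count the answers and sort them from most to least frequent
freq_sorted <- sort(table(nationality), decreasing = TRUE)

# Draw the bars with a title, an axis label, and a fill colour
barplot(freq_sorted,
        main = "Number of surveys per nationality",
        ylab = "Count",
        col = "grey70")
```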

1.3 hist()

A quick histogram helps you understand how a variable is distributed:

hist(data2$calc_total_income)
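The number of bins strongly affects how a histogram looks, so it is worth trying a few values of the breaks argument. A sketch on simulated income values (hypothetical data, not the survey):

```r
# Simulated, right-skewed toy incomes (hypothetical values)
set.seed(1)
income <- rlnorm(200, meanlog = 6, sdlog = 0.5)

# breaks = 20 asks for roughly 20 bins; R treats the number as a suggestion
h <- hist(income,
          breaks = 20,
          main = "Distribution of total income",
          xlab = "Total income",
          col = "grey80")

# hist() also returns the bin counts, which add up to the number of observations
sum(h$counts)
```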

1.4 boxplot()

A quick boxplot can be drawn as simply as this:

boxplot(data2$calc_total_expenditure)
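boxplot() also accepts a formula, which draws one box per group. The sketch below uses a simulated data frame (toy, with made-up values) to compare two groups:

```r
# Simulated toy data: expenditure for two groups (hypothetical values)
set.seed(7)
toy <- data.frame(
  nationality = rep(c("iran", "syria"), each = 50),
  expenditure = c(rnorm(50, mean = 400, sd = 60),
                  rnorm(50, mean = 550, sd = 90))
)

# The formula "value ~ group" draws one box per group
b <- boxplot(expenditure ~ nationality, data = toy,
             xlab = "Nationality", ylab = "Total expenditure")

# The returned list holds the plotted statistics, one column per group:
# lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
```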

2. Plots in ggplot2

The base R plots are easy and quick to make, and are therefore very useful for preliminary analysis. However, if you want to create graphs that you can use for reports or presentations, you are better off relying on one of the many packages that are out there.

The most widely used package for making graphs is ggplot2 from the tidyverse (which was created by the developers of RStudio). It is reliable, well documented, and has a number of packages that build on top of it.

Lots of examples of graphs and how they are coded in ggplot2 (and others) can be found here. More documentation on the package is found here. A ggplot2 cheatsheet can be found here.

2.1 Basic syntax

ggplot2 always follows the same basic syntax. First, you call the ggplot() function, which typically takes the data frame with the values you want to plot and an aes() statement, in which you map variables to the different dimensions of your plot (axes, colour, fill, and so on). Further layers and settings are then added to the ggplot() call as needed using the + operator, which plays a role in ggplot2 similar to that of the pipe operator (%>%) in dplyr.

A basic bar plot displaying the counts per nationality is built by adding geom_bar() to the expression like so:

ggplot(data, aes(x = nationality)) + 
  geom_bar()
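Because + adds components to a plot object rather than passing a value into a function, a plot can be built and stored step by step. A minimal sketch on a made-up data frame (toy and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical toy data: one row per survey
toy <- data.frame(
  nationality = c("syria", "syria", "syria", "iran", "iran", "turkey")
)

# Step 1: data and aesthetic mapping only; on its own this draws empty axes
p <- ggplot(toy, aes(x = nationality))

# Step 2: add a layer that counts the rows per category and draws bars
p_bars <- p + geom_bar()

# Step 3: add labels as a further component
p_final <- p_bars +
  labs(title = "Number of surveys per nationality", x = NULL, y = "Count")

p_final
```

Storing intermediate objects like p makes it easy to reuse the same base plot with different layers.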

You could make this graph look nicer by adding a couple of elements to the expression:

ggplot(data, aes(x = nationality, fill = nationality)) + 
  geom_bar() +
  scale_fill_manual(values = c(rgb(0.93,0.35,0.35), rgb(0.97,0.67,0.67),
                               rgb(0.35,0.35,0.35), rgb(0.67,0.67,0.68))) +
  labs(title = "Number of surveys per nationality", fill = "Nationality") +
  theme(plot.title = element_text(face= "bold", size = 12, hjust = 0.5),
        legend.position = "none",
        panel.background = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.major.y = element_line(colour = "gainsboro"),
        axis.ticks = element_blank()) +
  scale_x_discrete(labels=c("iran" = "Iranian", "palestinian_territories" = "Palestinian",
                              "syria" = "Syrian", "turkey" = "Turkish"))

There are many ways to style your plots. You can use pre-defined themes as well as change individual elements. Whatever it is that you want to modify, Google is your friend.
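As a sketch of combining both approaches, the example below applies the pre-defined theme_minimal() and then adjusts two individual elements. The data frame is made up for illustration:

```r
library(ggplot2)

# Hypothetical toy data: one row per survey
toy <- data.frame(
  nationality = c("syria", "syria", "syria", "iran", "iran", "turkey")
)

p_themed <- ggplot(toy, aes(x = nationality)) +
  geom_bar(fill = "grey40") +
  labs(title = "Number of surveys per nationality", x = NULL, y = NULL) +
  # Apply the complete, pre-defined theme first ...
  theme_minimal() +
  # ... then override individual elements
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        panel.grid.major.x = element_blank())

p_themed
```

The order matters: a complete theme such as theme_minimal() replaces earlier theme() settings, so individual tweaks should come after it.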

2.2 Long format

In some cases, you will need your data to be in long format in order to plot it with ggplot2 (check training “3. Intermediate: Additional R functions” for more details). Let’s say we wanted to create boxplots for income and expenditure (in the same plot). We would transform our data frame like this:

data_long <- gather(data2, variable, value, c(calc_total_income, calc_total_expenditure))

Then we can proceed to plot the data like so:

boxplot <- ggplot(data = data_long, aes(x = variable, y = value)) +
  geom_boxplot()

boxplot
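The reshaping step above uses gather(), which tidyr has superseded with pivot_longer(). The sketch below shows what the equivalent call could look like on a small made-up data frame (toy and its values are hypothetical):

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data in wide format: one column per variable
toy <- data.frame(
  id = 1:4,
  calc_total_income = c(500, 650, 300, 800),
  calc_total_expenditure = c(450, 600, 350, 700)
)

# Stack the two columns: names go to "variable", values go to "value"
toy_long <- pivot_longer(
  toy,
  cols = c(calc_total_income, calc_total_expenditure),
  names_to = "variable",
  values_to = "value"
)

# One box per original column
ggplot(toy_long, aes(x = variable, y = value)) +
  geom_boxplot()
```

The rows come out in a different order than with gather(), which does not affect the plot.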

The same logic with long data applies to line graphs with multiple groups. Let's say we wanted to plot the number of daily interviews per nationality:

# First, we make sure the today column has date properties.
data$today <- as.Date(data$today, format = "%d/%m/%Y")

# Then, we summarize the data by counting the number of observations per nationality and day.
data_day <- data %>%
  group_by(nationality, today) %>%
  summarize(obs = n())

# Let's have a look at the data frame with the summarized data.
head(data_day)
## # A tibble: 6 x 3
## # Groups:   nationality [1]
##   nationality today        obs
##   <chr>       <date>     <int>
## 1 iran        2020-09-01     6
## 2 iran        2020-09-02     5
## 3 iran        2020-09-03     5
## 4 iran        2020-09-04    12
## 5 iran        2020-09-05     6
## 6 iran        2020-09-07    24

# The summarize function already returns output in long format, so we can start plotting directly.
# Note that the aes() expression now has more arguments (i.e. "group") than in the basic example above,
# which is only possible with data in long format.
line <- data_day %>%
  ggplot(aes(x = today, y = obs, group = nationality, color = nationality)) +
  geom_line() +
  theme(legend.position = "bottom")

line
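The whole workflow (date conversion, counting, plotting) can be tried on a few made-up rows. This sketch uses dplyr's count() as a shorthand for group_by() plus summarize(n()); the interviews data frame is hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per interview, dates stored as text
interviews <- data.frame(
  nationality = c("iran", "iran", "iran", "syria", "syria", "syria"),
  today = c("01/09/2020", "01/09/2020", "02/09/2020",
            "01/09/2020", "02/09/2020", "02/09/2020")
)

# Convert the text column to Date so the x-axis is treated as time
interviews$today <- as.Date(interviews$today, format = "%d/%m/%Y")

# Count the interviews per nationality and day; the result is ungrouped
daily <- count(interviews, nationality, today, name = "obs")

# One line per nationality, with points marking the individual days
p_line <- ggplot(daily, aes(x = today, y = obs, color = nationality)) +
  geom_line() +
  geom_point() +
  theme(legend.position = "bottom")

p_line
```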

3. Plots in plotly

plotly is another package for producing nice-looking graphs. Compared with ggplot2, plotly adds some fancy-looking style options as well as interactive elements such as tooltips and zooming.

There are two ways to produce a plotly chart. Either you use the plotly syntax to build your plot (check out the library’s documentation here to find out how to do that), or you simply wrap ggplot2 objects with plotly’s ggplotly() function:

library(plotly)

# Turn the ggplot2 line chart from above into an interactive plotly graph
ggplotly(line)

# Let's add a theme to the boxplot and then turn it into an interactive plotly graph
library(ggthemes)
boxplot2 <- boxplot + theme_economist() + labs(x = "", y = "")
ggplotly(boxplot2)
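For comparison, here is a minimal sketch of the first route, building a chart directly with plotly's own plot_ly() syntax. The daily data frame and its values are made up for illustration:

```r
library(plotly)

# Hypothetical toy data: daily interview counts per nationality
daily <- data.frame(
  today = as.Date(rep(c("2020-09-01", "2020-09-02"), times = 3)),
  obs = c(6, 5, 3, 8, 4, 7),
  nationality = rep(c("iran", "syria", "turkey"), each = 2)
)

# In plot_ly(), columns are referenced with a leading ~
fig <- plot_ly(daily,
               x = ~today, y = ~obs, color = ~nationality,
               type = "scatter", mode = "lines+markers")

# Titles and axis labels are set with layout()
fig <- plotly::layout(fig,
                      title = "Daily interviews per nationality",
                      xaxis = list(title = ""),
                      yaxis = list(title = "Interviews"))

fig
```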

4. Plots in highcharter

highcharter is another library that produces fancy-looking, interactive plots. Documentation is found here.

Here are a few quick examples:

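library(highcharter)

# Note: newer highcharter releases flag hcboxplot() as deprecated and point to data_to_boxplot()
hcboxplot(outliers = FALSE, x = data_long$value, var = data_long$variable)

data_day %>%
  hchart('line', hcaes(x = today, y = obs, group = nationality)) %>%
  hc_exporting(enabled = TRUE)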
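The same hchart() pattern works for other chart types. As a rough sketch, a column chart with a title could look like this; the counts data frame and its numbers are invented for illustration:

```r
library(highcharter)

# Hypothetical toy data: number of surveys per nationality
counts <- data.frame(
  nationality = c("Iranian", "Palestinian", "Syrian", "Turkish"),
  n = c(58, 12, 120, 30)
)

# hchart() takes the data, the chart type, and a mapping built with hcaes()
hc <- hchart(counts, "column", hcaes(x = nationality, y = n))

# Further options are added with hc_*() functions
hc <- hc_title(hc, text = "Number of surveys per nationality")
hc <- hc_exporting(hc, enabled = TRUE)

hc
```

hc_exporting(enabled = TRUE) adds a menu to the chart that lets viewers download it as an image.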
