Bar charts with ggplot2

Here we learn how to create two types of bar charts: (1) a bar chart where the bars show the count of a variable (for example: how many respondents come from each country?) and (2) a bar chart where the bars compare the mean values of a metric variable across the groups of a categorical variable.

For the bar chart of type (1), the y-axis shows the number of observations in each group of the variable on the x-axis.

For the bar chart of type (2), the y-axis shows the mean (or sometimes the min, max or median) of a second, metric variable for each group of the categorical variable on the x-axis. The groups are thus compared on their values of that second variable.

We take the European Social Survey data and ask: How many respondents come from each country? Country is a categorical variable (cntry) and goes on the x-axis. The countries are its groups. The y-axis then shows the count of respondents that belong to each country. If we define only one variable in aes() and choose geom_bar(), we automatically get the counts on the y-axis.

With xlab() and ylab(), we label the axes.

# We must first load the tidyverse, read in the data and give it a name ("ess") with the <- operator

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.2

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

ess <- read_csv("C:/Users/petemaur/Teaching/Data/ess_data.csv")

## Rows: 49519 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): cntry
## dbl (11): idno, nwspol, polintr, trstprl, trstep, trstun, vote, gndr, yrbrn,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ggplot(ess, aes(cntry))+
  geom_bar()+
  xlab("Countries in ESS")+
  ylab("Number of respondents")+
  theme_classic()

OK, here is the chart! Note that the y-axis shows the count of respondents in each country: for example, there are about 1,500 respondents in Sweden and more than 2,000 in Italy.
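To see what geom_bar() does behind the scenes, the counts can also be computed by hand. The following is a minimal sketch on a small made-up data frame (the object toy and its country codes are invented for illustration and are not the ESS data): count() from dplyr tabulates the rows per country, and geom_col() then draws the precomputed numbers.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the ESS file (hypothetical values)
toy <- data.frame(cntry = c("SE", "SE", "IT", "IT", "IT", "CH"))

# count() returns one row per country; the number of rows is stored in column n
country_counts <- count(toy, cntry)
country_counts

# geom_col() plots values as they are, so the count must be mapped to y explicitly
p_counts <- ggplot(country_counts, aes(cntry, n)) +
  geom_col() +
  xlab("Countries") +
  ylab("Number of respondents") +
  theme_classic()
p_counts
```

Both routes should give the same picture. geom_bar() is shorter, while the count() table is handy when the exact numbers are needed as well.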

We can add a sub-grouping into men and women for each country. For this, we must map gndr to fill as a second argument in aes().

ess$gndr <- recode_factor(ess$gndr, "1" = "Male", "2" = "Female")

ggplot(ess, aes(cntry, fill = gndr))+
  geom_bar()+
  xlab("Countries in ESS")+
  ylab("Number of respondents")+
  theme_classic()

Here we go. The split by gender is shown as stacked bars. If we want the bars for each group next to each other, we must set the argument position = "dodge" in geom_bar().
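A minimal sketch of the side-by-side version, again on a small made-up data frame (toy and its values are invented for illustration, not taken from the ESS file):

```r
library(ggplot2)

# Toy data: two countries, respondents split by gender (hypothetical values)
toy <- data.frame(
  cntry = c("SE", "SE", "SE", "IT", "IT", "IT", "IT"),
  gndr  = c("Male", "Female", "Female", "Male", "Male", "Female", "Female")
)

# position = "dodge" places the bars of each fill group next to each other
# instead of stacking them
p_dodge <- ggplot(toy, aes(cntry, fill = gndr)) +
  geom_bar(position = "dodge") +
  xlab("Countries") +
  ylab("Number of respondents") +
  theme_classic()
p_dodge
```

With the ESS data, the same change would be made in the geom_bar() call of the chart above.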

Bar chart of type (2):

Now, let’s assume we want to compare how much respondents in different countries trust their national parliament. We have this in the variable “trstprl”, which measures trust as a metric variable on an 11-point rating scale from 0 (no trust) to 10 (highest trust). We can calculate its mean value for each country and compare the means. To create this type of bar chart, we need to map “trstprl” to the y-axis and tell ggplot to show its mean.

The mapping is done with aes() inside the ggplot() function. Setting the y-axis to show the mean is done with the stat_summary() function. In this function, we tell ggplot which summary statistic we want with the fun argument (here: the mean) and which geom we want with the geom argument (geom = "bar").

ggplot(ess, aes(cntry, trstprl))+
  stat_summary(fun = "mean", geom = "bar")+
  xlab("Countries in ESS")+
  ylab("Trust in parliament on scale 0-10 (Mean)")+
  theme_classic()

## Warning: Removed 1144 rows containing non-finite values (stat_summary).

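We see that the means differ quite a bit across countries! We could use a different geom for the summary statistic, for example a point, simply by changing the geom argument inside stat_summary() to "point". The code below also orders the countries by their mean trust with forcats::fct_reorder(), which makes the chart easier to read: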

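# na.rm = TRUE is passed on to mean(); without it, a country with missing trstprl values gets an NA mean and cannot be ordered
ggplot(ess, aes(x = forcats::fct_reorder(factor(cntry), trstprl, mean, na.rm = TRUE), y = trstprl))+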
  stat_summary(fun = "mean", geom = "point")+
  xlab("Countries in ESS")+
  ylab("Trust in parliament on scale 0-10 (Mean)")+
  theme_classic()

## Warning: Removed 1144 rows containing non-finite values (stat_summary).
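The warning above tells us that stat_summary() drops the rows with missing trust values before it computes the means. The same means can also be computed explicitly, which makes the handling of missing values visible. This is a minimal sketch on made-up data (toy, country_means and mean_trust are names chosen for illustration): group_by() and summarise() from dplyr calculate the mean per country with na.rm = TRUE, and geom_col() plots the result.

```r
library(dplyr)
library(ggplot2)

# Toy data with one missing trust value (hypothetical values)
toy <- data.frame(
  cntry   = c("SE", "SE", "SE", "IT", "IT", "CH", "CH"),
  trstprl = c(7, 6, NA, 3, 5, 8, 6)
)

# One row per country; na.rm = TRUE ignores missing values in the mean
country_means <- toy %>%
  group_by(cntry) %>%
  summarise(mean_trust = mean(trstprl, na.rm = TRUE))
country_means

# reorder() sorts the countries by their mean before plotting
p_means <- ggplot(country_means, aes(reorder(cntry, mean_trust), mean_trust)) +
  geom_col() +
  xlab("Countries") +
  ylab("Trust in parliament on scale 0-10 (Mean)") +
  theme_classic()
p_means
```

Computing the table first is a little more code than stat_summary(), but the means can then be inspected or reused outside the chart.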

As a second example, we compare the mean news consumption per day between respondents with different levels of political interest. To do this, we must first define polintr as a factor with four groups (see codebook) and then use the same code as for the previous bar chart. The variable mapped to the y-axis is “nwspol” (news consumption in minutes per day), which is a metric (continuous) variable, so counting its values per group is not meaningful. We therefore compare its mean again. The xlim() call fixes the order of the groups on the x-axis.

ess$polintr <- recode_factor(ess$polintr, "1" = "very interested", "2" = "quite interested", "3" = "hardly interested", "4" = "not interested")

ggplot(ess, aes(polintr, nwspol))+
  stat_summary(fun = "mean", geom = "bar")+
  xlab("Levels of Political Interest")+
  ylab("News Consumption in Minutes/Day")+
  xlim("very interested", "quite interested", "hardly interested", "not interested")+
  theme_classic()

## Warning: Removed 661 rows containing non-finite values (stat_summary).

We see that in the group of the most interested respondents, news consumption is clearly higher than in the other groups!
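As mentioned at the beginning, the bars can show other summary statistics than the mean. A variable such as minutes of news consumption can contain a few very large values that pull the mean upwards, so the median may be a useful cross-check. The sketch below uses made-up data (toy and its values are invented for illustration) and only changes the fun argument of stat_summary():

```r
library(ggplot2)

# Toy data: the 600 minutes in the first group is a deliberate outlier
toy <- data.frame(
  polintr = factor(
    c("very interested", "very interested", "very interested",
      "not interested", "not interested", "not interested"),
    levels = c("very interested", "not interested")
  ),
  nwspol = c(30, 60, 600, 10, 20, 30)
)

# fun = "median" replaces the mean with the median of each group
p_median <- ggplot(toy, aes(polintr, nwspol)) +
  stat_summary(fun = "median", geom = "bar") +
  xlab("Levels of Political Interest") +
  ylab("News Consumption in Minutes/Day (Median)") +
  theme_classic()
p_median
```

In this toy example, the mean of the first group is 230 minutes, while the median is 60 minutes, which shows how much a single extreme value can move the mean.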