Tweaking axes and using colors

As always, we first load the tidyverse package, read in the ESS data, and prepare it. We keep only the respondents from Sweden:

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.2

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

ess <- read_csv("C:/Users/petemaur/Teaching/Data/ess_data.csv")

## Rows: 49519 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): cntry
## dbl (11): idno, nwspol, polintr, trstprl, trstep, trstun, vote, gndr, yrbrn,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Recode a metric (numeric) variable to a categorical variable (factor)

ess$gndr <- recode_factor(ess$gndr, "1" = "Male", "2" = "Female")
ess$vote <- recode_factor(ess$vote, "1" = "voted", "2" = "abstained")

## Warning: Unreplaced values treated as NA as `.x` is not compatible.
## Please specify replacements exhaustively or supply `.default`.

ess$polintr <- recode_factor(ess$polintr, "1" = "very", "2" = "quite", "3" = "hardly", "4" = "not at all")

# Compute age from year of birth

ess <- mutate(ess, age=2018-yrbrn)

#Filter Sweden

se <- filter(ess, cntry == "SE")

We again use age in years as the variable to play with. We first produce a basic histogram and then use two functions to tweak the x-axis and to label the x- and y-axes. The function that tweaks the x-axis is scale_x_continuous(), which takes several arguments: n.breaks = n sets the approximate number of ticks on the axis, and limits = c(min, max) shortens or expands the axis. But be careful: values that lie outside the lower and upper limits are removed before the chart is drawn, which can be problematic when you plot a function of the variable, such as the mean, because the removed values no longer enter the calculation.

The functions for changing axes always follow the same pattern: scale_x_ or scale_y_, then continuous or discrete, followed by (). Further arguments can be specified inside the parentheses.

The functions ylab() and xlab() label the axes.

ggplot(se, aes(age))+
  geom_histogram(binwidth = 3)+
  theme_classic()

ggplot(se, aes(age))+
  geom_histogram(binwidth = 3)+
  scale_x_continuous(limits = c(15,75), n.breaks = 10, name = "Deltagarnas ålder")+
  ylab("Antall")+
  theme_classic()

## Warning: Removed 179 rows containing non-finite values (stat_bin).

## Warning: Removed 2 rows containing missing values (geom_bar).

Look how the plot has changed with the two new layers, scale_x_continuous() and ylab().
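
The warnings above come from the limits = argument: rows outside the limits are dropped before any statistic is computed. A minimal sketch with made-up numbers (not the ESS data) shows the effect on a mean. coord_cartesian() does not appear in the examples above; it is used here on the assumption that you only want to zoom into the chart without discarding data.

```r
library(ggplot2)

# Made-up values (an assumption, not the ESS data): one extreme value of 100
toy <- data.frame(group = "a", value = c(1, 2, 3, 100))

base <- ggplot(toy, aes(group, value)) +
  stat_summary(fun = "mean", geom = "point")

# limits on the scale: 100 lies outside 0-10 and is dropped (with a warning)
p_limits <- base + scale_y_continuous(limits = c(0, 10))

# coord_cartesian(): only zooms in, all four values still enter the mean
p_zoom <- base + coord_cartesian(ylim = c(0, 10))

# layer_data() returns the values that ggplot2 would draw
mean_limits <- layer_data(p_limits)$y  # mean of 1, 2, 3
mean_zoom <- layer_data(p_zoom)$y      # mean of 1, 2, 3, 100
```

With limits, the plotted mean is based on three values only; with coord_cartesian(), the mean is based on all four values, although the point then lies outside the visible part of the chart.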

Next, we bring in colors and map the fill scale to a grouping variable (a discrete variable with four groups), namely political interest (“polintr”). The four groups are high interest, quite high interest, rather low interest and no interest. When we map fill to polintr inside aes(), we see how many respondents in each age range (a bin on the x-axis) belong to each group. We can then see whether younger, middle-aged or older respondents have more or less political interest.

ggplot(se, aes(age, fill = polintr))+
  geom_histogram(binwidth = 2)+
  scale_x_continuous(limits = c(18,68), n.breaks = 30, name = "Deltagarnas ålder")+
  scale_y_continuous(name = "Antall", n.breaks = 10)+
  scale_fill_brewer(palette = "Oranges", guide = guide_legend(title = "Det politiska intresset"))+
  theme_classic()+
  theme(legend.position = "bottom")

## Warning: Removed 422 rows containing non-finite values (stat_bin).

## Warning: Removed 10 rows containing missing values (geom_bar).

You can see that we also moved the legend (or guide) to the bottom. This is done with the theme() function and its legend.position = argument.
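
As a minimal sketch of the two legend settings, the following uses made-up data (not the ESS data) and an English legend title chosen for illustration. It sets the legend title with the name = argument of the scale, which is an alternative to guide_legend(title = ), and moves the legend with legend.position.

```r
library(ggplot2)

# Made-up data (an assumption, not the ESS data)
toy <- data.frame(
  age = c(20, 22, 35, 37, 50, 52),
  interest = factor(c("very", "hardly", "very", "hardly", "very", "hardly"))
)

p <- ggplot(toy, aes(age, fill = interest)) +
  geom_histogram(binwidth = 5) +
  # name = sets the legend title directly on the scale
  scale_fill_brewer(palette = "Oranges", name = "Political interest") +
  theme_classic() +
  # other positions are "top", "left", "right" and "none"
  theme(legend.position = "bottom")

p
```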

In the next example, we compare trust in the Riksdag across the groups of political interest. We assume that people who are more interested also have more trust. We use a new type of chart, the point chart. It works like a bar chart but uses points to indicate the value of each group on the y-axis (amount of trust). Since we compare groups, we need a summary statistic for trust, such as the mean for each group.

The mean is calculated with the stat_summary() function and its fun = mean argument. In this function, we can also choose the geom and its size. If we want to plot a summary statistic rather than the raw values or a count, we use stat_summary() instead of a geom_xxx() function.
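
To make clear what stat_summary() computes, here is a small sketch with made-up trust scores (not the ESS data). It compares the values that ggplot2 would draw with group means calculated by hand using tapply(); the variable names are chosen for illustration only.

```r
library(ggplot2)

# Made-up data (an assumption, not the ESS data): trust scores in two groups
toy <- data.frame(
  interest = factor(c("very", "very", "hardly", "hardly"),
                    levels = c("very", "hardly")),
  trust = c(8, 6, 3, 5)
)

p <- ggplot(toy, aes(interest, trust)) +
  stat_summary(fun = "mean", geom = "point", size = 4)

# The y-values that ggplot2 draws, one per group ...
plotted <- layer_data(p)$y

# ... and the group means computed by hand
by_hand <- tapply(toy$trust, toy$interest, mean)
```

Both vectors hold one mean per group, in the order of the factor levels, so each point in the chart is simply the mean of trust within that group.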

In the chart, we want the gender to be represented by color. We can use automatic colors or we can select a color palette. This is done with the scale_color_brewer() function and the palette = argument.

ggplot(remove_missing(se, vars = "polintr"), aes(polintr, trstprl, color=gndr))+
  stat_summary(fun = "mean", geom = "point", size = 4)+
  scale_y_continuous(n.breaks = 10)+
  scale_color_brewer(palette = "Dark2", guide = guide_legend(title = "Könn"))+
  scale_x_discrete(labels = c("sterk", "ganska sterk", "ganska svag", "svag"))+
  ylab("Tillit i Riksdag (0-10)")+
  xlab("Nivån på det politiska interesset")+
  theme_classic()

## Warning: Removed 2 rows containing missing values.

## Warning: Removed 12 rows containing non-finite values (stat_summary).

We also changed the labels of the groups with the labels = c() argument of the scale_x_discrete() function, and the title of the legend with guide_legend(title = "").
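
The labels = c() vector above is matched to the groups by position, so its order has to follow the order of the factor levels. As a hedged alternative, ggplot2 also accepts a named vector, which ties each label to a level by name. The data and the English labels below are made up for illustration.

```r
library(ggplot2)

# Made-up data (an assumption, not the ESS data)
lev <- c("very", "quite", "hardly", "not at all")
toy <- data.frame(
  interest = factor(lev, levels = lev),
  trust = c(7, 6, 5, 4)
)

p <- ggplot(toy, aes(interest, trust)) +
  geom_point(size = 4) +
  # names = factor levels, values = labels shown on the axis;
  # the order inside c() does not matter here
  scale_x_discrete(labels = c("not at all" = "weak",
                              "very" = "strong",
                              "hardly" = "fairly weak",
                              "quite" = "fairly strong"))

p
```

Because each label is attached to its level by name, reordering the vector does not change which label ends up on which tick.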

To see the ColorBrewer palettes that scale_color_brewer() and scale_fill_brewer() can use, run the following command:

RColorBrewer::display.brewer.all()
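
If you prefer a table to a picture, the RColorBrewer package also stores the palette information in a data frame. The following sketch lists the palettes and looks up the hex codes of the "Oranges" palette used earlier; the choice of four colors is only an example.

```r
library(RColorBrewer)

# One row per palette: maximum number of colors, type (seq, div, qual),
# and whether the palette is colorblind-friendly
head(brewer.pal.info)

# Hex codes of four colors from the "Oranges" palette
oranges <- brewer.pal(4, "Oranges")
oranges

# Names of the qualitative palettes, which suit unordered groups such as gender
qual <- rownames(brewer.pal.info)[brewer.pal.info$category == "qual"]
qual
```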

Now, look at these two charts: here we compare the average years of education between respondents who voted in the last election (group 1) and those who abstained (group 2). As an additional grouping variable, we take gender and map it to the shape scale (instead of the color scale that we used before). With the color = argument, we can now decide which color the shapes should have (use the website linked on Canvas to look up the color names).

We have also used more arguments in the theme() function to tweak the size and the color of the axis labels.

We have changed the grid lines in the theme() function with the color = and the linetype = arguments of element_line(). You can look up the different linetypes in the help page shown at the end of this section.

ggplot(remove_missing(se, vars = "vote"), aes(vote, eduyrs, shape = gndr))+
  stat_summary(fun = "mean", geom = "point", size = 5, color = "brown3")+
  scale_y_continuous(n.breaks = 20, limits = c(0,NA))+
  scale_x_discrete(labels = c("har deltatt", "har inte deltatt"))+
  scale_shape_discrete(labels = c("Menn", "Kvinnor"), guide = guide_legend(title = "Genus"))+
  ylab("Antal år inom utbildning" )+
  xlab("Har deltagit i det senaste valet?")+
  ggtitle("Whaaat?")+
  theme_classic()+
  theme(panel.grid.minor = element_line(color = "azure4", linetype = "dashed"), plot.title = element_text(size = 20, face = "bold", colour = "darkred"))+
  theme(axis.text.x = element_text(size = 12, color = "darkolivegreen"), axis.title.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 12, color = "darkblue"))

## Warning: Removed 68 rows containing missing values.

## Warning: Removed 17 rows containing non-finite values (stat_summary).

ggplot(remove_missing(se, vars = "vote"), aes(vote, eduyrs, shape = gndr))+
  stat_summary(fun = "mean", geom = "point", size = 5, color = "skyblue")+
  scale_y_continuous(n.breaks = 20, limits = c(0,NA))+
  scale_x_discrete(labels = c("har deltatt", "har inte deltatt"))+
  scale_shape_discrete(labels = c("Menn", "Kvinnor"), guide = guide_legend(title = "Genus"))+
  ylab("Antal år inom utbildning" )+
  xlab("Har deltagit i det senaste valet?")+
  ggtitle("Whaaat?")+
  theme_classic()+
  theme(panel.grid.minor = element_line(color = "firebrick", linetype = "dashed"), plot.title = element_text(size = 20, face = "bold", colour = "orange"))+
  theme(axis.text.x = element_text(size = 12, color = "yellow"), axis.title.x = element_text(size = 14))+
  theme(axis.text.y = element_text(size = 12, color = "purple"))

## Warning: Removed 68 rows containing missing values.
## Removed 17 rows containing non-finite values (stat_summary).

To see the available linetypes, open the help page with ?xxx, in this case:

?linetype

## starting httpd help server ... done
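
As a complement to the help page, the following sketch draws one horizontal line per named linetype so that you can compare them directly. It is not part of the examples above; scale_linetype_identity() is used on the assumption that the linetype names stored in the data should be used as they are.

```r
library(ggplot2)

# The six named linetypes that R and ggplot2 accept (besides "blank")
lt <- data.frame(
  y = 1:6,
  linetype = c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash"),
  stringsAsFactors = FALSE
)

p <- ggplot(lt) +
  geom_hline(aes(yintercept = y, linetype = linetype)) +
  # use the names in the data directly instead of mapping them to a palette
  scale_linetype_identity() +
  # label each line with the name of its linetype
  scale_y_continuous(breaks = lt$y, labels = lt$linetype, name = "linetype") +
  theme_classic()

p
```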