Importing a data set

library(tidyverse)
library(readxl)
library(scales)

ess <- read_excel("C:/Users/Peter Maurer/Documents/_Lehre_Karlstad/MKGB93_DataVisualization/Datasets/Data_labs/ess_data.xlsx",
                  col_types = c("guess", "guess", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric",
                                "numeric", "numeric", "numeric"))

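The col_types argument takes one entry per column of the sheet (a single entry would be recycled for all columns). Since ess_data.xlsx is not available here, the following minimal sketch uses the example workbook that ships with readxl as a stand-in; the object name iris_xl is only illustrative.

```r
library(readxl)

# Example workbook shipped with readxl (a stand-in for ess_data.xlsx)
path <- readxl_example("datasets.xlsx")

# One col_types entry per column: four measurements forced to numeric,
# the last column left for readxl to guess
iris_xl <- read_excel(path, sheet = "iris",
                      col_types = c("numeric", "numeric", "numeric", "numeric", "guess"))

str(iris_xl)
```
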
Filtering variables and observations

colnames(ess)
##  [1] "idno"    "cntry"   "nwspol"  "polintr" "trstprl" "trstep"  "trstun" 
##  [8] "vote"    "gndr"    "yrbrn"   "eduyrs"
ess %>%  #selection is not saved in this example
  select(cntry, trstprl, trstep,  trstun,  vote,  gndr) %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO"))
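
As the comment above says, the result of select() and filter() is only printed, not stored. A minimal sketch with a made-up toy data set (ess_toy and ess_small are hypothetical names) shows the difference between printing and assigning:

```r
library(dplyr)

# Toy stand-in for the ESS data (values are made up)
ess_toy <- tibble(
  cntry   = c("SE", "FR", "DE", "ES", "IT"),
  trstprl = c(7, 4, 6, 3, 2),
  vote    = c(1, 2, 1, 1, 2)
)

# Printed but not stored: ess_toy itself is unchanged
ess_toy %>%
  select(cntry, vote) %>%
  filter(cntry %in% c("SE", "DE", "IT"))

# Stored under a new name for later use
ess_small <- ess_toy %>%
  select(cntry, vote) %>%
  filter(cntry %in% c("SE", "DE", "IT"))

dim(ess_toy)    # still 5 rows and 3 columns
dim(ess_small)  # 3 rows and 2 columns
```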

Analyse voting behavior in the selected countries

ess$vote <- recode_factor(ess$vote, "1" = "voted", "2" = "abstained", "3" = "non eligible")

ess %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO")) %>%
  drop_na(vote) %>%
  ggplot(aes(x = vote, fill = vote)) +
  geom_bar()+
  scale_fill_discrete(type = c("skyblue", "gray", "orange"), name = "Voted or not?")+
  labs(x = "Voting", y = "Number", title = "Voting behavior")+
  theme_classic()+
  theme(legend.position = "top")
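
The recode_factor() call at the start of this section turns the numeric codes into a labelled factor. A small sketch on made-up codes (vote_codes is a hypothetical vector, not the ESS data) shows what happens to the values and to missing entries:

```r
library(dplyr)

vote_codes <- c(1, 2, 3, 2, NA, 1)   # made-up codes

vote_f <- recode_factor(vote_codes,
                        "1" = "voted", "2" = "abstained", "3" = "non eligible")

levels(vote_f)                  # level order follows the order of the arguments
table(vote_f, useNA = "ifany")  # the NA stays NA
```

The level order matters later on, because ggplot2 draws the bars and the legend entries in the order of the factor levels.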

Using theme() to make the chart more informative and look more professional

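We can change the colors, adjust the legend, and add a centered title and subtitle to the chart. The colors are set with scale_fill_manual(); the legend position and the text alignment are set with theme() in the last lines. theme() must come after theme_classic(), because a complete theme resets the settings of an earlier theme() call.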

ess %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO")) %>%
  drop_na(vote) %>%
  ggplot(aes(x = vote, fill = vote)) +
  geom_bar()+
  scale_fill_manual(values = c( 
                        "voted" = "tomato",
                      "abstained" = "green",
                      "non eligible" = "beige"), name = "Voted or not?")+
  labs(x = "Voting", y = "Number", title = "Voting behavior", subtitle = "With data from the ESS survey from 2020")+
  theme_classic()+
  theme(legend.position = "bottom", plot.title=element_text(hjust=0.5, vjust=1.5, face='bold'),
        plot.subtitle = element_text(hjust=0.5, vjust=1.5))
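
In the chart above, theme() is placed after theme_classic(). The order matters. The following sketch combines the theme objects directly, without a plot, to show what happens in each order; th_lost and th_kept are illustrative names.

```r
library(ggplot2)

# theme() first, complete theme second: the legend setting is replaced
th_lost <- theme(legend.position = "top") + theme_classic()

# complete theme first, theme() second: the legend setting survives
th_kept <- theme_classic() + theme(legend.position = "top")

th_lost$legend.position
th_kept$legend.position
```

The same logic applies when the theme calls are added to a ggplot() chain.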

Making a chart with percent (%) instead of raw numbers (N’s)

Imagine we want to compare voter turnout between countries. We prefer to show the vote data now as relative frequencies (in percent) within each country. That facilitates comparisons between countries with a different total number (total N) of respondents.

To do this, we must first calculate the percentage for each value of “vote” within each country. We group the respondents by country with group_by(). Then we calculate the percentage for each value of “vote” by dividing the n’s for the values “voted”, “abstained”, and “non eligible” by the country-level n. Lastly, we drop the rows with the value NA with drop_na(). Because this happens after the percentages are calculated, respondents with a missing “vote” value still count toward the country-level n.

ess %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO")) %>%
  group_by(cntry) %>%
  count(vote) %>%
  group_by(cntry) %>%
  mutate(percent = n / sum(n)) %>%
  drop_na()
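
To see what these steps compute, here is a minimal sketch on made-up responses (ess_toy is a hypothetical data set). It also shows an alternative that drops missing votes first, so that the shares refer to valid answers only; which variant is appropriate depends on the question asked.

```r
library(dplyr)
library(tidyr)

# Made-up responses; one Italian respondent has no vote value
ess_toy <- tibble(
  cntry = c("SE", "SE", "SE", "SE", "IT", "IT", "IT", "IT", "IT"),
  vote  = c("voted", "voted", "voted", "abstained",
            "voted", "voted", "abstained", "abstained", NA)
)

# As in the text: percentages first, NA row dropped afterwards
shares_all <- ess_toy %>%
  group_by(cntry) %>%
  count(vote) %>%                  # count() keeps the cntry grouping
  mutate(percent = n / sum(n)) %>%
  drop_na()

# Alternative: drop missing votes first
shares_valid <- ess_toy %>%
  drop_na(vote) %>%
  group_by(cntry) %>%
  count(vote) %>%
  mutate(percent = n / sum(n))

shares_all    # Italian shares do not add up to 100 %
shares_valid  # shares add up to 100 % in every country
```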

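Now we can create the chart. We join both steps by placing a pipe operator %>% between the last function above and ggplot(). The scales package must be loaded before plotting, because labels = percent refers to the percent() function from scales, which formats the y axis as percentages (it is not the column “percent” created with mutate()). With n.breaks = 8, we request about eight labelled breaks on the y axis; ggplot2 may choose a slightly different number to get round values.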

ess %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO")) %>%
  group_by(cntry) %>%
  count(vote) %>%
  group_by(cntry) %>%
  mutate(percent = n / sum(n)) %>%
  drop_na() %>%
  ggplot(aes(x = cntry, y = percent, fill = vote))+
  geom_bar(stat = "identity", position = "dodge")+
  scale_fill_manual(values = c( 
    "voted" = "tomato",
    "abstained" = "green",
    "non eligible" = "beige"), name = "Voted or not?")+
  scale_y_continuous(labels = percent, n.breaks = 8)+
  labs(x = "Country", y = "Voting in %", title = "Voting behavior in last election", subtitle = "With data from the ESS survey from 2020")+
  theme_classic()
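
The labels argument of scale_y_continuous() accepts a function that turns the break values into text. A short sketch of what the scales functions return for proportions:

```r
library(scales)

percent(0.25)                          # "25%"
label_percent(accuracy = 0.1)(0.256)   # "25.6%"
```

label_percent() returns a labelling function, so it could be passed as labels = label_percent(accuracy = 0.1) if decimals were wanted on the axis.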

Making the chart more informative by ordering the bars and adding elements

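Now we can make the chart a bit nicer. We fix the order of the countries by setting the factor levels of “cntry”, replace the country codes with full names, and add dashed gridlines and a shaded panel background with theme():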

ess$cntry <- factor(ess$cntry, levels = c("SE", "DK", "NO", "DE", "IT")) # This turns all other countries into NA

ess %>%
  filter(cntry %in% c("SE", "DE", "DK", "IT", "NO")) %>%
  group_by(cntry) %>%
  count(vote) %>%
  group_by(cntry) %>%
  mutate(percent = n / sum(n)) %>%
  drop_na() %>%
  ggplot(aes(x = cntry, y = percent, fill = vote))+
  geom_bar(stat = "identity", position = "dodge")+
  scale_fill_manual(values = c( 
    "voted" = "tomato",
    "abstained" = "green",
    "non eligible" = "beige"), name = "Voted or not?")+
  scale_y_continuous(labels = percent, n.breaks = 8)+
  scale_x_discrete(labels = c("Sweden", "Denmark", "Norway", "Germany", "Italy"))+
  labs(x = "Country", y = "Voting in %", title = "Voting behavior in last election", subtitle = "With data from the ESS survey from 2020")+
  theme_classic()+
  theme(legend.position = "bottom", plot.title=element_text(hjust=0.5, vjust=1.5, face='bold'),
        plot.subtitle = element_text(hjust=0.5, vjust=1.5), panel.grid.major.y = element_line(linetype = "dashed", color = "gray40"),
        panel.background = element_rect(fill = "grey90"), axis.text.x = element_text(family = "mono", size = 11))
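
The factor() call at the start of this section both fixes the order of the countries and turns every country that is not listed in levels into NA. A minimal sketch on a made-up character vector (cntry_chr is hypothetical):

```r
cntry_chr <- c("SE", "FR", "IT", "DK", "ES")   # made-up country codes

cntry_f <- factor(cntry_chr, levels = c("SE", "DK", "NO", "DE", "IT"))

cntry_f                          # FR and ES are now NA
levels(cntry_f)                  # the order given in levels, not alphabetical
table(cntry_f, useNA = "ifany")
```

Because the labels in scale_x_discrete() are matched by position, they must be listed in the same order as the factor levels.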

Exploring correlations visually with scatter plots and geom_jitter()

To check whether two numeric variables from the ESS data set are correlated, for example trust in the national parliament and trust in the European Parliament, we can use a scatter plot. In a scatter plot, each observation is drawn as a dot in a two-dimensional coordinate system spanned by the x- and the y-axis. The shape of the point cloud shows whether a correlation exists. Because both trust items are measured in whole numbers from 0 to 10, many observations fall on exactly the same position; geom_jitter() adds a small amount of random noise to each dot so that overlapping observations become visible. We explore this relationship first for Sweden and then for Italy. We also want to know whether the relationship is the same for men and women.

ess$gndr <- recode_factor(ess$gndr, "1" = "Male", "2" = "Female")

ess %>%
  filter(cntry %in% c("SE")) %>%
  drop_na() %>%
  ggplot(aes(x = trstprl, y = trstep, color = as_factor(gndr)))+  # map x = "trust in national parliament", y = "trust in European parliament", color = "gender"
  geom_jitter()+   # scatter plot with a little random noise so that overlapping dots become visible
  geom_smooth()+   # add a trend line (smoother) to see the relationship between x and y
  labs(color = "Gender", x = "Trust in Riksdag", y = "Trust in European Parl.", title = "Relationship between two sorts of political trust", subtitle = "ESS data from Sweden, 2020")+  # set the legend title, axis labels, title and subtitle
  scale_color_manual(values = c("Male" = "chartreuse", "Female" = "chocolate"))+
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))+
  scale_x_continuous(breaks = seq(0, 10, 1))+
  theme_classic()+
  theme(legend.position = "bottom", plot.background = element_rect(fill = "lightgrey", color = "black"), panel.grid = element_blank(), panel.background = element_rect(fill = "beige"))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
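
Why geom_jitter() rather than plain dots? A small sketch with simulated integer scores (toy is made up and is not ESS data) counts how many distinct positions the observations occupy. The width and height values are illustrative choices.

```r
library(ggplot2)

set.seed(1)
# 200 simulated respondents with whole-number scores between 0 and 10
toy <- data.frame(trstprl = sample(0:10, 200, replace = TRUE))
toy$trstep <- pmin(10, pmax(0, toy$trstprl + sample(-2:2, 200, replace = TRUE)))

nrow(toy)          # number of observations
nrow(unique(toy))  # number of distinct positions: many dots would overlap

# Jitter moves each dot by a small random amount in both directions
p_jitter <- ggplot(toy, aes(x = trstprl, y = trstep)) +
  geom_jitter(width = 0.3, height = 0.3, alpha = 0.5)
```

Printing p_jitter draws the chart. Jitter only changes where the dots are drawn, not the underlying values.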

The same scatter plot for Italy, where the lower house of parliament is called Camera dei deputati:

ess %>%
  filter(cntry %in% c("IT")) %>%
  drop_na() %>%
  ggplot(aes(x = trstprl, y = trstep, color = as_factor(gndr)))+  # map x = "trust in national parliament", y = "trust in European parliament", color = "gender"
  geom_jitter()+   # scatter plot with a little random noise so that overlapping dots become visible
  geom_smooth()+   # add a trend line (smoother) to see the relationship between x and y
  labs(color = "Gender", x = "Trust in Camera dei deputati", y = "Trust in European Parl.", title = "Relationship between two sorts of political trust", subtitle = "ESS data from Italy, 2020")+  # set the legend title, axis labels, title and subtitle
  scale_color_manual(values = c("Male" = "chartreuse", "Female" = "chocolate"))+
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))+
  scale_x_continuous(breaks = seq(0, 10, 1))+
  theme_classic()+
  theme(legend.position = "bottom", plot.background = element_rect(fill = "lightgrey", color = "black"), panel.grid = element_blank(), panel.background = element_rect(fill = "beige"))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
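
The two messages differ because geom_smooth() picks its default method from the sample size: loess for fewer than 1,000 observations and gam otherwise. If the trend lines of two charts should be directly comparable, one option is to set the method explicitly. The sketch below uses made-up scores (toy is hypothetical) and a straight regression line, and adds the correlation coefficient as a numeric complement to the visual impression.

```r
library(ggplot2)

# Made-up trust scores for eight respondents
toy <- data.frame(
  trstprl = c(0, 2, 3, 5, 5, 6, 8, 10),
  trstep  = c(1, 2, 4, 4, 6, 5, 8, 9)
)

# Pearson correlation between the two trust items
r <- cor(toy$trstprl, toy$trstep)
r

# Same smoother in every chart: a linear fit instead of the size-dependent default
p_lm <- ggplot(toy, aes(x = trstprl, y = trstep)) +
  geom_jitter(width = 0.2, height = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x)
```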

Comparing distributions of metric variables between groups and countries with box plots

Last but not least, we can compare trust in national parliaments, measured with the metric variable “trstprl”, across all 29 European countries in the data set. We draw one box plot per country. With coord_flip(), we swap the axes so that the countries are listed along the vertical axis.

Because we have changed “cntry” to a factor with only 5 valid values along the way, we must load the data again to have all countries.

ess <- read_excel("C:/Users/Peter Maurer/Documents/_Lehre_Karlstad/MKGB93_DataVisualization/Datasets/Data_labs/ess_data.xlsx",
                  col_types = c("guess", "guess", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric",
                                "numeric", "numeric", "numeric"))

ess %>%
  drop_na() %>%
  ggplot(aes(x = cntry, y = trstprl))+
  geom_boxplot(notch = TRUE)+
  coord_flip()+
  labs(x = "Country", y = "Trust in the parliament of [country]", title = "Who trusts their national parliament most in Europe?")+
  theme_classic()

The problem is that the countries are not yet ordered by their average level of trust. This can be achieved with reorder(). To make this work, “cntry” must be converted to a factor (it is now a character variable), and we must specify that the levels of cntry are ordered by the mean of “trstprl” within each country. reorder() sorts the levels in ascending order of that summary value. With the - sign in front of “trstprl”, the levels are therefore sorted from the highest to the lowest mean. After coord_flip(), the first level is drawn at the bottom, so the chart reads from the lowest trust at the top to the highest trust at the bottom. Without the - sign, the order is reversed.

ess %>%
  drop_na() %>%
  ggplot(aes(x = reorder(as.factor(cntry), -trstprl, mean), y = trstprl))+
  geom_boxplot(notch = TRUE)+
  coord_flip()+
  labs(x = "Country", y = "Trust in the parliament of [country]", title = "Who trusts their national parliament most in Europe?")+
  theme_classic()
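
How reorder() arranges the factor levels can be checked on a tiny made-up data set (toy and the country labels A, B, C are hypothetical):

```r
toy <- data.frame(
  cntry   = c("A", "A", "B", "B", "C", "C"),   # hypothetical countries
  trstprl = c(2, 4, 8, 6, 5, 5)
)

# Mean trust per country: A = 3, B = 7, C = 5
tapply(toy$trstprl, toy$cntry, mean)

# Levels sorted by ascending mean
lv_asc  <- levels(reorder(as.factor(toy$cntry),  toy$trstprl, mean))
# Minus sign: levels sorted by descending mean
lv_desc <- levels(reorder(as.factor(toy$cntry), -toy$trstprl, mean))

lv_asc   # "A" "C" "B"
lv_desc  # "B" "C" "A"
```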