This week’s data comes from Eurovision which is a song contest for European countries. The data describes the event location, year, section etc. as well as the competitors’ ranks, points, and associated country.
library(tidyverse)
library(skimr)
library(dplyr)
eurovision <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-17/eurovision.csv')
eurovision_votes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-17/eurovision-votes.csv')
To start off, I tend to use glimpse and skim to better understand the structure of the data as well as the structure of each variable in the data frame.
glimpse(eurovision)
skim(eurovision)
For this Tidy Tuesday, I wanted to see the number of self-votes (i.e. when a country votes for itself) in Eurovision.
eurovision_votes <- eurovision_votes %>%
mutate(self.voting = if_else(from_country == to_country, 1, 0))
votes_country <- eurovision_votes %>%
group_by(from_country) %>%
summarise(Total.self.votes = sum(self.voting))
Finally, I plotted it! I wanted to include all countries so apologies for small y-axis font. I do think it is interesting to see which countries voted for themselves the most versus those that did the least. A potential cool follow-up would be to facet by something like year or decade. Potentially, self-votes are associated with historical events in the country. For example, maybe Russia voted for itself the most during the Cold War. Maybe if I have time in the future, I will play around with this.
votes_country %>%
ggplot(aes(x = reorder(from_country, Total.self.votes), y = Total.self.votes, fill = reorder(from_country, Total.self.votes)))+
geom_bar(stat = "identity", show.legend = FALSE)+
theme_bw()+
theme(axis.text.y=element_text(size=6))+
scale_y_continuous(breaks=seq(0,60,by=10))+
coord_flip()+
xlab("Country")+
ylab("Number of Self-Votes")+
ggtitle("Number of Self-Votes in Eurovision, per Country (1956-2022)")