Overview

Sebastian Kurz is the Austrian People’s Party (List Kurz/ÖVP) leading candidate for the upcoming general election and, according to all opinion polls, is set to win the most votes. His campaign has deliberately put him as an individual (rather than his party) at its center and has presented itself as a broad ‘movement’ (rather than a traditional party contest). In that spirit, his website https://www.sebastian-kurz.at/unterstuetzer invites visitors to ‘join’ by adding their names to a public list of supporters.

The following is a brief analysis of some of the information publicly available on these supporters, implemented in R. As always, there are likely to be more elegant ways to do some of the coding. If you see a particularly glaring misdeed, feel free to let me know.

Collection of data

While the content of static HTML websites can normally be scraped conveniently with the rvest package, Kurz’s website is pretty complex (at least to me), with various dynamically created elements, plenty of JavaScript etc. As far as I understand, rvest cannot do the trick in such cases. Instead, the jsonlite package allows us to retrieve the requested information via the website’s API.

library(jsonlite)
library(dplyr)
library(tidyr)

baseurl <- "https://dabei.sebastian-kurz.at/api/v1/83f9221ff237566f8f55187277dd1019/commitments/038ed334b8d9314a01007dd9748c055c/people?"
pages <- list()

# empty frame that the results of each page get appended to
all <- data.frame(supporter.surname=character(),
                  supporter.name=character(),
                  released=character(),
                  created=character(),
                  message=character())

for (i in 1:300) {
  message("Retrieving page ", i)
  # try() keeps a single failed request from aborting the whole run
  mydata <- try(jsonlite::fromJSON(paste0(baseurl, "&page_nr=", i), flatten=TRUE), silent=TRUE)
  if (inherits(mydata, "try-error")) next
  pages[[i]] <- mydata
  people <- mydata$commitment_module_people

  # an empty page means we have reached the end of the list
  if (NROW(people)==0) break

  all.i <- data.frame(supporter.surname=people$person.surname,
                      supporter.name=people$person.name,
                      released=people$released_date,
                      created=people$created_at,
                      message=people$message, # field name is an assumption
                      stringsAsFactors=FALSE)

  all <- bind_rows(all, all.i)
}

Kurz.df <- all
Kurz.df$created <-  as.POSIXct(Kurz.df$created, tz="Europe/Vienna")
Kurz.df$released <- as.POSIXct(Kurz.df$released, tz="Europe/Vienna")
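
The column names used in the loop, such as person.surname, are a by-product of flatten=TRUE: jsonlite spreads nested objects into columns named parent.child. The following is a minimal sketch on a made-up JSON string. The payload and all values in it are invented for illustration and only mimic the structure that the field names above imply; the real API response may well look different.

```r
library(jsonlite)

# Hypothetical payload: structure inferred from the field names used above,
# values invented for illustration.
toy.json <- '{
  "commitment_module_people": [
    {"person": {"name": "Anna", "surname": "Muster"}, "created_at": "2017-06-25 10:00:00"},
    {"person": {"name": "Hans", "surname": "Beispiel"}, "created_at": "2017-06-26 14:30:00"}
  ]
}'

toy <- fromJSON(toy.json, flatten = TRUE)
people <- toy$commitment_module_people

# The nested "person" object is spread into person.name and person.surname
names(people)
people$person.surname
```

Without flatten=TRUE, person would instead be a data frame nested inside the people data frame, and the columns would have to be addressed as people$person$surname.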

Number of supporters

library(dplyr)
library(padr)
library(tidyr)
library(ggplot2)

flow.creation <- Kurz.df %>%
  arrange(created) %>%
  thicken(interval="day", colname="day", by="created")%>%
  group_by(day) %>%
  summarise(freq.created=n()) %>%
  pad() %>%
  fill_by_value(value=0) %>%
  mutate(freq.created.cum=cumsum(freq.created))

flow.release <- Kurz.df %>%
  arrange(released)%>%
  thicken(interval="day", colname="day", by="released")%>%
  group_by(day)%>%
  summarise(freq.release=n())%>%
  pad()%>%
  fill_by_value(value=0)%>%
  mutate(freq.release.cum=cumsum(freq.release))

flow <- full_join(flow.creation, flow.release, by="day")

flow <-  flow %>%
  select(day, ends_with("cum"))%>%
  gather("created_released","number",ends_with("cum"))

flow.plot <- flow %>%
  ggplot(.,aes(day,number))+
  geom_line(aes(color=created_released))+
  theme_minimal()+
  labs(title="Number of listed supporters on Kurz's website",
        subtitle="Data retrieved from www.sebastian-kurz.at",
        caption="Roland Schmidt, @zoowalk",
        y="number of listed supporters",
        x="")+
  theme(legend.position = "bottom",
        legend.title=element_blank())+
  scale_y_continuous(limits = c(0,8000))+
  scale_color_manual(labels=c("profiles created", "profiles published"),
                     values=c("darkorange","darkblue"))
print(flow.plot)

The collected data contains the time when a supporter joined Kurz’s list as well as the time when his or her entry was published (apparently every new entry was reviewed by an admin before being published). With these two dates, the development of Kurz’s list of supporters can be displayed.
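
To make the counting logic behind the two curves more tangible, here is a minimal sketch on three made-up timestamps. It is not the padr pipeline used above: it swaps in tidyr::complete() to fill the days without new entries, and the toy data and object names (toy, toy.flow) are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Invented example: two sign-ups on 25 June, none on 26 June, one on 27 June
toy <- data.frame(
  created = as.POSIXct(c("2017-06-25 10:00:00",
                         "2017-06-25 18:30:00",
                         "2017-06-27 09:15:00"), tz = "Europe/Vienna")
)

toy.flow <- toy %>%
  mutate(day = as.Date(created, tz = "Europe/Vienna")) %>%  # truncate to day
  count(day, name = "freq.created") %>%                     # entries per day
  complete(day = seq(min(day), max(day), by = "day"),       # add missing days
           fill = list(freq.created = 0L)) %>%              # ... with zero entries
  mutate(freq.created.cum = cumsum(freq.created))           # running total

toy.flow
# cumulative totals: 2, 2, 3
```

Filling the empty days with zeros matters for the cumulative sum only cosmetically, but it keeps the line flat rather than interpolated across days on which nobody signed up.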

As of 12 October, Kurz’s public list of supporters contained more than 7,400 individuals. Starting in late June, the list grew rapidly to roughly 4,000 individuals by early to mid-July and then continued to grow, albeit at a slower pace, until mid-August. From mid-August, the number of new entries rose again and has continued to do so up until now.

I wondered what might be behind the steeper curves from the second half of August onwards. I don’t have a definite answer, but, interestingly, the slope starts to increase on 18 August. On that day, Kurz had two media appearances: one on the most important radio news show (“Mittagsjournal”) and one on the main evening TV news (“ZIB 2”).

Gender of supporters

# >> data housekeeping ------------------------------------------------------
titles <- c("Mag\\.", "MMag\\.", "Prof\\.", "Dr\\.", "O\\.Univ\\.Prof", "Ing\\.", "Dipl\\.","jun\\.")
remove.list <- paste(unlist(titles), collapse = "|")

Kurz.df$mod.names <- gsub("\\-|\\+"," ",Kurz.df$supporter.name) #double names
Kurz.df$mod.names <- gsub("ü","ue", Kurz.df$mod.names)
Kurz.df$mod.names <- gsub(remove.list,"",Kurz.df$mod.names, ignore.case=TRUE)
library(stringr)
Kurz.df$mod.names <- word(Kurz.df$mod.names, -1)

# >> inferring gender ------------------------------------------------------
library(gender)
Kurz.g<- gender(Kurz.df$mod.names)%>%
  distinct()

Kurz.df <- left_join(Kurz.df,Kurz.g[,c("name","gender")], by=c("mod.names"="name"))
Kurz.df$gender[is.na(Kurz.df$gender)] <- "not known"

g <- Kurz.df %>%
  group_by(gender)%>%
  summarise(freq=n())%>%
  mutate(rel=round(freq/sum(freq)*100,2))
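
To see what the housekeeping steps do before the names are handed to the gender package, here is a small sketch in base R on made-up names. The helper clean_first_name() is hypothetical and only mirrors the logic above (split double names, transliterate ü, strip titles, keep the last word); it uses a shortened title list and strsplit() instead of stringr::word().

```r
# Shortened, illustrative title list (regular expressions)
titles <- c("Mag\\.", "MMag\\.", "Prof\\.", "Dr\\.", "Ing\\.", "Dipl\\.")
remove.list <- paste(titles, collapse = "|")

# Hypothetical helper mirroring the cleaning steps above
clean_first_name <- function(x) {
  x <- gsub("-|\\+", " ", x)                         # split double names
  x <- gsub("\u00fc", "ue", x)                       # u-umlaut -> ue
  x <- gsub(remove.list, "", x, ignore.case = TRUE)  # drop titles
  x <- trimws(x)
  # keep the last remaining word, NA if nothing is left
  vapply(strsplit(x, "\\s+"),
         function(w) if (length(w) > 0) w[length(w)] else NA_character_,
         character(1))
}

# "\u00fc" is the escape for u-umlaut, used here to stay encoding-safe
cleaned <- clean_first_name(c("Dr. Anna", "Hans-Peter", "Mag. J\u00fcrgen", "Dipl.-Ing. Eva"))
cleaned
# "Anna" "Peter" "Juergen" "Eva"
```

Keeping only the last word means that a double name such as Hans-Peter is classified by its second part, which is one reason why the inferred gender should be read with some reservations.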

When he became head of the party in early 2017, Kurz announced that he would introduce a gender ‘zipper system’ for the party’s electoral lists, meaning that each male candidate should be preceded or followed by a female candidate. Although the eventual electoral outcome can still be changed by preferential votes, the overall intention is to strengthen gender parity within a party which so far has hardly been known for being progressive on women’s issues.

Partly triggered by this development, I was wondering about the ratio of men to women among Kurz’s supporters. While the public supporters’ list does not contain any explicit information on a supporter’s gender, it is possible, with some reservations, to infer a person’s gender from their first name. The relatively new gender package provides this possibility in R. Following a few ‘data housekeeping’ steps such as removing academic titles (it’s Austria after all) and taking care of relevant umlauts (ü = ue), we get the following distribution:

library(kableExtra)
knitr::kable(g,format="html", caption="Gender of Kurz's supporters") %>%
  kable_styling(bootstrap_options = c("basic"), full_width = FALSE)
Gender of Kurz’s supporters

gender      freq   rel (%)
female      2177   29.27
male        4674   62.84
not known    587    7.89

Out of 7,438 supporters on Kurz’s website, 62.84% (= 4,674) were identified as having male first names, a bit less than 30% (= 2,177) as having female first names, and 7.89% (= 587) of the first names could not be assigned to either category (with some more tweaking this share should decrease further). Put differently, more than 60% of those publicly endorsing Sebastian Kurz on his website are men and ‘only’ about 30% are women. I have no idea how this number compares with other candidates, nor am I able to attribute the discrepancy to specific factors. However, I would not be surprised if it reflects a mix of the party’s and the candidate’s appeal as well as gender-specific behavior when it comes to publicly endorsing a politician, particularly on the web.
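
As a quick sanity check, the shares quoted above can be recomputed from the three counts in the table. The sketch below uses only base R; the share of men among supporters with an identified gender is an additional figure that follows from the same counts and is not one reported above.

```r
# Counts taken from the table above
counts <- c(female = 2177, male = 4674, "not known" = 587)

total <- sum(counts)
total
# 7438

# Share of each category among all listed supporters, in percent
shares <- round(counts / total * 100, 2)
shares
# female 29.27, male 62.84, not known 7.89

# Share of men among those whose gender could be inferred, in percent
male.identified <- unname(round(counts["male"] / (counts["male"] + counts["female"]) * 100, 2))
male.identified
# 68.22
```

Leaving out the names that could not be classified makes the imbalance look slightly larger, so it is worth stating which denominator a quoted percentage refers to.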

Combining the date of each supporter’s website entry with the inferred gender allows us to trace the gender composition of Kurz’s list of supporters over time. By and large, the ratio of men to women remained stable.

flow.gender <- Kurz.df %>%
  arrange(created)%>%
  thicken(interval="day", colname="day", by="created")%>%
  group_by(day, gender)%>%
  summarise(freq=n())%>%
  group_by(gender)%>%
  pad(interval="day", end_val = max(Kurz.df$created))%>% #since no observations for last interval of "not known"
  fill_by_value(value=0)%>%
  mutate(freq.cum=cumsum(freq))%>%
  group_by(day)%>%
  mutate(max=cumsum(freq.cum))%>%
  mutate(min=lag(max,1,0))

gender.plot <- flow.gender %>%
  ggplot(.,aes(day, group=gender))+
  geom_ribbon(aes(fill=gender, ymin=min, ymax=max))+
  theme_minimal()+
  labs(title="Number and gender of listed supporters on Kurz's website",
       subtitle="Data retrieved from www.sebastian-kurz.at; gender inferred from first name",
       caption="Roland Schmidt, @zoowalk",
       y="number of listed supporters",
       x="")+
  theme(legend.position = "bottom",
        legend.title=element_blank())+
  scale_fill_manual(labels=c("female", "male", "not known"),
                    values=c("#FF8C00", "#458B74","grey"))+
  scale_y_continuous(limits = c(0,8000))+
  guides(fill=guide_legend(keywidth = 3, keyheight = 0.7))
print(gender.plot)
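
The stacked bands in the plot come from the two mutate() calls at the end of flow.gender: within each day, the cumulative counts per gender are summed up to give the upper edge of each band (max), and the previous band’s upper edge becomes the lower edge (min). A minimal sketch on invented numbers, with toy and bands as made-up object names:

```r
library(dplyr)

# Invented cumulative counts per gender for two days
toy <- data.frame(
  day = as.Date(rep(c("2017-07-01", "2017-07-02"), each = 3)),
  gender = rep(c("female", "male", "not known"), times = 2),
  freq.cum = c(10, 20, 5, 15, 30, 5)
)

bands <- toy %>%
  arrange(day, gender) %>%                # fixes the stacking order within a day
  group_by(day) %>%
  mutate(max = cumsum(freq.cum),          # upper edge of each band
         min = lag(max, 1, default = 0)) %>%  # lower edge = previous upper edge
  ungroup()

bands
# day 1: min 0, 10, 30 and max 10, 30, 35
# day 2: min 0, 15, 45 and max 15, 45, 50
```

The upper edge of the last band on each day equals the total number of supporters on that day, which is why the top of the ribbon traces the same curve as the ‘profiles created’ line in the first plot.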