Data Analysis of the Brexit referendum results

In this analysis I explore how the Leave vote share relates to other political, social and demographic variables.

myfile <- "./MyDatasets/brexit_results.csv"
brexit_data <- read.csv(file = myfile, header = TRUE, stringsAsFactors = FALSE)
str(brexit_data)
## 'data.frame':    632 obs. of  11 variables:
##  $ Seat       : chr  "Aldershot" "Aldridge-Brownhills" "Altrincham and Sale West" "Amber Valley" ...
##  $ con_2015   : num  50.6 52 53 44 60.8 ...
##  $ lab_2015   : num  18.3 22.4 26.7 34.8 11.2 ...
##  $ ld_2015    : num  8.82 3.37 8.38 2.98 7.19 ...
##  $ ukip_2015  : num  17.87 19.62 8.01 15.89 14.44 ...
##  $ leave_share: num  57.9 67.8 38.6 65.3 49.7 ...
##  $ born_in_uk : num  83.1 96.1 90.5 97.3 93.3 ...
##  $ male       : num  49.9 48.9 48.9 49.2 48 ...
##  $ unemployed : num  3.64 4.55 3.04 4.26 2.47 ...
##  $ degree     : num  13.87 9.97 28.6 9.34 18.78 ...
##  $ age_18to24 : num  9.41 7.33 6.44 7.75 5.73 ...

1 - Diving into the structure and variation of variables

head(brexit_data)
summary(brexit_data)
##      Seat              con_2015        lab_2015        ld_2015      
##  Length:632         Min.   : 0.00   Min.   : 0.00   Min.   : 0.000  
##  Class :character   1st Qu.:22.09   1st Qu.:17.67   1st Qu.: 2.975  
##  Mode  :character   Median :40.85   Median :31.20   Median : 4.581  
##                     Mean   :36.60   Mean   :32.30   Mean   : 7.809  
##                     3rd Qu.:50.84   3rd Qu.:44.37   3rd Qu.: 8.571  
##                     Max.   :65.88   Max.   :81.30   Max.   :51.491  
##                                                                     
##    ukip_2015       leave_share      born_in_uk         male      
##  Min.   : 0.000   Min.   :20.48   Min.   :40.73   Min.   :46.86  
##  1st Qu.: 9.193   1st Qu.:45.33   1st Qu.:86.42   1st Qu.:48.61  
##  Median :13.727   Median :53.69   Median :92.48   Median :49.02  
##  Mean   :13.104   Mean   :52.06   Mean   :88.15   Mean   :49.07  
##  3rd Qu.:17.106   3rd Qu.:60.15   3rd Qu.:95.42   3rd Qu.:49.43  
##  Max.   :44.432   Max.   :75.65   Max.   :98.02   Max.   :53.05  
##                                                                  
##    unemployed        degree         age_18to24    
##  Min.   :1.837   Min.   : 5.099   Min.   : 5.735  
##  1st Qu.:3.229   1st Qu.:10.786   1st Qu.: 7.301  
##  Median :4.195   Median :14.694   Median : 8.283  
##  Mean   :4.373   Mean   :16.711   Mean   : 9.294  
##  3rd Qu.:5.206   3rd Qu.:19.594   3rd Qu.: 9.601  
##  Max.   :9.527   Max.   :51.098   Max.   :32.684  
##                  NA's   :59
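
The summary reports 59 missing values, which by their position in the printout appear to belong to the degree column. One quick way to confirm which column holds them is to count NAs per column. The sketch below runs on a small made-up data frame (toy and its values are hypothetical, not the Brexit data); on the real data the equivalent call would be colSums(is.na(brexit_data)).

```r
# Toy data frame standing in for brexit_data (values are made up)
toy <- data.frame(
  Seat       = c("A", "B", "C", "D"),
  degree     = c(13.9, NA, 28.6, NA),
  unemployed = c(3.6, 4.5, 3.0, 4.3)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
```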

2 - Histograms of party vote shares

library(dplyr)
library(ggplot2)
cons15 <- ggplot(data=brexit_data, aes(x=con_2015)) + geom_histogram(bins = 9, color = "skyblue", fill = "blue")
cons15

labs15 <- ggplot(data=brexit_data, aes(x=lab_2015)) + geom_histogram(bins = 9, color = "pink", fill = "tomato")
labs15

ukip15 <- ggplot(data=brexit_data, aes(x=ukip_2015)) + geom_histogram(bins = 9, color = "royalblue", fill = "lightblue")
ukip15

3 - Seats with the largest support for each party

I want to identify the 10 seats with the largest vote share for each party and for Leave.

top10 <- brexit_data %>% arrange(desc(con_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(lab_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(ukip_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(leave_share))
head(top10, n = 10)
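
The arrange/head pairs above can also be written with dplyr's slice_max(), which sorts and truncates in one step. This is a minimal sketch on made-up data (the toy data frame and its values are hypothetical). Note that slice_max() keeps ties by default, so it can return more than n rows unless with_ties = FALSE is set.

```r
library(dplyr)

# Toy data frame standing in for brexit_data (values are made up)
toy <- data.frame(
  Seat        = c("A", "B", "C", "D", "E"),
  con_2015    = c(50.6, 52.0, 53.0, 44.0, 60.8),
  leave_share = c(57.9, 67.8, 38.6, 65.3, 49.7)
)

# Top 2 seats by Conservative share; with_ties = FALSE returns exactly n rows
top_con <- toy %>% slice_max(con_2015, n = 2, with_ties = FALSE)

# Same idea for the Leave share
top_leave <- toy %>% slice_max(leave_share, n = 2, with_ties = FALSE)

print(top_con)
print(top_leave)
```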

4 - Analyse Covariation

I look for relationships between the Leave vote share and the other variables.

gg <- ggplot(data = brexit_data, aes(x = ukip_2015, y = leave_share, colour = born_in_uk > 85)) + geom_point()
gg + geom_smooth(method = "lm", se = FALSE) + theme_bw()

gg <- ggplot(data = brexit_data, aes(x = born_in_uk, y = leave_share, colour = unemployed > 5.0)) + geom_point()
gg + geom_smooth(method = "lm", se = FALSE) + theme_bw()
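
The visual impression from these scatterplots can be backed by a number. The following is a hedged sketch on a made-up data frame (toy and its values are hypothetical): cor() with use = "complete.obs" returns the Pearson correlation of the Leave share with each numeric column, after dropping every row that has a missing value.

```r
# Toy data frame standing in for the numeric columns of brexit_data
toy <- data.frame(
  leave_share = c(57.9, 67.8, 38.6, 65.3, 49.7),
  ukip_2015   = c(17.9, 19.6, 8.0, 15.9, 14.4),
  degree      = c(13.9, 10.0, 28.6, NA, 18.8)
)

# Correlation of every column with leave_share, using complete rows only
cors <- cor(toy, use = "complete.obs")[, "leave_share"]
print(round(cors, 2))
```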

5 - Linear Model

The first scatterplot in the previous section shows a clear positive relationship between support for UKIP in the 2015 election and the Leave vote share. I want to build a linear model to better understand this relationship. But first, I need to identify and remove the outliers: seats where the UKIP vote share is zero (I guess they are located outside England).

brexit_data[brexit_data$ukip_2015 == 0, ]
model <- lm(formula = leave_share ~ ukip_2015, data = brexit_data, subset = ukip_2015 > 0)

summary(model)
## 
## Call:
## lm(formula = leave_share ~ ukip_2015, data = brexit_data, subset = ukip_2015 > 
##     0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.9652  -3.3253   0.4821   4.2692  15.5815 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 31.25210    0.57448   54.40   <2e-16 ***
## ukip_2015    1.56997    0.03875   40.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.91 on 612 degrees of freedom
## Multiple R-squared:  0.7285, Adjusted R-squared:  0.728 
## F-statistic:  1642 on 1 and 612 DF,  p-value: < 2.2e-16
plot(model$fitted.values, model$residuals, cex = 0.7)
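
To read the fit concretely, the coefficients printed above can be plugged into the fitted line. The sketch below only redoes that arithmetic by hand; the 10% UKIP share is an arbitrary illustrative input, not a value taken from the data.

```r
# Coefficients copied from the summary(model) output above
intercept <- 31.25210
slope     <- 1.56997

# Predicted Leave share for a hypothetical seat where UKIP took 10% in 2015
ukip_share <- 10
predicted_leave <- intercept + slope * ukip_share
print(predicted_leave)  # 46.9518

# In a single-predictor model, the absolute correlation equals sqrt(R-squared)
r <- sqrt(0.7285)
print(round(r, 2))  # 0.85
```

With the model object available, predict(model, newdata = data.frame(ukip_2015 = 10)) should give the same value, up to the rounding of the printed coefficients. The slope means that each additional percentage point of UKIP support goes with roughly 1.57 more points of Leave share.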

6 - Mosaic plot

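The following mosaic plot shows how seats split between the lower and upper half of the Leave vote share range, for each of the 6 combinations of unemployment rate (three groups of roughly equal size) and share of residents born in the UK (above or below 85%).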

new_data <- brexit_data %>% 
  mutate(Leave_pct = cut_interval(x = leave_share, n = 2), 
         Unemp = cut_number(x = unemployed, n = 3), 
         UK_born_over_85 = born_in_uk > 85.0)

mosaicplot(~ UK_born_over_85 + Unemp + Leave_pct, data = new_data, color = 2:3, las = 1, main = "Brexit 2016 - Leave vote share")
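
The two ggplot2 binning helpers used above behave differently: cut_interval() splits the range into bins of equal width, while cut_number() builds bins that hold roughly equal numbers of observations. A small sketch on a made-up, skewed vector (x is hypothetical) illustrates the contrast.

```r
library(ggplot2)

# Made-up vector with one large value, to make the difference visible
x <- c(1, 2, 3, 4, 100)

# Equal-width bins: the large value sits alone in the upper bin
by_width <- cut_interval(x, n = 2)

# Equal-count bins: the break falls at the median instead
by_count <- cut_number(x, n = 2)

print(table(by_width))
print(table(by_count))
```

This is why Leave_pct, built with cut_interval(), need not contain the same number of seats in each half, whereas the three Unemp groups are close to equal in size.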

7 - Further Analysis

I want to find out whether the Leave vote share is related to support for the Conservative Party. I define a new variable, ConAdv, which bins the difference between the Conservative and Labour vote shares in 2015 into 4 equal-width intervals.

brexit_data <- brexit_data %>% mutate(ConAdv = cut_interval(x = con_2015 - lab_2015, n = 4))

gg <- ggplot(data = brexit_data, aes(x = ConAdv, y = leave_share)) + geom_boxplot()
gg + theme_bw()
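
The boxplot can be complemented by a numeric summary per bin. This is a minimal sketch on made-up data (toy and its values are hypothetical), computing the number of seats and the median Leave share within each ConAdv interval.

```r
library(dplyr)
library(ggplot2)

# Toy data frame standing in for brexit_data (values are made up)
toy <- data.frame(
  con_2015    = c(50.6, 52.0, 20.0, 44.0, 10.0, 35.0),
  lab_2015    = c(18.3, 22.4, 60.0, 34.8, 70.0, 40.0),
  leave_share = c(57.9, 67.8, 38.6, 65.3, 30.0, 55.0)
)

# Bin the Conservative lead over Labour, then summarise Leave share per bin
by_adv <- toy %>%
  mutate(ConAdv = cut_interval(x = con_2015 - lab_2015, n = 4)) %>%
  group_by(ConAdv) %>%
  summarise(seats = n(), median_leave = median(leave_share))

print(by_adv)
```

Intervals that contain no seats are dropped from the summary, so the table can have fewer than 4 rows.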