In this analysis I explore the relationships between the Leave vote share and other political, social and demographic variables.
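library(dplyr)    # %>%, arrange(), mutate()
library(ggplot2)  # ggplot(), cut_interval(), cut_number()
myfile <- "./MyDatasets/brexit_results.csv"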
brexit_data <- read.csv(file = myfile, header = TRUE, stringsAsFactors = FALSE)
str(brexit_data)
## 'data.frame': 632 obs. of 11 variables:
## $ Seat : chr "Aldershot" "Aldridge-Brownhills" "Altrincham and Sale West" "Amber Valley" ...
## $ con_2015 : num 50.6 52 53 44 60.8 ...
## $ lab_2015 : num 18.3 22.4 26.7 34.8 11.2 ...
## $ ld_2015 : num 8.82 3.37 8.38 2.98 7.19 ...
## $ ukip_2015 : num 17.87 19.62 8.01 15.89 14.44 ...
## $ leave_share: num 57.9 67.8 38.6 65.3 49.7 ...
## $ born_in_uk : num 83.1 96.1 90.5 97.3 93.3 ...
## $ male : num 49.9 48.9 48.9 49.2 48 ...
## $ unemployed : num 3.64 4.55 3.04 4.26 2.47 ...
## $ degree : num 13.87 9.97 28.6 9.34 18.78 ...
## $ age_18to24 : num 9.41 7.33 6.44 7.75 5.73 ...
head(brexit_data)
summary(brexit_data)
## Seat con_2015 lab_2015 ld_2015
## Length:632 Min. : 0.00 Min. : 0.00 Min. : 0.000
## Class :character 1st Qu.:22.09 1st Qu.:17.67 1st Qu.: 2.975
## Mode :character Median :40.85 Median :31.20 Median : 4.581
## Mean :36.60 Mean :32.30 Mean : 7.809
## 3rd Qu.:50.84 3rd Qu.:44.37 3rd Qu.: 8.571
## Max. :65.88 Max. :81.30 Max. :51.491
##
## ukip_2015 leave_share born_in_uk male
## Min. : 0.000 Min. :20.48 Min. :40.73 Min. :46.86
## 1st Qu.: 9.193 1st Qu.:45.33 1st Qu.:86.42 1st Qu.:48.61
## Median :13.727 Median :53.69 Median :92.48 Median :49.02
## Mean :13.104 Mean :52.06 Mean :88.15 Mean :49.07
## 3rd Qu.:17.106 3rd Qu.:60.15 3rd Qu.:95.42 3rd Qu.:49.43
## Max. :44.432 Max. :75.65 Max. :98.02 Max. :53.05
##
## unemployed degree age_18to24
## Min. :1.837 Min. : 5.099 Min. : 5.735
## 1st Qu.:3.229 1st Qu.:10.786 1st Qu.: 7.301
## Median :4.195 Median :14.694 Median : 8.283
## Mean :4.373 Mean :16.711 Mean : 9.294
## 3rd Qu.:5.206 3rd Qu.:19.594 3rd Qu.: 9.601
## Max. :9.527 Max. :51.098 Max. :32.684
## NA's :59
I want to identify the 10 seats with the largest vote share for the Conservatives, Labour and UKIP, and the 10 seats with the largest Leave vote share.
top10 <- brexit_data %>% arrange(desc(con_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(lab_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(ukip_2015))
head(top10, n = 10)
top10 <- brexit_data %>% arrange(desc(leave_share))
head(top10, n = 10)
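The four blocks above repeat the same sort-and-take-the-head pattern. As a minimal sketch, the pattern could be wrapped in a small helper. The function name `top_n_seats` and the toy data frame are hypothetical and only stand in for `brexit_data`; the sketch uses base R so that it runs without the CSV file.

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
top_n_seats <- function(data, column, n = 10) {
  ordered <- data[order(data[[column]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

# Toy data standing in for brexit_data (values are made up for illustration).
toy <- data.frame(
  Seat = c("A", "B", "C", "D"),
  con_2015 = c(50, 20, 35, 60),
  lab_2015 = c(30, 55, 40, 10),
  stringsAsFactors = FALSE
)

top_con <- top_n_seats(toy, "con_2015", n = 2)
top_lab <- top_n_seats(toy, "lab_2015", n = 2)
print(top_con)
print(top_lab)
```

With the real data, a call such as `top_n_seats(brexit_data, "leave_share")` should give the same rows as the corresponding `arrange(desc(...))` plus `head()` pair, assuming the column has no missing values.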
Next, I look for relationships between the Leave vote share and the other variables.
gg <- ggplot(data = brexit_data, aes(x = ukip_2015, y = leave_share, colour = born_in_uk > 85)) + geom_point()
gg + geom_smooth(method = "lm", se = FALSE) + theme_bw()
gg <- ggplot(data = brexit_data, aes(x = born_in_uk, y = leave_share, colour = unemployed > 5.0)) + geom_point()
gg + geom_smooth(method = "lm", se = FALSE) + theme_bw()
The first scatterplot above shows a clear positive relationship between support for UKIP in the 2015 general election and the Leave vote share. I want to fit a linear model to quantify this relationship. First, though, I need to identify and exclude the outlying seats where the UKIP vote share is zero (I suspect they are located outside England).
brexit_data[brexit_data$ukip_2015 == 0, ]
model <- lm(formula = leave_share ~ ukip_2015, data = brexit_data, subset = ukip_2015 > 0)
summary(model)
##
## Call:
## lm(formula = leave_share ~ ukip_2015, data = brexit_data, subset = ukip_2015 >
## 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.9652 -3.3253 0.4821 4.2692 15.5815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.25210 0.57448 54.40 <2e-16 ***
## ukip_2015 1.56997 0.03875 40.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.91 on 612 degrees of freedom
## Multiple R-squared: 0.7285, Adjusted R-squared: 0.728
## F-statistic: 1642 on 1 and 612 DF, p-value: < 2.2e-16
plot(model$fitted.values, model$residuals, cex = 0.7)
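To make the fitted coefficients concrete, the sketch below plugs the estimates printed by `summary(model)` into the regression line. The helper `predict_leave` and the example UKIP shares are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(model) output above.
intercept <- 31.25210
slope <- 1.56997

# Hypothetical helper: predicted Leave share (%) for a given UKIP 2015 share (%).
predict_leave <- function(ukip_share) {
  intercept + slope * ukip_share
}

# Predictions for illustrative UKIP shares of 5%, 10% and 20%.
ukip_shares <- c(5, 10, 20)
predicted <- predict_leave(ukip_shares)
print(data.frame(ukip_2015 = ukip_shares, predicted_leave = predicted))

# UKIP share at which the fitted line crosses 50% Leave.
break_even <- (50 - intercept) / slope
print(break_even)
```

Each additional percentage point of UKIP support is associated with about 1.57 extra points of Leave share, and the fitted line crosses 50% Leave at a UKIP share of just under 12%. With a residual standard error of 5.91, individual seats can sit several points away from the line.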
The following mosaic plot shows how the Leave vote share (split into two equal-width bands) varies across 6 combinations of unemployment rate (three equal-sized groups) and the share of residents born in the UK (above or below 85%).
new_data <- brexit_data %>%
  mutate(Leave_pct = cut_interval(x = leave_share, n = 2),
         Unemp = cut_number(x = unemployed, n = 3),
         UK_born_over_85 = born_in_uk > 85.0)
mosaicplot(~ UK_born_over_85 + Unemp + Leave_pct, data = new_data, color = 2:3, las = 1, main = "Brexit 2016 - Leave vote share")
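The code above bins `leave_share` with `cut_interval()` and `unemployed` with `cut_number()`, which behave differently. As a small illustration on made-up, skewed values (not taken from the dataset), the sketch below compares the two.

```r
library(ggplot2)

# Made-up, right-skewed values for illustration only.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 50)

# cut_interval(): n bins of equal width across the range of x.
by_width <- cut_interval(x, n = 3)

# cut_number(): n bins holding (approximately) equal numbers of observations.
by_count <- cut_number(x, n = 3)

print(table(by_width))
print(table(by_count))
```

Equal-width bins can end up nearly empty when the data are skewed, whereas equal-count bins always spread the observations evenly. That is presumably why the unemployment rate is split with `cut_number()`, although the original analysis does not state the reason.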
I want to see whether there is any relationship between the Leave vote share and support for the Conservative Party. I define a new variable, ConAdv, which bins the Conservative lead over Labour (con_2015 - lab_2015, in percentage points) into four equal-width intervals.
brexit_data <- brexit_data %>% mutate(ConAdv = cut_interval(x = con_2015 - lab_2015, n = 4))
gg <- ggplot(data = brexit_data, aes(x = ConAdv, y = leave_share)) + geom_boxplot()
gg + theme_bw()