This document provides the dataset and source code for Stephen Peplow's article on Corn Laws voting. Each step is described, so what follows is rather laborious.
The base dataset is that produced by Professor W.O. Aydelotte, with additions by Professor Iain McLean of Nuffield College, Oxford. I have added the following information: the ratio of arable land to pastoral land in parishes within each political constituency (taken from the 1836 Tithe Commutation Files and calculated using GIS); the price of wheat in June 1845 at the nearest market to the constituency; the ratio of farm labourers to farmers in each political constituency; the net flow of wheat either into (negative) or out of (positive) a political constituency; and the numbers of cattle per square mile, the acreage of corn and the numbers of sheep. Full details of sources and calculations are provided in the article.
In the article I examine voting records for Members of Parliament in two divisions: the Third Reading of the bill which repealed the Corn Laws, on 15 May 1846; and Villiers' unsuccessful motion to repeal the Corn Laws one year earlier. The MPs and the issues were almost exactly the same, but the outcomes were very different. The differences can be captured using classification trees.
To get an idea of the characteristics of the political constituencies, I use principal component analysis (PCA) on the agricultural and religious variables. This is a subset of the main dataset. I have data only on England and Wales. The code below gets the data from my public Dropbox folder, removes some unwanted variables and restricts the constituencies to England and Wales. The variables used in the analysis then appear as a list:
constituency <- read.csv("http://dl.dropboxusercontent.com/u/23281950/constituency.csv",
row.names = 1)
const <- subset(constituency, select = c(-sofwash, -ten_total, -constloc), subset = (constloc <
3))
names(const)
## [1] "COFE" "PROTDISS" "RC" "ARGRASS" "WHT1845" "TENANCY"
## [7] "ACREAGE" "FARMLAB" "WHTBAL" "CORN" "CATTLE" "SHEEP"
The R package 'FactoMineR' is used for the analysis:
library(FactoMineR)
Now build the PCA object and draw a plot of the variables:
const.pca <- PCA(const, graph = FALSE)
plot(const.pca, choix = "var")
The plot shows how the standardised variables correlate with the first two components, and so with each other: arrows pointing in the same direction indicate positively correlated variables. For example, acreage in corn (CORN) and the ratio of farm labourers to farmers (FARMLAB) are highly positively correlated, because arable farming is more labour-intensive than cattle farming. With this in hand, we examine the voting records.
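The reading of the variable plot can be checked by hand. The following is a minimal sketch in base R, using `prcomp` rather than FactoMineR and a made-up toy data frame (the column names and values are assumptions, not the constituency data). It shows that, once the variables are standardised, the coordinates of a variable on the components are simply its correlations with those components.

```r
# Toy data: hypothetical variables, NOT the constituency dataset
set.seed(1)
n <- 200
arable <- rnorm(n)
toy <- data.frame(
  corn    = arable + rnorm(n, sd = 0.3),   # built to move with 'arable'
  farmlab = arable + rnorm(n, sd = 0.3),   # built to move with 'arable'
  cattle  = -arable + rnorm(n, sd = 0.3),  # built to move against 'arable'
  noise   = rnorm(n)                       # unrelated to the others
)

# scale. = TRUE standardises each column, so the PCA works on correlations
toy.pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Correlation of each variable with each component:
# the loading multiplied by that component's standard deviation
var.coord <- sweep(toy.pca$rotation, 2, toy.pca$sdev, `*`)
round(var.coord[, 1:2], 2)

# Pairwise correlations that the arrows in a variable plot approximate
round(cor(toy), 2)
```

In this toy example `corn` and `farmlab` end up with similar coordinates on the first component and `cattle` with the opposite sign, which is the pattern to look for in the plot of the real data.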
Read in the first dataset 'pol' (which is contained in a public folder in my Dropbox account) and inspect its dimensions and the names of the variables:
pol <- read.csv("http://dl.dropboxusercontent.com/u/23281950/pol.csv")
dim(pol)
## [1] 655 22
names(pol)
## [1] "name" "CONTESTED" "constloc" "GOVTOFF" "wealth2"
## [6] "gentry" "CONSTYPE" "COFE" "ARGRASS" "MAYNOOTH"
## [11] "PARTY" "WHT1845" "tenancymean" "FARMSIZE" "LANDED"
## [16] "FARMLABOUR" "WHTBALANCE" "CORN" "CATTLE" "sheep"
## [21] "ML018" "MLR016"
Aydelotte numbers the Third Reading division as DIV 018; I have recoded it as ML018. Votes against Repeal (protectionist) are coded as 1, votes for Repeal as 2, and absentees as 3. There were 656 MPs at this time, and I have worked through all available records to correct the somewhat spotty record in Hansard. Unfortunately, data on one MP are missing, which leaves 655 records. The tabulation of voting on Repeal is:
table(pol$ML018)
##
## 1 2 3
## 251 349 55
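Numeric codes are easy to misread, so it can help to attach labels before tabulating. The sketch below rebuilds the vote vector from the three counts printed above (so it does not need the data file) and converts it to a labelled factor. The label text is my own wording of the coding described in the text, not a feature of the dataset.

```r
# Rebuild the vote vector from the counts printed in the tabulation above
votes <- rep(c(1, 2, 3), times = c(251, 349, 55))

# Attach readable labels to the numeric codes (label wording is an assumption)
votes.f <- factor(votes, levels = c(1, 2, 3),
                  labels = c("Against Repeal", "For Repeal", "Absent"))

table(votes.f)

# Shares of the 655 recorded MPs in each category
round(prop.table(table(votes.f)), 2)
```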
We don't want some of the variables, and we also want to restrict the constituencies to those in England and Wales, because data for Scotland and Ireland are somewhat patchy. We also omit rows with missing data. The trimmed dataset contains 17 variables on 264 MPs.
pol.full <- subset(pol, select = c(-name, -constloc, -GOVTOFF, -tenancymean,
-MLR016), subset = (constloc < 3))
pol.full <- na.omit(pol.full)
dim(pol.full)
## [1] 264 17
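To see what the two trimming steps do, here is a small sketch on a made-up data frame (the column names echo the text, but every value is hypothetical). It shows that `subset()` can filter on a column and drop that same column in one call, and that `na.omit()` removes any row with at least one missing value.

```r
# Toy data frame; values are made up for illustration
toy <- data.frame(
  name     = c("A", "B", "C", "D", "E"),
  constloc = c(1, 2, 3, 1, 4),
  CORN     = c(10.5, NA, 7.2, 3.3, 8.8),
  vote     = c(1, 2, 2, 1, 3)
)

# The row filter and the column selection are both evaluated against the
# original data frame, so constloc can be used for filtering and then dropped
kept <- subset(toy, subset = constloc < 3, select = c(-name, -constloc))
nrow(kept)      # rows A, B and D remain

# na.omit() drops every row with at least one missing value
complete <- na.omit(kept)
nrow(complete)  # row B is lost because its CORN value is NA
```

Comparing `nrow()` before and after `na.omit()` on the real data would show how many MPs are lost to missing values rather than to the geographical restriction.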
We use the package 'party' so load that
library(party)
and then recode some of the variables so that they are 'factors':
pol.full$PARTY <- as.factor(pol.full$PARTY)
pol.full$CONTESTED <- as.factor(pol.full$CONTESTED)
pol.full$CONSTYPE <- as.factor(pol.full$CONSTYPE)
pol.full$MAYNOOTH <- as.factor(pol.full$MAYNOOTH)
pol.full$ML018 <- as.factor(pol.full$ML018)
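The recoding matters because a tree treats a numeric code as an ordered quantity and a factor as a set of unordered categories. When several columns need converting, the same step can be written once. This is a minimal sketch on a made-up data frame, with `factor.cols` as a hypothetical helper name.

```r
# Toy data frame with integer-coded categories (values are made up)
toy <- data.frame(
  PARTY     = c(1, 2, 2, 1),
  CONTESTED = c(0, 1, 1, 0),
  CORN      = c(10.5, 3.2, 7.7, 4.1)
)

# Columns to treat as categorical (hypothetical helper vector)
factor.cols <- c("PARTY", "CONTESTED")
toy[factor.cols] <- lapply(toy[factor.cols], as.factor)

# Check the result: two factors and one numeric column
sapply(toy, class)
```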
Now construct the tree as an object called 'ctree.div018' and plot it:
ctree.div018 <- ctree(ML018 ~ ., data = pol.full)
plot(ctree.div018)
Now measure the proportion of correct predictions on the data used to fit the tree, rounded to two decimal places:
round(mean(predict(ctree.div018) == pol.full$ML018), 2)
## [1] 0.81
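A single proportion hides which classes are being confused, and it is easier to judge against a baseline. The sketch below uses two made-up vectors of observed and predicted classes (not output from the tree above) to show a confusion matrix alongside the accuracy obtained by always predicting the most common class.

```r
# Hypothetical observed and predicted classes (made up for illustration)
observed  <- factor(c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3), levels = 1:3)
predicted <- factor(c(1, 1, 2, 2, 2, 2, 2, 1, 2, 2), levels = 1:3)

# Proportion correct, computed the same way as in the text
accuracy <- mean(predicted == observed)

# Confusion matrix: rows are predictions, columns are observed classes
confusion <- table(predicted = predicted, observed = observed)

# Baseline: always predict the most common observed class
baseline <- max(prop.table(table(observed)))

confusion
c(accuracy = accuracy, baseline = baseline)
```

With the real objects, the same three lines could be applied to `predict(ctree.div018)` and `pol.full$ML018`.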
Load the package 'evtree'
library(evtree)
We use the same dataset as for the conditional tree above, so that we can compare the accuracy of predictions. Because evtree searches for a tree using a randomised (evolutionary) algorithm, set a seed first for reproducibility. Then plot the tree and get the proportion of correct predictions:
set.seed(125)
evtree.div018 <- evtree(ML018 ~ ., data = pol.full)
plot(evtree.div018)
round(mean(predict(evtree.div018) == pol.full$ML018), 2)
## [1] 0.82
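Both accuracy figures so far are measured on the rows used to fit the trees. One common way to check how well a tree generalises is to hold back part of the data. The sketch below is only an illustration of that idea: it uses the built-in `iris` data and `rpart` as a stand-in tree learner, neither of which appears in the analysis above, and the 70/30 split is an arbitrary choice.

```r
library(rpart)  # stand-in tree learner; the text uses ctree() and evtree()

set.seed(125)   # fix the random split so the result can be reproduced

# Randomly assign 70% of the rows to training and the rest to testing
n <- nrow(iris)
train.rows <- sample(n, size = round(0.7 * n))
train <- iris[train.rows, ]
test  <- iris[-train.rows, ]

# Fit on the training rows only
fit <- rpart(Species ~ ., data = train, method = "class")

# Proportion correct on the rows used for fitting and on the held-back rows
acc.train <- mean(predict(fit, newdata = train, type = "class") == train$Species)
acc.test  <- mean(predict(fit, newdata = test,  type = "class") == test$Species)
round(c(train = acc.train, test = acc.test), 2)
```

The same pattern should carry over to the voting data by replacing `iris` with `pol.full`, `Species` with `ML018`, and `rpart()` with `ctree()` or `evtree()`.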
The motion proposed by the Liberal MP Charles Villiers was that the Corn Laws be repealed. The motion was defeated by 256 votes to 124. The division is denoted as MLR016 and is coded as follows: 1 = against Repeal (protectionist), 2 = for Repeal, 3 = absent. The data first need some preparation. Then build the plot and find the proportion of correct predictions.
pol <- read.csv("http://dl.dropboxusercontent.com/u/23281950/pol.csv")
Villiers <- subset(pol, select = c(-name, -MAYNOOTH, -constloc, -GOVTOFF, -tenancymean,
-ML018), subset = (constloc < 3))
Villiers$PARTY <- as.factor(Villiers$PARTY)
Villiers$CONTESTED <- as.factor(Villiers$CONTESTED)
Villiers$CONSTYPE <- as.factor(Villiers$CONSTYPE)
Villiers$MLR016 <- as.factor(Villiers$MLR016)
Villiers <- na.omit(Villiers)
Get the dimensions of the subset
dim(Villiers)
## [1] 323 16
And the breakdown by constituency 'type' where 1 = rural, 2 = small borough, 3 = large borough, 4 = university
table(Villiers$CONSTYPE)
##
## 1 2 3 4
## 125 94 104 0
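The zero under type 4 is worth a note. A plausible explanation, given the order of the steps above, is that CONSTYPE was converted to a factor before `na.omit()` removed the university rows, so the level survives with no observations. The sketch below reproduces that behaviour on made-up data and shows `droplevels()` as one way to remove the empty level.

```r
# Toy data: the single type-4 row has a missing value (values are made up)
toy <- data.frame(
  CONSTYPE = c(1, 2, 3, 4, 1),
  CORN     = c(5.1, 6.3, 2.2, NA, 4.0)
)

toy$CONSTYPE <- as.factor(toy$CONSTYPE)  # levels are fixed here: 1, 2, 3, 4
toy <- na.omit(toy)                      # removes the only type-4 row

table(toy$CONSTYPE)              # level 4 is still listed, with a count of 0
table(droplevels(toy)$CONSTYPE)  # the empty level has been dropped
```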
Now run the conditional tree and get the proportion of correct predictions:
ctree.Villiers <- ctree(MLR016 ~ ., data = Villiers)
## Loading required package: Formula
plot(ctree.Villiers)
round(mean(predict(ctree.Villiers) == Villiers$MLR016), 2)
## [1] 0.64
Now fit the evolutionary tree using the same dataset as above, again setting a seed first:
set.seed(55)
evtree.Villiers <- evtree(MLR016 ~ ., data = Villiers)
plot(evtree.Villiers)
round(mean(predict(evtree.Villiers) == Villiers$MLR016), 2)
## [1] 0.71
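To compare the four results side by side, the printed figures can be collected into a small table. The sketch below simply types in the sample sizes and proportions reported in the output above; the object names `results` and `comparison` are hypothetical.

```r
# Figures copied from the output printed above
results <- data.frame(
  division = c("Third Reading (ML018)", "Third Reading (ML018)",
               "Villiers (MLR016)", "Villiers (MLR016)"),
  method   = c("ctree", "evtree", "ctree", "evtree"),
  n        = c(264, 264, 323, 323),
  accuracy = c(0.81, 0.82, 0.64, 0.71)
)

# One row per division, one column per method
comparison <- xtabs(accuracy ~ division + method, data = results)
comparison
```

Note that the two divisions are fitted on different numbers of MPs and the Villiers model omits MAYNOOTH, so the rows are not strictly like for like.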