Victorian Agricultural Rents and Railway Development

I am trying to calculate the effect that proximity to railway track had on agricultural rents during the period 1832-1862. These three decades saw tremendous growth in railway track. There are numerous anecdotal reports of rents going up as the railways approached (as one would expect) but no one has quantified the effect or examined whether all landowners responded equally.

I have data on 24 large estates scattered around England, with consistent annual records of the rent per acre actually received. I also have county-level data on the amounts of wheat, cattle and sheep. I have measured the distance from the centre of each estate to the nearest railway station over time, and also the total amount of track within 40 km of the centre of the estate. The usual regression techniques don't help much, so I have been working on alternative methods.

Plan of attack: 1. Test the correlation over time between rent received and each of the two measures of railway access. 2. Try to isolate the factors that cause differences in the statistical significance of the correlation and in the lag between track and rent (if any).

The correlation over time can be tested using the cross-correlation function (CCF). The plots show results for the Petworth Estate:

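estates <- read.csv("C:/Users/Stephen/Desktop/rrtrimmed1.csv")
# Petworth is fid 24; keep the study period and sort by year so lags run in time order
# (named petworth rather than max, to avoid shadowing base::max)
petworth <- subset(estates, fid == 24 & year < 1863)
petworth <- petworth[order(petworth$year), ]
# Cross-correlation of rent received with the amount of track
ccf(petworth$rent_recd, petworth$track, main = "Petworth Track")

plot of chunk unnamed-chunk-1

# Cross-correlation of rent received with distance to the nearest station
ccf(petworth$rent_recd, petworth$nearstat, main = "Petworth Nearest Station")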

plot of chunk unnamed-chunk-1
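To record which lags are significant, rather than reading them off the plot, the bounds that ccf draws can be recomputed from the returned object. The following is a minimal sketch on simulated series, not the estate data; the names rent, track, thr and sig are made up for illustration.

```r
# Simulated stand-in data (NOT the estate records): 31 annual observations
set.seed(1)
n <- 31
track <- cumsum(runif(n, 0, 10))                 # hypothetical km of track, rising over time
rent  <- 1 + 0.01 * track + rnorm(n, sd = 0.1)   # hypothetical normalised rent

cc  <- ccf(rent, track, plot = FALSE)            # same calculation, without drawing
thr <- qnorm(0.975) / sqrt(cc$n.used)            # the default 95% bounds drawn on the plot

# For ccf(x, y), lag k is the correlation between x[t + k] and y[t],
# so a positive lag here means rent follows track by k years
sig <- data.frame(lag = drop(cc$lag), r = drop(cc$acf))
sig <- sig[abs(sig$r) > thr, ]                   # keep only the significant lags
sig[which.max(abs(sig$r)), ]                     # the strongest significant lag
```

Running the same steps once per estate would give one row per estate to record alongside the county variables.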

Bars extending beyond the dashed blue lines (approximate 95% bounds) are statistically significant. The nearest station plot has the expected negative sign: as the track approached the estate (the distance to the nearest station decreased), the rent went up. I have recorded the measurements for the 24 estates and combined them with the corn, cattle and sheep variables. What we are looking for is a way to identify patterns, or similarities in changes. A plot of changes in rent over time (using splines) doesn't help much. Here is a trellis plot of the rent received (the R package 'sme' is loaded for the spline fits further down), with rents normalised so that rent (1832) = 1:

library(sme)
library(lattice)  # xyplot() comes from lattice
rrtrimmed1 <- read.csv("C:/Users/Stephen/Desktop/rrtrimmed1.csv")
# Keep the study period only
rrsub <- subset(rrtrimmed1, year < 1863)
# One panel per estate, using the rows flagged IN == 1
print(xyplot(rentnorm ~ year | estate, data = rrsub[rrsub$IN == 1, ], xlab = "Year", 
    ylab = "Rent Received", main = "Estate Rents"))

plot of chunk unnamed-chunk-2
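The rentnorm column is read straight from the CSV, so how it was built is not shown here. One way such a column could be derived, assuming one row per estate and year, is to divide each estate's rent by its own first-year value. The data frame toy and its values are hypothetical.

```r
# Toy data (hypothetical values): two estates, three years each
toy <- data.frame(
  estate    = rep(c("A", "B"), each = 3),
  year      = rep(1832:1834, times = 2),
  rent_recd = c(20, 22, 25, 10, 10, 12)
)

# Sort so that the first row within each estate is 1832
toy <- toy[order(toy$estate, toy$year), ]

# Divide each estate's rent by its own 1832 value, so every series starts at 1
toy$rentnorm <- ave(toy$rent_recd, toy$estate, FUN = function(x) x / x[1])
toy
```

Normalising this way puts estates with very different rent levels on a common scale, which is what makes the panels comparable.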

The map shows where the estates lie, together with the extent of railway coverage in 1865. Since railway construction began in earnest only in 1832, the amount of track built is remarkable.

map of estate locations and railway coverage in 1865

We can fit the 24 estates split by some factor. Here is a plot of rents over time for the estates whose correlation between rent and track was statistically significant:

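# Keep the study period and drop rows with missing values
r2sub <- subset(rrtrimmed1, tme < 1863)
r2subomit <- na.exclude(r2sub)
# Fit to the estates flagged km40corr == 1, passing the response (y),
# time (tme) and estate identifier (ind) columns
fitcorrelated <- sme(r2subomit[r2subomit$km40corr == 1, c("y", "tme", "ind")], 
    criteria = "AIC")
plot(fitcorrelated, type = "model", ylab = "Normalised Rent", xlab = "Year", 
    main = "Correlated Estate Rents")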

plot of chunk unnamed-chunk-3

Now the estates with no significant correlation:

fituncorrelated <- sme(r2subomit[(r2subomit$km40corr == 0), c("y", "tme", "ind")], 
    criteria = "AIC")
plot(fituncorrelated, type = "model", ylab = "Normalised Rent", xlab = "Year", 
    main = "Uncorrelated Estate Rents")

plot of chunk unnamed-chunk-4

There is quite a difference: the rents for estates where the changes in rent were not correlated with the amount of track are, in relative terms, all over the place. We can get a sense of why this might be so using Principal Component Analysis on the 'static' variables, such as the amounts of wheat, cattle and sheep in the county in which the estate lies. The FactoMineR package is good for this; its FAMD function, used here, extends PCA to a mix of quantitative and categorical variables. First, a plot of the correlations between the variables. Reassuringly, this shows that the lags were closely and positively correlated with each other, and that the amount of track available within 40 km and the distance to the nearest station were negatively correlated.

rrpca <- read.csv("C:/Users/Stephen/Desktop/rrpca.csv")
library(FactoMineR)
bb <- FAMD(rrpca[, -c(2, 3, 4)], graph = FALSE)
plot(bb, choix = "quanti")

plot of chunk unnamed-chunk-5

But the amount of variance captured by the first two components is small. Now the distances between the estates:

plot(bb, choix = "ind", cex = 0.4)

plot of chunk unnamed-chunk-6
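To put a number on how much variance the first two components capture, the shares can be computed from the component variances. This sketch swaps in base R's prcomp on simulated numeric data, rather than FAMD on the estate data, so that it runs on its own; for the FAMD object above, the corresponding table should be available as bb$eig.

```r
# Simulated stand-in for the county variables (hypothetical values)
set.seed(2)
toy <- data.frame(Cattle = rnorm(24), Corn = rnorm(24), Sheep = rnorm(24))

# Ordinary PCA on standardised variables
pca <- prcomp(toy, scale. = TRUE)

# Share of total variance carried by each component, and the running total
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_share), 2)
```

If the running total after two components is well short of 1, the two-dimensional plots leave out much of the variation between estates and should be read with that in mind.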

I need to tidy up the labels a bit, but the results are informative. Hmm, a bit of work to do here, but perhaps on the right track? Another, probably better, approach might be to look at the six estates whose rents weren't correlated, here:

plot(fituncorrelated, type = "raw", showModelFits = TRUE, main = "Rents for the six uncorrelated estates")

plot of chunk unnamed-chunk-7

The six estates have this in common: they are all in areas (especially Yorkshire) which are quite remote.

Now looking for some other factors, perhaps to be revealed by a tree model (strictly a regression tree, since the response is a numeric correlation). This one uses the correlation between rent and the amount of track within 40 km of the estate as the dependent variable, and the amounts of sheep, corn and cattle as the explanatory variables.

library(tree)
xtree <- tree(X40kmCor ~ Cattle + Corn + Sheep, data = rrpca)
plot(xtree)
text(xtree)

plot of chunk unnamed-chunk-8

The tree has picked the amount of sheep as the most important variable. The greatest correlation (0.7) was for estates where the number of sheep was more than 50.3, cattle more than 12.45 and, finally, sheep more than 64.7. An initial conclusion is that the rents of livestock farmers were more sensitive to the amount of track.

Next I need to 1. Run some robustness tests 2. Work out which estates this refers to, and link this result to the other tests.
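For the second item, a fitted tree object records which terminal node each observation falls into, so the estates behind each leaf can be listed directly. This is a sketch on simulated data using the same tree package; the data frame toy, its estate labels and all its values are hypothetical, and the real rrpca may name its estate column differently.

```r
library(tree)

# Simulated stand-in for rrpca (hypothetical values and estate labels)
set.seed(3)
toy <- data.frame(
  estate = paste0("E", 1:24),
  Cattle = runif(24, 5, 20),
  Corn   = runif(24, 10, 60),
  Sheep  = runif(24, 20, 90)
)
toy$X40kmCor <- 0.005 * toy$Sheep + rnorm(24, sd = 0.1)

fit <- tree(X40kmCor ~ Cattle + Corn + Sheep, data = toy)

leaf <- fit$where                        # row of fit$frame holding each estate's leaf
toy$leaf_mean <- fit$frame$yval[leaf]    # mean correlation within that leaf

# Estates grouped by terminal node number
split(toy$estate, rownames(fit$frame)[leaf])
```

The leaf with the highest leaf_mean corresponds to the 0.7 group in the plot above, and its members are the estates to carry into the other tests.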