Agricultural potential of the land and temporal variation

The goal of this query was to consider whether the sites are located on desirable agricultural land. If agricultural production was an important factor in medieval village formation, I would expect to see a shift towards more desirable land during either the Middle or Late Saxon period. This query mimics the process of model building in ArcGIS. It assigns each site a score out of 100 based on the spatial parameters set below.

The data loading process is masked here, but is identical to the chunk visible here.

Calculating the desirability score

For each of the five parameters, I created a column with a numeric value. The steps below assign a numeric value based on the value for the original column. I scaled the assigned numeric values according to the weights that I assigned while designing the model:

- 10%: Temperature, max value=10
- 20%: Slope, elevation, nucleation score, max value=20 (each)
- 30%: Terrain, max value=30
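The class-to-score mapping can also be written as named lookup vectors, which keeps each weight in one place. This is only a sketch on a made-up three-row data frame: `toy`, `temp_lookup` and `slope_lookup` are hypothetical names, not part of the project code, while the score values are the ones used in this model.

```r
# Hypothetical toy data; column names follow sites_sp, the rows are invented
toy <- data.frame(temppoly = c(4, 5, 4), slope = c(1, 3, 4))

# Named lookup vectors: names are the class codes, values are the scores
temp_lookup  <- c("4" = 5, "5" = 10)
slope_lookup <- c("1" = 20, "2" = 15, "3" = 10, "4" = 5)

# Index by the class code as a character string; unknown codes become NA
toy$temp_value  <- unname(temp_lookup[as.character(toy$temppoly)])
toy$slope_value <- unname(slope_lookup[as.character(toy$slope)])

print(toy)
```

A class code that is missing from a lookup vector yields `NA`, which matches the behaviour of initialising each score column with `NA` before the assignments.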

#remove sites with missing spatial attributes
sites_sp_new <- na.omit(sites_sp)

sites_sp_new$temp_value <- NA
sites_sp_new[sites_sp_new$temppoly == 4, "temp_value"] <- 5
sites_sp_new[sites_sp_new$temppoly==5, "temp_value"] <- 10

sites_sp_new$slope_value <- NA
sites_sp_new[sites_sp_new$slope == 1, "slope_value"] <- 20
sites_sp_new[sites_sp_new$slope==2, "slope_value"] <- 15
sites_sp_new[sites_sp_new$slope == 3, "slope_value"] <- 10
sites_sp_new[sites_sp_new$slope==4, "slope_value"] <- 5

sites_sp_new$dem_value <- NA
sites_sp_new[sites_sp_new$dem == 1, "dem_value"] <- 20
sites_sp_new[sites_sp_new$dem==2, "dem_value"] <- 15
sites_sp_new[sites_sp_new$dem == 3, "dem_value"] <- 10
sites_sp_new[sites_sp_new$dem==4, "dem_value"] <- 5

sites_sp_new$nucleation_value <- NA
sites_sp_new[sites_sp_new$nucleation == 1, "nucleation_value"] <- 5
sites_sp_new[sites_sp_new$nucleation==2, "nucleation_value"] <- 10
sites_sp_new[sites_sp_new$nucleation == 3, "nucleation_value"] <- 15
sites_sp_new[sites_sp_new$nucleation==4, "nucleation_value"] <- 20

#terrain value assignments: alluvial plains most desirable, clay and red drift are moderately desirable, and limestone and chalky are semi-desirable
sites_sp_new$terrain_value <- NA
sites_sp_new[grep("Clay", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value"] <- 24
sites_sp_new[grep("alluvial", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value"] <- 30
sites_sp_new[grep("limestone", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value"] <- 18
sites_sp_new[grep("chalky", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value"] <- 18
sites_sp_new[grep("red drift", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value"] <- 24

head(sites_sp_new)
##     her_no her_cit dem slope temppoly
## 1 MLI60564       1   1     1        4
## 2 MLI61729       1   1     1        4
## 3 MLI36511       1   3     3        4
## 4 MLI37096       1   4     2        4
## 5 MLI65747       1   1     2        4
## 6 MLI43229       2   2     2        4
##                                       terrain nucleation temp_value
## 1          Alluvial plains and river terraces          3          5
## 2          Alluvial plains and river terraces          3          5
## 3               Jurassic limestone landscapes          2          5
## 4               Jurassic limestone landscapes          2          5
## 5                       Clay or marl lowlands          2          5
## 6 Landscapes smothered with deep Chalky drift          1          5
##   slope_value dem_value nucleation_value terrain_value
## 1          20        20               15            30
## 2          20        20               15            30
## 3          10        10               10            18
## 4          15         5               10            18
## 5          15        20               10            24
## 6          15        15                5            18
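Because the terrain scores are assigned by pattern matching, any terrain label that matches none of the patterns keeps `NA`, and `rowSums()` would then return `NA` for that site. The sketch below shows one way to check for such labels. The first four labels are taken from the output above; "Sandy heath" is an invented label, and `patterns` and `unmatched` are hypothetical names.

```r
# Terrain labels: the first four appear in the output above, the last is invented
terrain <- c("Alluvial plains and river terraces",
             "Jurassic limestone landscapes",
             "Clay or marl lowlands",
             "Landscapes smothered with deep Chalky drift",
             "Sandy heath")  # hypothetical label with no matching pattern

# Patterns in the same order as the assignments above;
# a later match overwrites an earlier one
patterns <- c("Clay" = 24, "alluvial" = 30, "limestone" = 18,
              "chalky" = 18, "red drift" = 24)

terrain_value <- rep(NA_real_, length(terrain))
for (p in names(patterns)) {
  terrain_value[grepl(p, terrain, ignore.case = TRUE)] <- patterns[[p]]
}

# Labels that matched nothing keep NA
unmatched <- terrain[is.na(terrain_value)]
print(terrain_value)
print(unmatched)
```

Note that `na.omit()` runs before the score columns are created, so it would not remove rows whose terrain label went unmatched.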

Preparing the data for export

I had previously prepared a CSV file using an R script to generate a map of sites by phase and decided to import the file rather than repeating the process here. The original script assigns a value such as “Early Saxon” or “Mid/Late Saxon” with conditional statements evaluating the periods during which a site was occupied. All of the steps involved are visible on GitHub.

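#sum the five component scores, selected by name rather than by column position
sites_sp_new$score <- rowSums(sites_sp_new[, c("temp_value", "slope_value", "dem_value", "nucleation_value", "terrain_value")])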
sites_phase <- read.csv("~/Grad Year 3/Advanced Data Structures/output/sites_phase.csv")
sites_phase_new <- subset(sites_phase, select=c("her_no", "her_cit", "lat", "lon", "new"))
sites_phase_score <- merge(sites_phase_new, sites_sp_new, by= c("her_no", "her_cit"))
sites_phase_score_red <- subset(sites_phase_score, select=c("her_no", "her_cit", "lat", "lon", "new", "score"))

#write.csv(sites_phase_score_red, "~/Grad Year 3/Advanced Data Structures/output/sites_phase_score.csv")
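By default `merge()` performs an inner join, so any site that appears in only one of the two tables silently disappears from the result. The following sketch shows one way to list the sites lost in the merge. Both data frames are invented stand-ins for `sites_phase_new` and `sites_sp_new`, and `dropped` is a hypothetical name.

```r
# Hypothetical stand-ins for sites_phase_new and sites_sp_new; all values are invented
phase  <- data.frame(her_no = c("A1", "A2", "A3"), her_cit = c(1, 1, 2),
                     new = c("mid", "late", "early"))
scores <- data.frame(her_no = c("A1", "A3", "A4"), her_cit = c(1, 2, 1),
                     score = c(74, 61, 80))

# merge() keeps only rows whose key columns match in both tables (inner join)
merged <- merge(phase, scores, by = c("her_no", "her_cit"))

# Keys present in the phase table but missing from the merged result
phase_keys  <- paste(phase$her_no, phase$her_cit)
merged_keys <- paste(merged$her_no, merged$her_cit)
dropped <- phase[!phase_keys %in% merged_keys, ]

print(nrow(merged))
print(dropped)
```

Comparing `nrow()` before and after the merge gives the same information as a quick count.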

Adjusting the desirability score of clay terrain

With the introduction of the heavier plow technology, medieval farmers would have been able to expand cultivation to heavy clay fields. These fields would have previously taken tremendous effort to plow with the preceding simple ard. As a result, the clay terrain should not be considered desirable for agriculture until after the introduction of the heavier plow. For the purposes of comparison, I redesigned the values of the clay terrain types in the model to lower the desirability rating of clay land. I expected many of the Mid Saxon and Late Saxon sites with high desirability ratings to drop as a result of my model adjustment. This would confirm the hypothesis that new plowing technology led to increased cultivation on clay terrain.

#assign new terrain values
sites_sp_new$terrain_value_new <- NA
sites_sp_new[grep("Clay", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value_new"] <- 18
sites_sp_new[grep("alluvial", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value_new"] <- 30
sites_sp_new[grep("limestone", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value_new"] <- 18
sites_sp_new[grep("chalky", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value_new"] <- 18
sites_sp_new[grep("red drift", sites_sp_new$terrain, ignore.case=TRUE), "terrain_value_new"] <- 24

#recalculate the desirability score
#sum the four unchanged component scores and the adjusted terrain score, selected by name
sites_sp_new$recalc <- rowSums(sites_sp_new[, c("temp_value", "slope_value", "dem_value", "nucleation_value", "terrain_value_new")])
#merge the table from before that contains the temporal data
sites_sp_recalc <- merge(sites_phase_new, sites_sp_new, by= c("her_no", "her_cit"))
#remove all of the fields used to derive the new desirability score
sites_sp_recalc <- subset(sites_sp_recalc, select=c("her_no", "her_cit", "lat", "lon", "new", "score", "recalc"))

#write.csv(sites_sp_recalc, file="~/Grad Year 3/Advanced Data Structures/output/recalc.csv")
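As a worked check of the adjustment, the sketch below recomputes both scores for two sites using the component values printed by `head()` earlier: row 1 (alluvial terrain) and row 5 (clay terrain). The data frame `site` and the column `change` are hypothetical names used only for this illustration.

```r
# Component scores for rows 1 and 5 of the head() output above
site <- data.frame(
  her_no = c("MLI60564", "MLI65747"),
  temp_value = c(5, 5), slope_value = c(20, 15), dem_value = c(20, 20),
  nucleation_value = c(15, 10),
  terrain_value = c(30, 24),      # alluvial, clay (original model)
  terrain_value_new = c(30, 18)   # alluvial, clay (adjusted model)
)

# The four components that are identical in both models
shared <- c("temp_value", "slope_value", "dem_value", "nucleation_value")
site$score  <- rowSums(site[, shared]) + site$terrain_value
site$recalc <- rowSums(site[, shared]) + site$terrain_value_new
site$change <- site$recalc - site$score

print(site[, c("her_no", "score", "recalc", "change")])
```

The alluvial site stays at 90 in both models, while the clay site falls from 74 to 68. Only the clay terrain value changes (24 to 18), so every clay site loses exactly 6 points and all other sites are unchanged.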

Comparison of results from the two models

I used the multiplot function (defined here and loaded in the hidden portion of the markdown to reduce clutter) to compare the agricultural scores between the two models on histograms with color set to phase of occupation. I excluded the Early Saxon and Early/Mid Saxon phases because these precede the introduction of the heavier plow. If the heavier plow allowed Saxon farmers to spread to clay terrain, the Early Saxon settlements should be located on non-clay terrain types and not be impacted by the adjustments to my model.

#keep only the phases from the Mid Saxon period onwards; %in% also avoids NA rows when the phase is missing
sites_sp_mid_late <- sites_sp_recalc[sites_sp_recalc$new %in% c('mid', 'mid/late', 'late', 'late/med', 'mid/late/med'),]
colnames(sites_sp_mid_late)[5] <- "Phase"
p1 <- ggplot(sites_sp_mid_late, aes(x=recalc, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth= 5) + labs(x="Recalculated Score")
p2 <- ggplot(sites_sp_mid_late, aes(x=score, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth= 5)+ labs(x="Original Score")
multiplot(p2, p1)

I was surprised to see that the sites that scored highly on the agricultural index changed very little as a result of the lowered value of clay soils. I then plotted the Early and Early/Mid Saxon sites for comparison. Unexpectedly, 4 of these 5 sites dropped in their agricultural model value as a result of the reduction in the score assigned to clay, which means that almost all of them are located on clay terrain. This is the exact opposite of what I expected given the traditional emphasis on the heavy plow as enabling agricultural expansion. With only 5 sites, I would need to expand my sample of Early Saxon sites in order to confirm this trend.

#keep only the Early and Early/Mid Saxon phases
sites_sp_early_mid <- sites_sp_recalc[sites_sp_recalc$new %in% c('early', 'early/mid'),]
colnames(sites_sp_early_mid)[5] <- "Phase"
p1 <- ggplot(sites_sp_early_mid, aes(x=recalc, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth= 5) + labs(x="Recalculated Score") + xlim(50,90)
p2 <- ggplot(sites_sp_early_mid, aes(x=score, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth= 5)+ labs(x="Original Score") + xlim(50, 90)
multiplot(p2, p1)
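The histograms show the shift visually, and a cross-tabulation can give the same comparison as counts per phase. The sketch below uses an invented stand-in for `sites_sp_recalc`; the phases and scores are made up and do not reproduce the real results, and `dropped` and `counts` are hypothetical names.

```r
# Hypothetical stand-in for sites_sp_recalc; phases and scores are invented
toy <- data.frame(
  new    = c("early", "early", "early/mid", "mid", "late", "late"),
  score  = c(74, 69, 79, 90, 74, 85),
  recalc = c(68, 63, 79, 90, 68, 85)
)

# TRUE where the adjusted model lowered the site's score
toy$dropped <- toy$recalc < toy$score

# Cross-tabulate phase against whether the score dropped
counts <- table(Phase = toy$new, Dropped = toy$dropped)
print(counts)
```

Since only the clay value differs between the two models, the `TRUE` column is effectively a count of clay sites per phase.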

More Plots

After the surprising results from the adjustment to the clay agricultural value, I decided to explore the other spatial data parameters to see whether any of them shifted strongly across phases. The cultural variable, “nucleation”, produced the most interesting histogram of temporal variation. The Mid Saxon sites tended to have the highest nucleation values, while Late Saxon settlements were located in a combination of nucleated and dispersed zones. This could suggest that the core of medieval settlement was established in the Mid Saxon period and expanded outwards to less populated or less desirable lands during the Late Saxon period.

colnames(sites_phase_score)[5] <- "Phase"
ggplot(sites_phase_score, aes(x=dem, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth=1)+xlim(0,5) + labs(x="Elevation")

ggplot(sites_phase_score, aes(x=slope, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth=1)+xlim(0,5) + labs(x="Slope")

ggplot(sites_phase_score, aes(x=nucleation, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth=1)+xlim(0,5) + labs(x="Nucleation")

ggplot(sites_phase_score, aes(x=temppoly, color=Phase, fill=Phase)) + geom_histogram(position="dodge", binwidth=1)+xlim(3, 6) + labs(x="Temperature")
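Because the phases contain different numbers of sites, raw histogram counts can be hard to compare. One possible complement, sketched here on invented data, is to convert the counts to within-phase proportions. `toy` is a hypothetical stand-in for `sites_phase_score`, and treating the ordinal nucleation class as a number for the mean is a rough simplification.

```r
# Hypothetical stand-in for sites_phase_score; all values are invented
toy <- data.frame(
  Phase      = c("mid", "mid", "mid", "late", "late", "late", "late"),
  nucleation = c(4, 4, 3, 1, 2, 4, 3)
)

# Count sites per phase and nucleation class, then convert to row proportions
counts <- table(toy$Phase, toy$nucleation)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))

# Mean nucleation class per phase as a single rough summary number
means <- tapply(toy$nucleation, toy$Phase, mean)
print(means)
```

Each row of `shares` sums to 1, so phases with few sites can be compared directly with phases that have many.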