Apr20Update

Developing the agricultural model with a DTM

Another important factor that could be influencing both settlement choice and agricultural strategies is topography. Other published studies that compare growing conditions to crop production consider the influence of elevation and slope. I downloaded the 50m digital terrain model (DTM) from the Ordnance Survey here. I received Ordnance survey data for all of England divided into folders based on the primary grid square (a diagram of the grid is shown here). Each grid square was divided into 100 smaller grid squares (ex:SK90) that needed to be extracted separately before they could be added to my map in QGIS. Once I added each individual grid square to the map, I merged them into larger rasters by the primary grid square (TA, TF, SF, SE). I merged the four large raster into a single raster and clipped it by the Lincolnshire county boundaries so that the DTM represented Lincolnshire.

The next step that I was interested in was to reclassify the DTM raster from raw elevation values to 5 intervals (each grid square within the raster will be assigned a number from 1 to 5 depending on which interval it falls into). This is easily done in ArcGIS, but my preliminary search on the help forums suggested that this would be much easier to do in R.

While I was still working in QGIS, I attempted to create a slope raster calculation derived from my DTM. The resulting raster didn’t look like the calculation had been performed correctly (despite my clicking the “slope” option, following the online QGIS tutorial instructions, and trying it again). I decided to redo this step in R.

#load packages
library(raster)
library(rgdal)
library(classInt)
library(sqldf)
library(ggplot2)

dem <- raster("~/Grad Year 3/Advanced Data Structures/spatial data/DEM_Lincolns_final.tif")

# view the raster in R
plot(dem)

Performing the reclass function

The raster package makes most GIS-type functions fairly easy. Before I could perform the reclass function, however, I needed to create a matrix telling the computer how to perform the reclassification. The first column provides the first number in the interval, the second column provides the last number in the interval, and the third column assigns a value to all numbers within the specified interval. I used the classIntervals function to create the intervals based on kmeans (a clustering analysis). I had actually wanted to use other statistical measures but they were taking too long on my large data file. It is also possible to use the classIntervals function to create equal interval breaks as well, so I could go back and try other methods of creating the intervals.

#create intervals based on the 
demClass <- classIntervals(values(dem), n=5,style="kmeans")

a <- demClass$brks[1:5]
a <- c((a[1]-0.001), a[2:5])
b <- demClass$brks[2:6]
b <- c((b[1:4]-0.001), b[5])
c <- c(1, 2, 3, 4, 5)
new_vec <- c(a, b, c)

#create the reclassify matrix
recl <- matrix(new_vec, nrow=5, ncol=3)

reclass_dem <- reclassify(dem, recl)

#writeRaster(reclass_dem, "~/Grad Year 3/Advanced Data Structures/spatial data/DEM_reclass.tif", drivername="GTiff", overwrite=TRUE)

Repeating the process for slope

It was really easy to use the terrain function within the raster package to create a slope model that looked much more believable than the raster that QGIS produced (it somehow only had 2 values!)

Once I had the slope raster, I followed the same steps as above and was able to repeat the code to create a reclassified slope raster.

slope <- terrain(dem, opt='slope', unit='degrees')

#writeRaster(slope, "~/Grad Year 3/Advanced Data Structures/spatial data/slope.tif", drivername="GTiff", overwrite=TRUE)

slopeClass <- classIntervals(values(slope), n=5,style="kmeans")

a <- slopeClass$brks[1:5]
a <- c((a[1]-0.001), a[2:5])
b <- slopeClass$brks[2:6]
b <- c((b[1:4]-0.001), b[5])
c <- c(1, 2, 3, 4, 5)
new_vec <- c(a, b, c)

recl <- matrix(new_vec, nrow=5, ncol=3)

reclass_slope <- reclassify(slope, recl)

plot(reclass_slope)

Integrating the topographic data with the sites

I decided to apply this new topographic data to the sites to draw initial comparisons between site location and topography by phase. Through the construction of more complex queries, I could integrate the archaeobotanical data as well. I found a package that would allow me to write a SQL query in R (similar to the method we have used to write SQL queries in Jupyter notebooks). Because my SQL database consists of uploaded csvs, I decided to upload the necessary csvs for this component of my initial analysis rather than figure out how to open a SQL connection through the Rmarkdown. SQL was useful because the query structure and table joins seem more intuitive than in R or Python. I also used my SQL database to test the query before running it in R.

report_info <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/reportinfo.csv")
occupation_data <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/occupationdata.csv")

colnames(occupation_data)[1] <- "her_no"

sites_occ <- sqldf("SELECT DISTINCT report_info.her_no, report_info.her_cit, occupation_data.phase, lat, lon FROM report_info LEFT JOIN occupation_data ON (report_info.her_no=occupation_data.her_no AND report_info.her_cit = occupation_data.her_cit)")

#transform dataframe into a spatial points dataframe
coordinates(sites_occ) = ~lon + lat
projection(sites_occ) <- CRS("+proj=lonlat +ellps=WGS84")
sites_occ <- spTransform(sites_occ, CRS(projection(reclass_dem)))

#test to show points
plot(reclass_dem)
points(sites_occ)

New dataframes

For the final step in this week’s work, I extracted the reclassified raster values to each of my archaeological sites for both the elevation and the slope. This demonstrates that I can effectively integrate some of the spatial data with my primary data tables. I created histograms to show the counts of sites by phase. Within each phase, the bars were broken up based on the reclassified elevation values (and the reclassified slope values in the second graph). I was interested in seeing whether there were any differences in preferred elevation and slope for site location over time. Both barplots show there is a strong preference for low elevation and low slope sites. The time periods for which there are some high elevation and slope sites are also time periods for which there are substantially more sites. The broader range of elevation and slope simply could be the product of a larger sample size for these time periods.

sites_dem <- extract(reclass_dem, sites_occ)
sites_slope <- extract(reclass_slope, sites_occ)

sites_occ_new <- as.data.frame(sites_occ)
sites_occ_new$dem <- as.character(sites_dem)
sites_occ_new$slope <- as.character(sites_slope)

ggplot(data=sites_occ_new, aes(x=phase, fill=dem)) +
  geom_bar(stat="count", position="dodge")

ggplot(data=sites_occ_new, aes(x=phase, fill=slope)) + 
  geom_bar(stat="count", position="dodge")