This markdown largely models the direction I would have liked to go with my data if the quality of the available archaeobotanical data was better. I consulted ethographic studies of cereal processing in Greece. The author (Jones 1987, Jones 1990) used the ethnographic studies as a basis of comparison to archaeological samples in order to infer the stage of processing at which the archaeological sample was deposited. Although I could not access the original study in time for this week’s update (Jones 1984), I was able to infer enough from the publications to come up with reasonable parameters for distinguishing fully processed cereals from processing debris.
library(sqldf)
library(leaflet)
library(maptools)
library(raster)
#I decided loaded all 8 data tables even though I only drew from 3 for the purposes of this work.
report_info <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/reportinfo.csv")
occupation_data <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/occupationdata.csv")
excav_info <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/excavationinfo.csv")
biblio <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/bibliographic.csv")
finds <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/archaeobotfinds.csv")
planttype <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/planttype.csv")
sampling <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/samplinglog.csv")
sites_sp <- read.csv("~/Grad Year 3/Advanced Data Structures/LincolnshireArchaeobotany/sitesspatial.csv")
#there was a problem loading the first column name in the datatable, so this manually writes over the correct name of the first column for each table.
names(biblio)[1] <- "her_no"
names(excav_info)[1] <- "her_no"
names(finds)[1] <- "bot_finds_id"
names(occupation_data)[1] <- "her_no"
names(planttype)[1] <- "family"
names(report_info)[1] <- "report_type"
names(sampling)[1] <- "sample_id"
names(sites_sp)[1] <- "her_no"
I used sqldf to query my datatables because it was much easier to pull information from across multiple joins. The first parameter of applying ethnographic data to archaeobotanical samples requires having a sufficient number of cereals parts (grain, rachis, etc.) present within the sample for analysis. I constructed a query to first select based on the sum of all cereal parts for the four major cereal crops (wheat, barley, oats, and rye) per sample.
sample_cereal_sum <- sqldf("SELECT sampling.sample_id, SUM(finds.count) AS finds_count FROM sampling LEFT JOIN finds ON (sampling.sample_id=finds.sample_id) WHERE genus IN ('Hordeum', 'Triticum', 'Avena', 'Secale') GROUP BY sampling.sample_id ORDER BY SUM(finds.count) DESC")
head(sample_cereal_sum, n=15)
## sample_id finds_count
## 1 S044 2019
## 2 S067 984
## 3 S068 873
## 4 S057 173
## 5 S063 149
## 6 S056 87
## 7 S058 81
## 8 S095 44
## 9 S065 36
## 10 S045 35
## 11 S064 35
## 12 S047 29
## 13 S060 24
## 14 S099 24
## 15 S059 21
Based on the original publications, I should have about 100 cereal parts in order to proceed with comparisons to ethnographic samples. I decided to set my cut at 80 cereal parts so that I could have a slightly larger data set. The next lowest sum of all cereal parts for a given sample was in the 30s, so 80 seemed to be a reasonable cut given the data that I was working with.
The ethnographic model (Jones 1990) utilizes percentages of various parts of the cerealin order to infer the phase of processing. The query below groups sums for each sample by the plant part of the cereal rather than providing a sum for all cereals as above. I then calculated the percentage of each cereal plant part of the whole cereal assemblage (ex: % of grains vs. % of rachis fragments)
sample_cereal_sum <- subset(sample_cereal_sum, finds_count > 80)
sample_plantpart <- sqldf("SELECT sample_cereal_sum.sample_id, plant_part, SUM(count) AS plantpartsum FROM sample_cereal_sum LEFT JOIN finds ON (sample_cereal_sum.sample_id=finds.sample_id) WHERE genus IN ('Hordeum', 'Triticum', 'Avena', 'Secale') GROUP BY plant_part, sample_cereal_sum.sample_id ORDER BY sample_cereal_sum.sample_id")
head(sample_cereal_sum, n=15)
## sample_id finds_count
## 1 S044 2019
## 2 S067 984
## 3 S068 873
## 4 S057 173
## 5 S063 149
## 6 S056 87
## 7 S058 81
#merge with the previous table so that I have the total of all cereal parts
cereal_totals <- merge(sample_plantpart, sample_cereal_sum, by="sample_id")
head(cereal_totals, n=15)
## sample_id plant_part plantpartsum finds_count
## 1 S044 glume NA 2019
## 2 S044 grain 2019 2019
## 3 S044 plumules NA 2019
## 4 S044 rachis NA 2019
## 5 S044 stem NA 2019
## 6 S056 grain 85 87
## 7 S056 rachis 2 87
## 8 S057 grain 170 173
## 9 S057 rachis 3 173
## 10 S058 grain 81 81
## 11 S063 grain 149 149
## 12 S067 grain 972 984
## 13 S067 rachis 12 984
## 14 S068 grain 840 873
## 15 S068 rachis 33 873
#calculation of percentage within each sample by plant part
cereal_totals$percent <- cereal_totals$plantpartsum/cereal_totals$finds_count
I currently have a preliminary method of assessing the stage at which the material became carbonized. I selected all samples for which the percentage of grains (vs. all ther cereal parts) was greater than 80%. All of the samples met this condition. This shows the samples that have sufficient numbers for ethnographic comparison were likely finished products. My comparison to the ethonographic data would be stronger if I incorporated the weed seed data, so this is only a preliminary conclusion.
grain_products <- sqldf("SELECT * FROM cereal_totals WHERE plant_part='grain' AND percent > 0.80")
#a query to connect the grain products data to the spatial information in order to create a map. This requires joining across several tables.
grain_products_geo <- sqldf("SELECT report_info.her_no, report_info.her_cit, lat, lon, sampling.sample_id, percent FROM report_info JOIN occupation_data ON (report_info.her_no = occupation_data.her_no AND report_info.her_cit=occupation_data.her_cit) JOIN sampling ON (occupation_data.her_no= sampling.her_no AND occupation_data.her_cit= sampling.her_cit AND occupation_data.context=sampling.context) JOIN grain_products ON (sampling.sample_id = grain_products.sample_id)")
#conversion to a spatial object
coordinates(grain_products_geo) = ~lon + lat
projection(grain_products_geo) <- CRS("+proj=lonlat +ellps=WGS84")
#inclusion of all reports to compare distribution of all sites to the probable grain products.
coordinates(report_info) = ~lon + lat
projection(report_info) <- CRS("+proj=lonlat +ellps=WGS84")
leaflet() %>% addCircles(data=grain_products_geo, color="blue", popup=report_info$her_no) %>% addCircles(data=report_info, color="gray")%>% addTiles()