Introduction: The biggest problem posed by this part of the assignment was writing code that would work for the negative (BC) date values. I solved this problem by adding a fixed number of years, “n”, to all of the integer start date values so that they would all become positive. Once the integers were converted to the class “Date”, I used the package “lubridate” to subtract a period of “n” years from the dates in order to restore the original values as the start dates. Note that R's dates follow a calendar that includes a year 0, so this method does not correct for the absence of a year 0 in the BC/AD system; the resulting one-year offset for BC dates is negligible at the scale of these plots.
This loads all packages required for all of the code.
require(curl)
require(jsonlite)
require(plyr)
require(lubridate)
require(xts)
require(dygraphs)
require(rgdal)
require(leaflet)
require(maptools)
require(RColorBrewer)
require(metricsgraphics)
require(htmlwidgets)
This section loads the data from GitHub on which the following plots are based.
ramphs <-read.csv(curl("https://gist.githubusercontent.com/sfsheath/5c5987269e8aad412416/raw/a12dda2b8a681f1ab01421f68ac237ab591ff0f9/roman-amphitheaters.csv"))
periods <- fromJSON("https://gist.githubusercontent.com/sfsheath/cc082cc6db3ac8343e3a/raw/6e938ed37b9083f393a67bec10b46ba7fc693661/amphitheater-chronogrps.json")
This new dataframe will be the starting point for all remaining code.
join(ramphs, periods, by= "chronogrp") -> ramphs
#Joins the period data onto "ramphs" by chronological group and overwrites the "ramphs" dataframe loaded during the previous step.
ramphs2 <- ramphs
#Create a copy of the base dataframe for manipulating the time values
R's as.Date() does not reliably parse BC years written as negative numbers. To get around this, I added a fixed number of years to the dates to ensure they all became positive. This number, “n”, was calculated as the absolute value of the minimum start date plus 5, so that even the earliest start date becomes a positive year rather than 0. It will be subtracted from the dates once they have been switched to the Date class.
(min(ramphs2$start_date)*-1)+5 -> n
#Calculates the minimum start date, converts it to a positive, and adds 5 years to ensure all values are above 0
ramphs2$start_date + n -> ndate
d <- paste(ndate, "01", "01", sep="-")
#Adds a month and day to the year because R will not convert years alone; the "-" separator matches the format string below
dates <- as.Date(d, "%Y-%m-%d")
#Changes the class from integer to date
dates-period(n, units="years") -> x
#Subtracts the period, "n", from the date class to return the original year values using package "lubridate"
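The shift-and-restore idea can be checked on a few made-up years. The sketch below uses base R only and hypothetical start years (they are not taken from the amphitheatre data); instead of subtracting a lubridate period, it reads the year back out of the Date and subtracts “n” to confirm that the original values are recoverable.

```r
# Hypothetical start years; negative values stand for BC dates
start_year <- c(-70L, -30L, 1L, 80L, 212L)

# Offset: absolute value of the earliest year plus 5
n <- -min(start_year) + 5L

# Every shifted year is now positive and can be parsed as a Date
shifted <- start_year + n
d <- as.Date(sprintf("%04d-01-01", shifted))

# Read the year back out of the Date and undo the shift
recovered <- as.POSIXlt(d)$year + 1900L - n
```

Here “n” is 75, the earliest shifted year is 5, and “recovered” matches “start_year”.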
The arrange() function from “plyr” sorts dataframes, so I have sorted the data by date here, before converting the data format to zoo.
c <- data.frame((x), (ramphs2$capacity), (ramphs2$extmajor))
c2 <- arrange(c, c$X.x)
This step translates the dataframe to xts and zoo so that these can be plotted.
x2 <- as.xts(c2$X.x)
#Formats dates
rt <- cbind(x2, c2$X.ramphs2.capacity, c2$X.ramphs2.extmajor)
#Binds new date format to selected columns to create new data table
zoo.rt <- as.zoo(rt)
This will produce a single plot of the cumulative capacity and cumulative length over time.
options(scipen=7)
plot(x= cumsum(na.omit(zoo.rt)), xlab="Time", ylab=list("Capacity", "Length"), main="Cumulative Sums")
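The sort, drop-missing, and cumulative-sum sequence used for this plot can be sketched in base R on a tiny made-up table. The column names and values below are hypothetical; the point is only to show that rows with a missing value in either column are dropped before the running totals are computed.

```r
# Hypothetical records: year built, seating capacity, exterior length (NA = unknown)
amph <- data.frame(
  year     = c(80L, -70L, 5L, 120L),
  capacity = c(50000, 20000, NA, 8000),
  length_m = c(188, 135, 90, 60)
)

# Sort by year so the running totals accumulate in chronological order
amph <- amph[order(amph$year), ]

# Drop any row with a missing value in either measured column
complete <- na.omit(amph)

# Running totals for each column
running <- data.frame(
  year         = complete$year,
  cum_capacity = cumsum(complete$capacity),
  cum_length   = cumsum(complete$length_m)
)
```

The year 5 row is dropped because its capacity is unknown, so its length does not contribute to the cumulative length either.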
I used the package “dygraphs” to create an interactive plot of capacity. This required first making a new object containing only capacity, because the cumulative capacity was on a significantly larger scale than the cumulative length. Otherwise, the change in cumulative length would be masked by the larger scale of the change in capacity. I did not need to create a new xts index because I reused the xts object from above.
I repeated the process to create an interactive plot of cumulative length just below it, illustrating that these values increase together.
options(scipen=8)
cc <- cbind(x2, c2$X.ramphs2.capacity)
dygraph(cumsum(na.omit(cc)), xlab="Year", ylab="Capacity", main="Cumulative Capacity")
cl <- cbind(x2, c2$X.ramphs2.extmajor)
dygraph(cumsum(na.omit(cl)), xlab="Year", ylab="Cumulative Length", main="Cumulative Length")
This map plots all points from the ramphs dataframe onto a layer of modern countries. This required assigning coordinates to the ramphs data before passing them to leaflet.
ramphs.sp <- ramphs
#Creates a new dataframe to which spatial coordinates can be added.
coordinates(ramphs.sp) <- ~longitude + latitude
proj4string(ramphs.sp) <- CRS('+proj=longlat +datum=WGS84')
leaflet() %>% addCircles(data = ramphs.sp) %>%addTiles()
Introduction: The goal of these plots is to use a time series plot to examine how the cumulative capacities change for amphitheatres built on territory conquered before 60 BC in comparison to territory conquered after 60 BC. Does the speed of building amphitheatre seating capacity change between the two groups? This will require determining which amphitheatres fall within a shapefile downloaded from [http://awmc.unc.edu/awmc/map_data/shapefiles].
After the shapefile for the Roman Empire at 60 BC has been downloaded, it must be formatted for comparison with the amphitheatre data.
tbc <- readShapeSpatial("roman_empire_bc_60_extent.shp", delete_null_obj=TRUE)
proj4string(tbc) <- CRS('+proj=longlat +datum=WGS84')
The next step simply reuses the ramphs.sp object from above, since it has already been assigned coordinates and a projection. The function sp::over returns a numeric value for each point that falls within the shapefile and assigns “NA” to each point that falls outside it.
over(ramphs.sp, as(tbc, "SpatialPolygons"))->b
#Creates a new column with "NA" for points outside the shapefile and numeric values for points within the shapefile.
ramphs.old <-data.frame(ramphs, b)
#Adds this new column to the original dataframe
ramphs.old$b <- ifelse(is.na(b), 0, 5)
#Sets all values within the shapefile to b=5 and all values outside the shapefile to b=0
old <- data.frame((x), (ramphs.old$capacity), (ramphs.old$b))
#Borrows set of formatted dates from above and joins with the capacity column and the shapefile column
old <- arrange(old, old$X.x)
old[old$X.ramphs.old.b == 5,] -> before60
old[old$X.ramphs.old.b == 0,] -> after60
#Creates a dataframe for points within territory conquered by 60 BC and a dataframe for points in territory conquered after 60 BC
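The flag-and-split step can be illustrated without any spatial packages. In the base R sketch below, the vector “b” is a made-up stand-in for the result of a point-in-polygon lookup (a polygon index for points inside, NA for points outside), and the site values are hypothetical. It uses a logical flag rather than the 0/5 coding above, which is simply an alternative way to mark the two groups.

```r
# Hypothetical lookup result: polygon index if inside, NA if outside
b <- c(1L, NA, 1L, NA, 1L)

# Hypothetical sites in the same order as b
sites <- data.frame(
  year     = c(-70L, 90L, 20L, 150L, 60L),
  capacity = c(20000, 12000, 15000, 9000, 30000)
)

# TRUE for sites inside the 60 BC extent, FALSE for sites outside it
sites$inside <- !is.na(b)

# Sort chronologically, then split into the two groups
sites <- sites[order(sites$year), ]
before60 <- sites[sites$inside, ]
after60  <- sites[!sites$inside, ]

# Running capacity totals within each group
before60$cum_capacity <- cumsum(before60$capacity)
after60$cum_capacity  <- cumsum(after60$capacity)
```

Sorting before splitting means each group is already in date order when its cumulative sum is taken.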
This step will create an xts object to plot with dygraph for each of the two dataframes created above. This will allow for direct comparison to see whether seating capacity was built at different rates within territory conquered early in the empire’s history compared to territory conquered later.
x2.b <-as.xts(before60$X.x)
before60.x <- cbind(x2.b, before60$X.ramphs.old.capacity)
x2.a <-as.xts(after60$X.x)
after60.x <- cbind(x2.a, after60$X.ramphs.old.capacity)
dygraph(cumsum(na.omit(before60.x)), xlab="Year", ylab="Cumulative Capacity", main="Cumulative Capacity in Roman Territory Conquered Before 60 BC")
dygraph(cumsum(na.omit(after60.x)), xlab="Year", ylab="Cumulative Capacity", main="Cumulative Capacity in Roman Territory Conquered After 60 BC")
These plots appear to show a difference between the two groups in the rate at which seating capacity was built prior to 99 AD. After 99 AD, the rate of seating capacity construction appears to be fairly consistent.
ramphs.oldsp <- ramphs.old
coordinates(ramphs.oldsp) <- ~longitude + latitude
proj4string(ramphs.oldsp) <- CRS('+proj=longlat +datum=WGS84')
#The easiest way to plot the two groups in different colors was to assign each group as a subset to a differently colored circle layer
leaflet() %>% addCircles(data = ramphs.oldsp[ramphs.oldsp$b==0,], color='blue') %>%addCircles(data=ramphs.oldsp[ramphs.oldsp$b==5,], color='red') %>%addTiles()
Introduction: The goal of this section is to create a visualization to examine whether amphitheatre construction has a relationship with distance to other amphitheatres over time. It would be interesting to know whether the Romans built their amphitheatres farther apart as they conquered more territory.
In order to improve the information conveyed through the visualization, I decided to create an extra column for distance to the fifth nearest amphitheatre in quintiles. I selected the fifth nearest amphitheatre in order to see whether some amphitheatres were more clustered around several amphitheatres than others.
dist <- ramphs
dist <- within(dist, quintile <- as.integer(cut(nearestFifth, quantile(nearestFifth, probs=0:5/5), include.lowest=TRUE)))
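To show what the cut() and quantile() combination produces, here is a small base R sketch on made-up distances (the values 1 to 10 are hypothetical and chosen so the result is easy to check by hand). One caveat worth keeping in mind: cut() stops with an error if two quantile breaks are identical, which can happen when many values are tied.

```r
# Hypothetical distances to the fifth nearest amphitheatre
nearest_fifth <- 1:10

# Six break points (0%, 20%, ..., 100%) define five bins
breaks <- quantile(nearest_fifth, probs = 0:5 / 5)

# include.lowest = TRUE keeps the minimum value in the first bin
quintile <- as.integer(cut(nearest_fifth, breaks, include.lowest = TRUE))
```

With ten evenly spaced values, each quintile receives exactly two of them: the two smallest distances are coded 1 and the two largest are coded 5.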
I attempted to select an RColorBrewer palette that would not generate colors that were too similar to be helpful.
mjs_plot(dist, x=start_date, y=nearestFifth) %>% mjs_point(color_accessor=quintile, color_range=brewer.pal(n=8, name="Dark2")[c(1, 2, 4, 6, 8)]) %>%mjs_labs(x="Earliest Possible Construction Date", y="Distance to the Fifth Nearest Amphitheatre")
After formatting the data for mapping, I mapped each quintile of the “nearest fifth amphitheatre” category as a differently colored layer. To make the visualization somewhat intuitive to understand, I used the order of the rainbow to select colors with red as the lowest quintile and purple as the highest quintile.
coordinates(dist) <- ~longitude + latitude
proj4string(dist) <- CRS('+proj=longlat +datum=WGS84')
leaflet() %>% addTiles() %>% addCircles(data=dist[dist$quintile==1,], color="red") %>% addCircles(data=dist[dist$quintile==2,], color="orange") %>% addCircles(data=dist[dist$quintile==3,], color="green") %>% addCircles(data=dist[dist$quintile==4,], color="blue") %>% addCircles(data=dist[dist$quintile==5,], color="purple")
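The color assignment above is a lookup from quintile number to color name. As a minimal base R sketch, with made-up quintile values, the same mapping can be held in one vector and indexed by quintile. Whether a given plotting function accepts such a per-point color vector is an assumption that should be checked against its documentation.

```r
# Hypothetical quintile codes for six sites
quintile <- c(1L, 3L, 5L, 2L, 4L, 1L)

# Colors in rainbow order: position 1 is the lowest quintile
rainbow_order <- c("red", "orange", "green", "blue", "purple")

# Index the color vector by quintile to get one color per site
point_colour <- rainbow_order[quintile]
```

Keeping the colors in a single vector means the order of the palette is stated once rather than repeated in every layer.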
It appears that distance to the fifth nearest amphitheatre does not change significantly over time. The latest amphitheatres on the plot include both locations that are close to many other amphitheatres and locations that are far away from other amphitheatres. The map, on the other hand, shows that the greatest amount of clustering occurs in Italy and Tunisia.
The goal of this section is to determine whether, and if so when, there were periods of time during which the Romans were investing more effort into constructing amphitheatres. This will be done primarily by dividing up the data on distance to the nearest quarry using a process similar to the one above. The plot will be created slightly differently.
Using a similar process as above, a column will be created dividing the data on nearest quarry into quartiles.
stone <- ramphs
stone <- within(stone, quartile <- as.integer(cut(quarries.nearest, quantile(quarries.nearest, probs=0:4/4), include.lowest=TRUE)))
In addition to the quartile data for nearest quarry, I used the existing “elevation quartile” column in order to create points of different sizes. This assumes that higher elevation locations required greater effort to construct the amphitheatre.
mjs_plot(stone, x=start_date, y=quarries.nearest) %>% mjs_point(color_accessor=quartile, color_range=brewer.pal(n=8, name="Dark2")[c(1, 2, 4, 6, 8)], size_accessor=elevation.quartile) %>%mjs_labs(x="Earliest Possible Construction Date", y="Distance to the Nearest Quarry")
Repeating the process used for the other leaflet maps above, I created a fourth leaflet map to display the distance to the nearest quarry by color-coded quartile. As before, colors were selected in the order of the rainbow, with red as the lowest quartile and purple as the highest quartile.
coordinates(stone) <- ~longitude + latitude
proj4string(stone) <- CRS('+proj=longlat +datum=WGS84')
leaflet() %>% addTiles() %>% addCircles(data=stone[stone$quartile==1,], color="red") %>% addCircles(data=stone[stone$quartile==2,], color="orange") %>% addCircles(data=stone[stone$quartile==3,], color="blue") %>% addCircles(data=stone[stone$quartile==4,], color="purple")
It is interesting to note that the interactive scatterplot shows a significant amount of “high effort” amphitheatre construction during the first century AD. I had initially created the map thinking that the greater effort would come with greater expansion of Roman territory. The map also shows that the highest quartile of distance to quarry is both concentrated in eastern Italy and spread throughout the far ends of the Roman provinces. I did not expect to see this clustering of high distance to quarry within Italy.