Data Source: https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges
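# stringsAsFactors = TRUE reads text columns as factors (the default before R 4.0), which the summary output below relies on
bridges <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1", header = FALSE, sep = ",", stringsAsFactors = TRUE)
colnames(bridges) <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "PURPOSE", "LENGTH", "LANES",
                       "CLEAR-G", "T-OR-D", "MATERIAL", "SPAN", "REL-L", "TYPE")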
head(bridges)
bridges_subset <- subset(bridges, PURPOSE=="HIGHWAY", select=c(ERECTED,PURPOSE,MATERIAL))
bridges_subset
summary(bridges_subset)
##     ERECTED         PURPOSE      MATERIAL
##  Min.   :1818   AQUEDUCT: 0   ?    : 2
##  1st Qu.:1890   HIGHWAY :71   IRON : 7
##  Median :1923   RR      : 0   STEEL:51
##  Mean   :1912   WALK    : 0   WOOD :11
##  3rd Qu.:1945
##  Max.   :1986
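The two "?" entries under MATERIAL appear to be the data set's missing-value code rather than a real material. A minimal sketch of recoding them follows; mark_missing is a hypothetical helper name, not part of the original analysis, and it assumes "?" is the only missing-value code in the column.

```r
# Hypothetical helper: recode a missing-value code (assumed to be "?") to NA
# and rebuild the factor so the unused level disappears.
mark_missing <- function(x, missing_code = "?") {
  x <- as.character(x)
  x[x %in% missing_code] <- NA
  factor(x)
}

# Small demonstration using the level names shown in the summary above
material <- factor(c("?", "IRON", "STEEL", "?", "WOOD"))
material_clean <- mark_missing(material)
```

With the real data this would be applied as bridges_subset$MATERIAL <- mark_missing(bridges_subset$MATERIAL), after which summary() reports the two entries as NA's.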
This histogram shows the number of highway bridges built in each 10-year interval. Overall, the distribution of construction years is left-skewed: the mean (1912) lies below the median (1923), and a long tail of early bridges reaches back to 1818, even though at least half of the bridges were built by 1923. The histogram is also bimodal, showing bursts in highway bridge construction in 1890-1900 and 1920-1930.
hist(bridges_subset$ERECTED, breaks = 20)
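A single number passed as breaks is only a suggestion to hist(), which then picks round break points itself. A minimal sketch of fixing the bins to decades explicitly follows; decade_breaks is a hypothetical helper, and the five years used here are just the minimum, quartiles and maximum from the summary above, not the full data.

```r
# Hypothetical helper: decade-aligned break points that cover every year in `years`.
decade_breaks <- function(years) {
  seq(floor(min(years) / 10) * 10, ceiling(max(years) / 10) * 10, by = 10)
}

# Illustrative input only: min, 1st quartile, median, 3rd quartile and max of ERECTED
years <- c(1818, 1890, 1923, 1945, 1986)
breaks <- decade_breaks(years)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(years, breaks = breaks, plot = FALSE)
```

With the real data the call would be hist(bridges_subset$ERECTED, breaks = decade_breaks(bridges_subset$ERECTED)), which guarantees that every bar spans exactly one decade.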
The boxplot below shows the distribution of construction years for each bridge material, with the boxes ordered by median year. Based on the plot, wood appears to have been the preferred material until about 1875, followed by iron until about 1900 and steel from 1900 onwards.
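# Position the boxes left to right by each material's median construction year;
# ties.method = "first" keeps positions distinct if two medians coincide
median_year <- tapply(bridges_subset$ERECTED, bridges_subset$MATERIAL, median)
boxplot(ERECTED ~ MATERIAL, data = bridges_subset, at = rank(median_year, ties.method = "first"))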
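The ordering read off the boxplot can also be checked numerically. The following is a sketch under the assumption that a per-material minimum, median and maximum is enough to compare the eras; summarise_years is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: minimum, median and maximum year per group,
# with rows ordered from the earliest to the latest median.
summarise_years <- function(years, group) {
  stats <- vapply(
    split(years, group, drop = TRUE),   # drop = TRUE skips empty factor levels
    function(y) c(min = min(y), median = median(y), max = max(y)),
    FUN.VALUE = c(min = 0, median = 0, max = 0)
  )
  out <- t(stats)                        # one row per group
  out[order(out[, "median"]), , drop = FALSE]
}
```

Calling summarise_years(bridges_subset$ERECTED, bridges_subset$MATERIAL) returns one row per material, which makes it easy to see where the year ranges of wood, iron and steel overlap.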