I first loaded the data into R using read.csv():
bridges <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1')
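One thing worth checking at this step: read.csv() treats the first line of its input as column names unless told otherwise. The sketch below assumes the raw file has no header line and marks unknown values with "?" (as the summary output later in this post suggests). The two sample rows are made up for illustration and are not taken from the dataset.

```r
# Two hypothetical rows in the same comma-separated layout (not real data)
sample_text <- "X1,A,10,1900,HIGHWAY,1000,2,N,THROUGH,STEEL,MEDIUM,F,SIMPLE-T
X2,M,12,1950,RR,?,4,G,DECK,STEEL,LONG,F,ARCH"

# Default behaviour: the first line is consumed as the header,
# so only one data row remains
with_default <- read.csv(text = sample_text)

# header = FALSE keeps every line as data; na.strings turns "?" into NA
with_all_rows <- read.csv(text = sample_text, header = FALSE, na.strings = "?")

nrow(with_default)
nrow(with_all_rows)
is.na(with_all_rows$V6[2])
```

If the real file does behave this way, passing header = FALSE to the call above would keep the first bridge as data, at the cost of shifting the row numbers shown in the output further down.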
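I then wrapped the result in data.frame(). Since read.csv() already returns a data frame, bridges_df is effectively a copy of bridges: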
bridges_df <- data.frame(bridges)
I used the View() function to see what the data looked like in each of the two objects:
View(bridges)
View(bridges_df)
Next I renamed the columns of the data frame to be more descriptive. The names are based on the dataset's page: https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges
N.B. Some of the data lacks context. The identifier is not explained in any detail, and neither is location; perhaps these make sense in terms of how the city records are kept. No measurement units are given for length, so the values could be feet or yards. As for River: A = Allegheny, M = Monongahela, O = Ohio.
bridges_df <- setNames(bridges_df, c("ID", "River", "Location", "Year_Erected", "Purpose", "Length", "Lanes", "Clear", "T_or_D", "Material", "Span", "Rel-L", "Type"))
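A small aside on the name "Rel-L": because it contains a hyphen, it is not a syntactic R name, so it needs backticks when used with `$`. The sketch below shows this on a hypothetical two-column data frame (`toy` and the replacement name `Rel_L` are illustrative choices, not taken from the dataset documentation).

```r
# Hypothetical toy data frame, used only to illustrate naming behaviour
toy <- data.frame(a = c("S", "F"), b = c(1, 2), stringsAsFactors = FALSE)
toy <- setNames(toy, c("Rel-L", "Lanes"))

toy$`Rel-L`      # backticks are required with $ for a hyphenated name
toy[["Rel-L"]]   # [[ ]] with a string also works

# Renaming with an underscore avoids the quoting altogether
names(toy)[names(toy) == "Rel-L"] <- "Rel_L"
toy$Rel_L
```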
I was interested in how many bridges had more than 2 lanes:
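# Convert via as.character() so the comparison uses the lane labels, not factor level codes
multi_lane_df <- subset(bridges_df, as.numeric(as.character(Lanes)) > 2, select = c("Lanes", "Year_Erected", "Purpose", "Material"))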
multi_lane_df
##     Lanes Year_Erected Purpose Material
## 21      4         1876 HIGHWAY     WOOD
## 56      4         1904      RR    STEEL
## 66      4         1915 HIGHWAY    STEEL
## 70      4         1923 HIGHWAY    STEEL
## 71      4         1924 HIGHWAY    STEEL
## 72      4         1926 HIGHWAY    STEEL
## 73      4         1926 HIGHWAY    STEEL
## 76      4         1927 HIGHWAY    STEEL
## 77      4         1927 HIGHWAY    STEEL
## 78      4         1928 HIGHWAY    STEEL
## 80      4         1928 HIGHWAY    STEEL
## 82      4         1931 HIGHWAY    STEEL
## 83      4         1931 HIGHWAY    STEEL
## 84      4         1931 HIGHWAY    STEEL
## 85      4         1937 HIGHWAY    STEEL
## 86      4         1939 HIGHWAY    STEEL
## 87      4         1945 HIGHWAY    STEEL
## 94      4         1951 HIGHWAY    STEEL
## 95      4         1951 HIGHWAY    STEEL
## 96      4         1951 HIGHWAY    STEEL
## 97      4         1955 HIGHWAY    STEEL
## 100     6         1959 HIGHWAY    STEEL
## 101     4         1961 HIGHWAY    STEEL
## 102     4         1962 HIGHWAY    STEEL
## 103     6         1969 HIGHWAY    STEEL
## 104     6         1975 HIGHWAY    STEEL
## 105     6         1978 HIGHWAY    STEEL
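A note on converting a factor column such as Lanes to numbers: as.numeric() applied directly to a factor returns the internal level codes, not the labels. The sketch below uses a hypothetical vector of lane values (not the real column) with the levels spelled out explicitly, since the default level order depends on the locale.

```r
# Hypothetical lane values; levels are given explicitly so the order is fixed
lanes <- factor(c("2", "4", "?", "6", "1"),
                levels = c("?", "1", "2", "4", "6"))

# Direct conversion returns level codes: 3 4 1 5 2
as.numeric(lanes)

# Going through as.character() recovers the labels; "?" becomes NA
lanes_num <- suppressWarnings(as.numeric(as.character(lanes)))
lanes_num   # 2 4 NA 6 1

# Comparisons with NA yield NA; which() keeps only the TRUE positions
which(lanes_num > 2)   # 2 4
```

With these particular levels, a threshold of 3 on the codes and a threshold of 2 on the labels happen to select the same values (4 and 6), which is why the two approaches can agree by coincidence.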
Summarize the multilane subset:
summary(multi_lane_df)
##  Lanes   Year_Erected      Purpose    Material
##  ?: 0   Min.   :1876   AQUEDUCT: 0   ?    : 0
##  1: 0   1st Qu.:1926   HIGHWAY :26   IRON : 0
##  2: 0   Median :1931   RR      : 1   STEEL:26
##  4:23   Mean   :1938   WALK    : 0   WOOD : 1
##  6: 4   3rd Qu.:1953
##         Max.   :1978
From this summary and the listing above we can see that there are 23 four-lane bridges and 4 six-lane bridges. The earliest was built in 1876 and is the only wooden bridge in the group. The only railroad bridge in the subset was built in 1904; the rest are highway bridges.
Here is a bar plot of the number of multilane bridges built in each year:
barplot(table(multi_lane_df$Year_Erected), main = "Multilane Bridges by Year Erected")
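Because most years contain only one or two bridges, grouping by decade may make the trend easier to read. The following is a minimal sketch of that alternative, with the years copied from the multi_lane_df printout above; the names `years`, `decade`, and `decade_counts` are introduced here for illustration.

```r
# Years copied from the multi_lane_df printout above
years <- c(1876, 1904, 1915, 1923, 1924, 1926, 1926, 1927, 1927, 1928, 1928,
           1931, 1931, 1931, 1937, 1939, 1945, 1951, 1951, 1951, 1955, 1959,
           1961, 1962, 1969, 1975, 1978)

# Round each year down to the start of its decade, then count per decade
decade <- floor(years / 10) * 10
decade_counts <- table(decade)
decade_counts

barplot(decade_counts,
        main = "Multilane Bridges by Decade Erected",
        xlab = "Decade", ylab = "Number of bridges")
```

Note that table() only lists decades that actually occur, so gaps such as the 1880s and 1890s do not appear as empty bars.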