Lab 1 of CUNY MSDA DATA 607 asks us to analyze the UCI bridges dataset. There are two versions of the data, version 1 (V1) and version 2 (V2). I chose to analyze V2 because it has better numeric properties. (See the dataset documentation for the full reasoning.)
Loading the data:
# Read version 2 of the bridges data; the file has no header row
df <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version2",
               header = FALSE, sep = ",")
head(df)
##   V1 V2 V3     V4       V5     V6 V7 V8      V9  V10    V11 V12  V13
## 1 E1  M  3 CRAFTS  HIGHWAY      ?  2  N THROUGH WOOD  SHORT   S WOOD
## 2 E2  A 25 CRAFTS  HIGHWAY MEDIUM  2  N THROUGH WOOD  SHORT   S WOOD
## 3 E3  A 39 CRAFTS AQUEDUCT      ?  1  N THROUGH WOOD      ?   S WOOD
## 4 E5  A 29 CRAFTS  HIGHWAY MEDIUM  2  N THROUGH WOOD  SHORT   S WOOD
## 5 E6  M 23 CRAFTS  HIGHWAY      ?  2  N THROUGH WOOD      ?   S WOOD
## 6 E7  A 27 CRAFTS  HIGHWAY  SHORT  2  N THROUGH WOOD MEDIUM   S WOOD
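The `?` entries in this output look like missing-value markers. A minimal sketch, assuming `?` is the only missing-value code in the file: passing `na.strings = "?"` to `read.csv()` turns those entries into `NA`, so functions such as `is.na()` and `summary()` can recognize them. To keep it runnable offline, the example reads two rows copied from the output above through the `text` argument instead of the UCI URL; `raw_rows` and `df_na` are hypothetical names.

```r
# Two rows copied from the head(df) output above (comma-separated, no header).
raw_rows <- "E1,M,3,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
E2,A,25,CRAFTS,HIGHWAY,MEDIUM,2,N,THROUGH,WOOD,SHORT,S,WOOD"

# Assumption: "?" marks a missing value, so read it as NA.
df_na <- read.csv(text = raw_rows, header = FALSE, na.strings = "?")

# Count missing values per column; only V6 (LENGTH) has one in these two rows.
colSums(is.na(df_na))
```

Applied to the full file, the same argument would make head(df) print NA instead of ?.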
The output of head(df) shows that the columns have no meaningful names: because we read the file with header = FALSE, R simply labels them V1 through V13. Ideally we would pull the names from a separate file, but unfortunately there isn't an easily accessible data dictionary. Instead we have to copy the attribute names from the dataset documentation and do some good old-fashioned cutting and pasting.
colnames(df) <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "PURPOSE", "LENGTH", "LANES", "CLEAR-G", "T-OR-D", "MATERIAL", "SPAN", "REL-L", "TYPE")
We can now see the named data frame:
head(df)
##   IDENTIF RIVER LOCATION ERECTED  PURPOSE LENGTH LANES CLEAR-G  T-OR-D
## 1      E1     M        3  CRAFTS  HIGHWAY      ?     2       N THROUGH
## 2      E2     A       25  CRAFTS  HIGHWAY MEDIUM     2       N THROUGH
## 3      E3     A       39  CRAFTS AQUEDUCT      ?     1       N THROUGH
## 4      E5     A       29  CRAFTS  HIGHWAY MEDIUM     2       N THROUGH
## 5      E6     M       23  CRAFTS  HIGHWAY      ?     2       N THROUGH
## 6      E7     A       27  CRAFTS  HIGHWAY  SHORT     2       N THROUGH
##   MATERIAL   SPAN REL-L TYPE
## 1     WOOD  SHORT     S WOOD
## 2     WOOD  SHORT     S WOOD
## 3     WOOD      ?     S WOOD
## 4     WOOD  SHORT     S WOOD
## 5     WOOD      ?     S WOOD
## 6     WOOD MEDIUM     S WOOD
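Names such as `CLEAR-G`, `T-OR-D` and `REL-L` contain hyphens, so they are not syntactically valid R names. A short sketch (again on two rows copied from the output, with hypothetical names `bridge_cols`, `raw_rows` and `df_named`) of setting the names while reading, via the `col.names` argument of `read.csv()`. Setting `check.names = FALSE` stops R from rewriting the hyphens to dots, and such columns then have to be referenced with backticks.

```r
# Column names in the same order as the colnames() call above.
bridge_cols <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "PURPOSE", "LENGTH",
                 "LANES", "CLEAR-G", "T-OR-D", "MATERIAL", "SPAN", "REL-L", "TYPE")

raw_rows <- "E1,M,3,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
E3,A,39,CRAFTS,AQUEDUCT,?,1,N,THROUGH,WOOD,?,S,WOOD"

# Name the columns at read time; check.names = FALSE keeps "CLEAR-G" as-is
# instead of converting it to a syntactic name.
df_named <- read.csv(text = raw_rows, header = FALSE,
                     col.names = bridge_cols, check.names = FALSE)

# Hyphenated names need backticks when used with $.
df_named$`T-OR-D`

# With the default check.names = TRUE, make.names() would rename the column:
make.names("CLEAR-G")
```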
The choice is subjective, but I think the most interesting variables are:
# Keep only the five columns of interest
subBridge <- subset(x = df, select = c(IDENTIF, RIVER, PURPOSE, LENGTH, LANES))
head(subBridge)
##   IDENTIF RIVER  PURPOSE LENGTH LANES
## 1      E1     M  HIGHWAY      ?     2
## 2      E2     A  HIGHWAY MEDIUM     2
## 3      E3     A AQUEDUCT      ?     1
## 4      E5     A  HIGHWAY MEDIUM     2
## 5      E6     M  HIGHWAY      ?     2
## 6      E7     A  HIGHWAY  SHORT     2
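subset() is handy interactively; R's own help page for subset() notes that for programming the standard [ operator is preferable, because subset() relies on non-standard evaluation. A sketch on a small hypothetical data frame (`bridges6`) rebuilt from the six rows printed above, comparing the two ways of selecting the same columns. The final table() call gives the per-category counts that a bar chart of one of these variables would display.

```r
# Hypothetical data frame rebuilt from the six rows printed above,
# with one extra column (LOCATION) that the selection should drop.
bridges6 <- data.frame(
  IDENTIF  = c("E1", "E2", "E3", "E5", "E6", "E7"),
  RIVER    = c("M", "A", "A", "A", "M", "A"),
  LOCATION = c(3, 25, 39, 29, 23, 27),
  PURPOSE  = c("HIGHWAY", "HIGHWAY", "AQUEDUCT", "HIGHWAY", "HIGHWAY", "HIGHWAY"),
  LENGTH   = c("?", "MEDIUM", "?", "MEDIUM", "?", "SHORT"),
  LANES    = c(2, 2, 1, 2, 2, 2)
)

keep <- c("IDENTIF", "RIVER", "PURPOSE", "LENGTH", "LANES")

# subset() takes bare column names (non-standard evaluation) ...
via_subset <- subset(bridges6, select = c(IDENTIF, RIVER, PURPOSE, LENGTH, LANES))
# ... while [ takes a character vector, which is easier to build in code.
via_brackets <- bridges6[, keep]

# Both approaches should return the same five columns.
all.equal(via_subset, via_brackets)

# Counts per category, the numbers a bar chart of RIVER would display.
table(bridges6$RIVER)
```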
I am a bit unsure what else the lab calls for. There are some really cool examples on the CUNY site of people making graphs (mostly histograms and the like). Since this is ungraded, I will concentrate on the assignment itself.