Lab 1 of CUNY MSDA DATA 607 requests that this dataset be analyzed. There are two potential datasets, V1 and V2. I chose to analyze V2 because it had better numeric properties. (See link to understand the full reason.)

Loading the data:

df <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version2",
               header = FALSE, sep = ",")
head(df)
##   V1 V2 V3     V4       V5     V6 V7 V8      V9  V10    V11 V12  V13
## 1 E1  M  3 CRAFTS  HIGHWAY      ?  2  N THROUGH WOOD  SHORT   S WOOD
## 2 E2  A 25 CRAFTS  HIGHWAY MEDIUM  2  N THROUGH WOOD  SHORT   S WOOD
## 3 E3  A 39 CRAFTS AQUEDUCT      ?  1  N THROUGH WOOD      ?   S WOOD
## 4 E5  A 29 CRAFTS  HIGHWAY MEDIUM  2  N THROUGH WOOD  SHORT   S WOOD
## 5 E6  M 23 CRAFTS  HIGHWAY      ?  2  N THROUGH WOOD      ?   S WOOD
## 6 E7  A 27 CRAFTS  HIGHWAY  SHORT  2  N THROUGH WOOD MEDIUM   S WOOD

From this we can see that the features/columns have no names. To fix this it would be nice to pull the names from another file, but unfortunately there isn’t an easily accessible data dictionary. Instead we must copy the names from here and paste them in by hand.

colnames(df) <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "PURPOSE", "LENGTH", "LANES", "CLEAR-G", "T-OR-D", "MATERIAL", "SPAN", "REL-L", "TYPE")

We can now see the named data frame:

head(df)
##   IDENTIF RIVER LOCATION ERECTED  PURPOSE LENGTH LANES CLEAR-G  T-OR-D
## 1      E1     M        3  CRAFTS  HIGHWAY      ?     2       N THROUGH
## 2      E2     A       25  CRAFTS  HIGHWAY MEDIUM     2       N THROUGH
## 3      E3     A       39  CRAFTS AQUEDUCT      ?     1       N THROUGH
## 4      E5     A       29  CRAFTS  HIGHWAY MEDIUM     2       N THROUGH
## 5      E6     M       23  CRAFTS  HIGHWAY      ?     2       N THROUGH
## 6      E7     A       27  CRAFTS  HIGHWAY  SHORT     2       N THROUGH
##   MATERIAL   SPAN REL-L TYPE
## 1     WOOD  SHORT     S WOOD
## 2     WOOD  SHORT     S WOOD
## 3     WOOD      ?     S WOOD
## 4     WOOD  SHORT     S WOOD
## 5     WOOD      ?     S WOOD
## 6     WOOD MEDIUM     S WOOD
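
One caveat about these names: a few of them (CLEAR-G, T-OR-D, REL-L) contain hyphens, which are not syntactic R names, so they need backticks when used with `$` or inside `subset()`. Here is a minimal sketch of the two usual ways around that. It uses a small hand-typed stand-in data frame, `toy_names` (a hypothetical name, with values copied from the first rows above), rather than the full download.

```r
# Hypothetical stand-in for a few columns of df; values typed in from the output above
toy_names <- data.frame(
  IDENTIF = c("E1", "E2", "E3"),
  LANES   = c(2, 2, 1),
  V8      = c("N", "N", "N"),
  stringsAsFactors = FALSE
)
colnames(toy_names) <- c("IDENTIF", "LANES", "CLEAR-G")

# Option 1: keep the original names and quote them with backticks
clear_g <- toy_names$`CLEAR-G`
picked  <- subset(toy_names, select = c(IDENTIF, `CLEAR-G`))

# Option 2: convert to syntactic names (make.names turns hyphens into dots)
toy_clean <- toy_names
colnames(toy_clean) <- make.names(colnames(toy_clean))
colnames(toy_clean)
```

Keeping the original names matches the UCI documentation; converting them makes later code a little easier to type. Either choice is fine for this lab.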

While it is subjective, I think interesting variables include:

subBridge <- subset(x = df, select = c(IDENTIF, RIVER, PURPOSE, LENGTH, LANES) )
head(subBridge)
##   IDENTIF RIVER  PURPOSE LENGTH LANES
## 1      E1     M  HIGHWAY      ?     2
## 2      E2     A  HIGHWAY MEDIUM     2
## 3      E3     A AQUEDUCT      ?     1
## 4      E5     A  HIGHWAY MEDIUM     2
## 5      E6     M  HIGHWAY      ?     2
## 6      E7     A  HIGHWAY  SHORT     2
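
The `?` entries in LENGTH look like missing-value markers (my reading of the output above, not something I have checked against a data dictionary). If that is right, a natural next step is to recode them as NA and count them. The sketch below does this on `toy_sub`, a hypothetical hand-typed copy of the six rows shown above, so it runs without the download.

```r
# Hypothetical copy of the first six rows of subBridge, typed in from the output above
toy_sub <- data.frame(
  IDENTIF = c("E1", "E2", "E3", "E5", "E6", "E7"),
  RIVER   = c("M", "A", "A", "A", "M", "A"),
  PURPOSE = c("HIGHWAY", "HIGHWAY", "AQUEDUCT", "HIGHWAY", "HIGHWAY", "HIGHWAY"),
  LENGTH  = c("?", "MEDIUM", "?", "MEDIUM", "?", "SHORT"),
  LANES   = c("2", "2", "1", "2", "2", "2"),
  stringsAsFactors = FALSE
)

# Recode the assumed "?" missing-value marker as NA
toy_sub[toy_sub == "?"] <- NA

# Count missing values per column
na_counts <- colSums(is.na(toy_sub))
na_counts

# Frequency table of LENGTH, keeping NA as its own category
length_counts <- table(toy_sub$LENGTH, useNA = "ifany")
length_counts
```

On the full data, the same recoding could instead be done at load time by passing `na.strings = "?"` to `read.csv()`.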

I am a bit unsure what else to do with the lab. There are some really cool examples on the CUNY site with people doing graphs and such (just histograms, etc.). Since this is ungraded, I will concentrate on the assignment.