From the lab pdf (https://bbhosted.cuny.edu/bbcswebdav/pid-37075352-dt-content-rid-234288038_1/xid-234288038_1):

Your task is to work with the Pittsburgh bridges dataset: https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges

Import the data

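# Read CSV into R (stringsAsFactors = TRUE keeps the text columns as factors, which the summaries below rely on)
BridgesRaw <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version2", header=FALSE, sep=",", stringsAsFactors=TRUE)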
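
The file marks unknown values with "?", so one option is to have read.csv turn them into NA at import time. The lines below are a minimal sketch of that idea, not part of the lab solution: sample_text holds two made-up rows in the same 13-field layout (they are not real records), and BridgesDemo is a hypothetical name used only here.

```r
# Two made-up rows in the 13-field layout of the bridges file (illustrative only)
sample_text <- "X1,A,10,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
X2,M,20,MATURE,RR,MEDIUM,?,G,THROUGH,STEEL,MEDIUM,F,SIMPLE-T"

# na.strings = "?" tells read.csv to store every "?" field as NA
BridgesDemo <- read.csv(text = sample_text, header = FALSE, sep = ",",
                        na.strings = "?", stringsAsFactors = TRUE)

# Number of missing values in each column
na_per_column <- colSums(is.na(BridgesDemo))
print(na_per_column)
```

With the real URL in place of text = sample_text, the same na.strings argument should apply, but the summaries further down would then report NA's instead of "?" counts.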

Explore the data

You should first study the data and its associated description (i.e. “data dictionary”).

summary(BridgesRaw)
##        V1      V2           V3            V4            V5          V6    
##  E1     :  1   A:49   28     : 5   CRAFTS  :18   AQUEDUCT: 4   ?     :27  
##  E10    :  1   M:41   39     : 5   EMERGING:15   HIGHWAY :71   LONG  :21  
##  E100   :  1   O:15   25     : 4   MATURE  :54   RR      :32   MEDIUM:48  
##  E101   :  1   Y: 3   27     : 4   MODERN  :21   WALK    : 1   SHORT :12  
##  E102   :  1          29     : 4                                          
##  E103   :  1          1      : 3                                          
##  (Other):102          (Other):83                                          
##  V7     V8           V9        V10         V11      V12           V13    
##  ?:16   ?: 2   ?      : 6   ?    : 2   ?     :16   ?  : 5   SIMPLE-T:44  
##  1: 4   G:80   DECK   :15   IRON :11   LONG  :30   F  :58   WOOD    :16  
##  2:61   N:26   THROUGH:87   STEEL:79   MEDIUM:53   S  :30   ARCH    :13  
##  4:23                       WOOD :16   SHORT : 9   S-F:15   CANTILEV:11  
##  6: 4                                                       SUSPEN  :11  
##                                                             CONT-T  :10  
##                                                             (Other) : 3
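
The summary above shows that several columns carry "?" entries. A quick way to count them per column, and to cross-tabulate two columns, might look like the sketch below. The data frame toy is a small hand-made stand-in (its values are invented for illustration); on the real data the same calls would be applied to BridgesRaw.

```r
# Hand-made stand-in for three columns of the raw data (values are illustrative)
toy <- data.frame(
  V6  = c("?", "MEDIUM", "LONG", "?"),
  V10 = c("WOOD", "STEEL", "STEEL", "WOOD"),
  V13 = c("WOOD", "SIMPLE-T", "ARCH", "WOOD"),
  stringsAsFactors = TRUE
)

# Column types and factor levels at a glance
str(toy)

# Count the "?" placeholders in each column
question_marks <- colSums(toy == "?")
print(question_marks)

# Cross-tabulate material (V10) against bridge type (V13)
material_by_type <- table(toy$V10, toy$V13)
print(material_by_type)
```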

Name the data

You should provide relevant column names.

From the Data Dictionary file (https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.names):

    name      type  possible values                                  comments
 1. IDENTIF   -     -                                                identifier of the examples
 2. RIVER     n     A, M, O
 3. LOCATION  n     1 to 52
 4. ERECTED   c,n   1818-1986 ; CRAFTS, EMERGING, MATURE, MODERN
 5. PURPOSE   n     WALK, AQUEDUCT, RR, HIGHWAY
 6. LENGTH    c,n   804-4558 ; SHORT, MEDIUM, LONG
 7. LANES     c,n   1, 2, 4, 6 ; 1, 2, 4, 6
 8. CLEAR-G   n     N, G
 9. T-OR-D    n     THROUGH, DECK
10. MATERIAL  n     WOOD, IRON, STEEL
11. SPAN      n     SHORT, MEDIUM, LONG
12. REL-L     n     S, S-F, F
13. TYPE      n     WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, CONT-T

colnames(BridgesRaw) <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "PURPOSE", "LENGTH", "LANES", "CLEAR-G", "T-OR-D", "MATERIAL", "SPAN", "REL-L", "TYPE")
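
Three of these names (CLEAR-G, T-OR-D, REL-L) contain hyphens, so they are not syntactic R names and need backticks whenever they are referenced. If that becomes awkward, one possible workaround is to swap the hyphens for underscores. This is only a sketch on a hypothetical three-column data frame called demo; it is not something the lab requires.

```r
# Hypothetical three-column stand-in; the values are illustrative only
demo <- data.frame(V1 = c("N", "G"),
                   V2 = c("THROUGH", "DECK"),
                   V3 = c("S", "F"),
                   stringsAsFactors = TRUE)
colnames(demo) <- c("CLEAR-G", "T-OR-D", "REL-L")

# Hyphenated names must be quoted with backticks to be used in expressions
only_n <- subset(demo, `CLEAR-G` == "N")

# Replace each hyphen with an underscore to get syntactic names
colnames(demo) <- gsub("-", "_", colnames(demo), fixed = TRUE)

# The renamed columns can now be referenced without backticks
print(demo$CLEAR_G)
```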

Subset the data

You should take the data, and create an R data frame with a subset of the columns in the dataset.

MediumWoodBridges <- subset(BridgesRaw, MATERIAL == "WOOD" & LENGTH == "MEDIUM", select = c(IDENTIF, RIVER, LOCATION, PURPOSE, TYPE))
summary(MediumWoodBridges)
##     IDENTIF  RIVER    LOCATION     PURPOSE        TYPE  
##  E11    :1   A:7   29     :3   AQUEDUCT:1   WOOD    :8  
##  E14    :1   M:1   24     :1   HIGHWAY :7   ?       :0  
##  E19    :1   O:0   25     :1   RR      :0   ARCH    :0  
##  E2     :1   Y:0   27     :1   WALK    :0   CANTILEV:0  
##  E20    :1         32     :1                CONT-T  :0  
##  E22    :1         6      :1                NIL     :0  
##  (Other):2         (Other):0                (Other) :0  
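
The same kind of row-and-column subset can also be written with bracket indexing, and droplevels() can remove the factor levels that no longer occur (the zero counts in a summary like the one above). The sketch below uses a small invented data frame, toy, with hypothetical identifiers X1 to X4. Note that R is case-sensitive, so the argument to subset() must be spelled select in lowercase.

```r
# Invented stand-in with the columns needed for the filter (not real records)
toy <- data.frame(
  IDENTIF  = c("X1", "X2", "X3", "X4"),
  MATERIAL = c("WOOD", "STEEL", "WOOD", "WOOD"),
  LENGTH   = c("MEDIUM", "MEDIUM", "?", "MEDIUM"),
  TYPE     = c("WOOD", "SIMPLE-T", "WOOD", "WOOD"),
  stringsAsFactors = TRUE
)

# Filter rows and keep two columns with subset()
via_subset <- subset(toy, MATERIAL == "WOOD" & LENGTH == "MEDIUM",
                     select = c(IDENTIF, TYPE))

# The same selection with a logical row index and column names
keep_rows <- toy$MATERIAL == "WOOD" & toy$LENGTH == "MEDIUM"
via_brackets <- toy[keep_rows, c("IDENTIF", "TYPE")]

# Check that both approaches select the same rows and columns
print(all.equal(via_subset, via_brackets))

# Drop factor levels that no longer appear in the subset
tidy <- droplevels(via_brackets)
print(levels(tidy$TYPE))
```

subset() is convenient interactively; bracket indexing is usually the safer choice inside functions because it does not rely on non-standard evaluation.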