From the lab pdf (https://bbhosted.cuny.edu/bbcswebdav/pid-37075352-dt-content-rid-234288038_1/xid-234288038_1):
Your task is to work with the Pittsburgh bridges dataset: https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges
Import the data
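# Read CSV into R; the file has no header row, and stringsAsFactors = TRUE
# keeps the factor columns summarized below (R 4.0 and later default to character)
BridgesRaw <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version2", header = FALSE, sep = ",", stringsAsFactors = TRUE)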
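The raw file marks missing values with `?`, which is why `?` shows up as an ordinary factor level in the summary. A minimal sketch, assuming you would rather have R treat those cells as `NA` at read time: `na.strings = "?"` does that, and `stringsAsFactors = TRUE` keeps the factor columns. So that the snippet runs offline, it parses a small made-up sample (`sample_csv` holds hypothetical rows, not real bridges) instead of the UCI URL.

```r
# Hypothetical three-row sample in the same header-less, comma-separated layout
# as bridges.data.version2; these are NOT real records from the dataset.
sample_csv <- "T1,A,10,CRAFTS,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
T2,M,?,MATURE,RR,LONG,?,G,DECK,STEEL,LONG,F,SIMPLE-T
T3,O,25,MODERN,HIGHWAY,MEDIUM,4,G,THROUGH,STEEL,MEDIUM,S-F,ARCH"

bridges_na <- read.csv(text = sample_csv, header = FALSE,
                       na.strings = "?",        # treat "?" as a missing value
                       stringsAsFactors = TRUE) # character columns become factors

# Missing entries per column; "?" is no longer a factor level
colSums(is.na(bridges_na))
```

Applied to the real URL, the same two arguments would replace the `?` rows in `summary()` with an `NA's` row for each affected column.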
You should first study the data and its associated description (i.e. “data dictionary”).
summary(BridgesRaw)
##      V1        V2       V3          V4           V5          V6
##  E1     :  1  A:49  28     : 5  CRAFTS  :18  AQUEDUCT: 4  ?     :27
##  E10    :  1  M:41  39     : 5  EMERGING:15  HIGHWAY :71  LONG  :21
##  E100   :  1  O:15  25     : 4  MATURE  :54  RR      :32  MEDIUM:48
##  E101   :  1  Y: 3  27     : 4  MODERN  :21  WALK    : 1  SHORT :12
##  E102   :  1        29     : 4
##  E103   :  1        1      : 3
##  (Other):102        (Other):83
##   V7    V8       V9        V10        V11      V12        V13
##  ?:16  ?: 2  ?      : 6  ?    : 2  ?     :16  ?  : 5  SIMPLE-T:44
##  1: 4  G:80  DECK   :15  IRON :11  LONG  :30  F  :58  WOOD    :16
##  2:61  N:26  THROUGH:87  STEEL:79  MEDIUM:53  S  :30  ARCH    :13
##  4:23                    WOOD :16  SHORT : 9  S-F:15  CANTILEV:11
##  6: 4                                                 SUSPEN  :11
##                                                       CONT-T  :10
##                                                       (Other) : 3
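The summary above reports `?` as a regular level in most columns (27 in V6 and 16 in V7, for example). A small sketch for tallying those placeholders per column before deciding how to handle them; `raw` is a hypothetical stand-in for `BridgesRaw` so the snippet runs on its own.

```r
# Hypothetical stand-in for BridgesRaw (illustrative values, not real records)
raw <- data.frame(
  V6  = c("?", "LONG", "MEDIUM", "?"),
  V7  = c("2", "?", "4", "2"),
  V10 = c("WOOD", "STEEL", "STEEL", "IRON"),
  stringsAsFactors = TRUE
)

# TRUE wherever a cell holds the "?" placeholder (works for factor or character columns)
is_placeholder <- sapply(raw, function(col) as.character(col) == "?")

# Placeholders per column, then the share of rows with at least one placeholder
missing_per_col <- colSums(is_placeholder)
share_any <- mean(rowSums(is_placeholder) > 0)
missing_per_col
share_any
```

On the real data, passing `BridgesRaw` in place of `raw` should agree with the `?` counts that `summary()` prints for the columns where `?` is listed.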
You should provide relevant column names.
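From the Data Dictionary file (https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.names). In the type column, `n` marks a nominal attribute and `c` a continuous one; for the `c,n` attributes, the values before the semicolon describe the original continuous version and those after it the discretized categories used in version 2, which is the file read above.

| # | Name | Type | Possible values | Comments |
|---|---|---|---|---|
| 1 | IDENTIF | - | - | identifier of the examples |
| 2 | RIVER | n | A, M, O | |
| 3 | LOCATION | n | 1 to 52 | |
| 4 | ERECTED | c,n | 1818-1986 ; CRAFTS, EMERGING, MATURE, MODERN | |
| 5 | PURPOSE | n | WALK, AQUEDUCT, RR, HIGHWAY | |
| 6 | LENGTH | c,n | 804-4558 ; SHORT, MEDIUM, LONG | |
| 7 | LANES | c,n | 1, 2, 4, 6 ; 1, 2, 4, 6 | |
| 8 | CLEAR-G | n | N, G | |
| 9 | T-OR-D | n | THROUGH, DECK | |
| 10 | MATERIAL | n | WOOD, IRON, STEEL | |
| 11 | SPAN | n | SHORT, MEDIUM, LONG | |
| 12 | REL-L | n | S, S-F, F | |
| 13 | TYPE | n | WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, CONT-T | |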
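The dictionary does not list every value that actually occurs: the summary of the raw data shows 3 bridges with RIVER `Y`, which is not among A, M, O. A hedged sketch for flagging such values with `setdiff()`; `allowed` is a hand-copied excerpt of the table (three attributes only) and `observed` is a hypothetical stand-in for the real columns.

```r
# Allowed values copied by hand from the data dictionary (excerpt only)
allowed <- list(
  RIVER    = c("A", "M", "O"),
  MATERIAL = c("WOOD", "IRON", "STEEL"),
  TYPE     = c("WOOD", "SUSPEN", "SIMPLE-T", "ARCH", "CANTILEV", "CONT-T")
)

# Hypothetical stand-in for the observed columns; "?" marks a missing value
observed <- data.frame(
  RIVER    = c("A", "M", "Y", "O"),
  MATERIAL = c("WOOD", "STEEL", "?", "IRON"),
  TYPE     = c("WOOD", "NIL", "ARCH", "SIMPLE-T"),
  stringsAsFactors = FALSE
)

# For each attribute, list values that are neither allowed nor the "?" marker
unexpected <- setNames(
  lapply(names(allowed), function(col) {
    setdiff(unique(as.character(observed[[col]])), c(allowed[[col]], "?"))
  }),
  names(allowed)
)
unexpected
```

Run against the real columns, a check like this is a quick way to decide whether values such as `Y` should be kept, recoded, or treated as missing.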
colnames(BridgesRaw) <- c("IDENTIF","RIVER","LOCATION","ERECTED","PURPOSE","LENGTH","LANES","CLEAR-G","T-OR-D","MATERIAL","SPAN","REL-L","TYPE")
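Three of these names (`CLEAR-G`, `T-OR-D`, `REL-L`) are not syntactic R names, so they have to be wrapped in backticks when used with `$` or inside `subset()`. A minimal sketch on a hypothetical two-column frame; `make.names()` shows the syntactic alternative (hyphens become dots), which is what `read.csv()` would produce from a header row under its default `check.names = TRUE`.

```r
# Hypothetical two-column frame using two of the dictionary's hyphenated names
df <- data.frame(a = c("N", "G", "G"), b = c("THROUGH", "DECK", "THROUGH"),
                 stringsAsFactors = FALSE)
colnames(df) <- c("CLEAR-G", "T-OR-D")

# Backticks are required: df$CLEAR-G would parse as df$CLEAR minus G
clear_g <- df$`CLEAR-G`
through <- subset(df, `T-OR-D` == "THROUGH")

# Syntactic alternatives: each "-" is replaced by "."
make.names(c("CLEAR-G", "T-OR-D", "REL-L"))
```

Keeping the dictionary's spelling matches the documentation exactly; the dotted names trade that for not needing backticks.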
You should take the data, and create an R data frame with a subset of the columns in the dataset.
# Medium-length wooden bridges; subset()'s column argument is lower-case select
MediumWoodBridges <- subset(BridgesRaw, MATERIAL == "WOOD" & LENGTH == "MEDIUM",
                            select = c(IDENTIF, RIVER, LOCATION, PURPOSE, TYPE))
summary(MediumWoodBridges)
##   IDENTIF   RIVER  LOCATION    PURPOSE       TYPE
##  E11    :1   A:7   29     :3  AQUEDUCT:1  WOOD    :8
##  E14    :1   M:1   24     :1  HIGHWAY :7  ?       :0
##  E19    :1   O:0   25     :1  RR      :0  ARCH    :0
##  E2     :1   Y:0   27     :1  WALK    :0  CANTILEV:0
##  E20    :1         32     :1              CONT-T  :0
##  E22    :1         6      :1              NIL     :0
##  (Other):2         (Other):0              (Other) :0
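`subset()` spells its column argument `select`, in lower case. An argument named `SELECT` does not match it and falls into `...`, which the data-frame method does not use, so every column is kept. A small sketch on a hypothetical frame comparing both spellings, then using `droplevels()` to remove factor levels that no longer occur (the source of zero counts in a subset's `summary()`).

```r
# Hypothetical frame with factor columns, standing in for BridgesRaw
bridges <- data.frame(
  IDENTIF  = c("B1", "B2", "B3", "B4"),
  MATERIAL = c("WOOD", "STEEL", "WOOD", "IRON"),
  LENGTH   = c("MEDIUM", "MEDIUM", "SHORT", "LONG"),
  TYPE     = c("WOOD", "SIMPLE-T", "WOOD", "ARCH"),
  stringsAsFactors = TRUE
)

# Upper-case SELECT is not an argument of subset(): it ends up in `...`
all_cols <- subset(bridges, MATERIAL == "WOOD" & LENGTH == "MEDIUM",
                   SELECT = c(IDENTIF, TYPE))
# Lower-case select keeps only the requested columns
two_cols <- subset(bridges, MATERIAL == "WOOD" & LENGTH == "MEDIUM",
                   select = c(IDENTIF, TYPE))

# Subsetting keeps every original factor level; droplevels() removes unused ones
levels(two_cols$TYPE)
two_cols <- droplevels(two_cols)
levels(two_cols$TYPE)
```

On the real data, wrapping the subset in `droplevels()` would leave only the levels that actually appear among the medium wooden bridges.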
Your deliverable is the R Markdown code to perform these transformation tasks.