You should first study the data and its associated description (i.e. “data dictionary”).

You should take the data, and create an R dataframe with a subset of the columns in the dataset.

You should provide relevant column names. Your deliverable is the R Markdown code to perform these transformation tasks.

Solutions:

The link given in the Hands-on Lab PDF file did not work, so I searched online and found the dataset at the link below: https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1

# Note: the raw file has no header line, so header=TRUE uses the first record (E1) as column names
pittsburgBridges <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1', header=TRUE, na.strings="?")
summary(pittsburgBridges)
##        E1      M            X3            X1818          HIGHWAY  
##  E10    :  1   A:49   Min.   : 1.00   Min.   :1819   AQUEDUCT: 4  
##  E100   :  1   M:40   1st Qu.:16.00   1st Qu.:1884   HIGHWAY :70  
##  E101   :  1   O:15   Median :27.50   Median :1903   RR      :32  
##  E102   :  1   Y: 3   Mean   :26.20   Mean   :1906   WALK    : 1  
##  E103   :  1          3rd Qu.:37.75   3rd Qu.:1928                
##  E105   :  1          Max.   :52.00   Max.   :1986                
##  (Other):101          NA's   :1                                   
##        X.             X2           N         THROUGH      WOOD   
##  Min.   : 804   Min.   :1.000   G   :80   DECK   :15   IRON :11  
##  1st Qu.:1000   1st Qu.:2.000   N   :25   THROUGH:86   STEEL:79  
##  Median :1300   Median :2.000   NA's: 2   NA's   : 6   WOOD :15  
##  Mean   :1567   Mean   :2.637                          NA's : 2  
##  3rd Qu.:2000   3rd Qu.:4.000                                    
##  Max.   :4558   Max.   :6.000                                    
##  NA's   :26     NA's   :16                                       
##     SHORT       S           WOOD.1  
##  LONG  :30   F   :58   SIMPLE-T:44  
##  MEDIUM:53   S   :29   WOOD    :15  
##  SHORT : 8   S-F :15   ARCH    :13  
##  NA's  :16   NA's: 5   CANTILEV:11  
##                        SUSPEN  :11  
##                        (Other) :11  
##                        NA's    : 2
head(pittsburgBridges)
##   E1 M X3 X1818  HIGHWAY   X. X2 N THROUGH WOOD  SHORT S WOOD.1
## 1 E2 A 25  1819  HIGHWAY 1037  2 N THROUGH WOOD  SHORT S   WOOD
## 2 E3 A 39  1829 AQUEDUCT   NA  1 N THROUGH WOOD   <NA> S   WOOD
## 3 E5 A 29  1837  HIGHWAY 1000  2 N THROUGH WOOD  SHORT S   WOOD
## 4 E6 M 23  1838  HIGHWAY   NA  2 N THROUGH WOOD   <NA> S   WOOD
## 5 E7 A 27  1840  HIGHWAY  990  2 N THROUGH WOOD MEDIUM S   WOOD
## 6 E8 A 28  1844 AQUEDUCT 1000  1 N THROUGH IRON  SHORT S SUSPEN
dim(pittsburgBridges)
## [1] 107  13
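
The column names in the output above (E1, M, X3, X1818, ...) and the 107 rows reported by dim() suggest that header=TRUE consumed the first record as a header, since the data dictionary lists 108 instances. The following is a minimal sketch of the difference, using a two-line inline sample in place of the URL. The sample rows are reconstructed from the output above (E1 from the column names, E2 from head()); the object names `sampleText`, `withHeader` and `noHeader` are illustrative only.

```r
# Two records in the same layout as bridges.data.version1 (no header line).
sampleText <- "E1,M,3,1818,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
E2,A,25,1819,HIGHWAY,1037,2,N,THROUGH,WOOD,SHORT,S,WOOD"

# header = TRUE treats record E1 as column names and drops it from the data.
withHeader <- read.csv(text = sampleText, header = TRUE, na.strings = "?")

# header = FALSE keeps every record; columns get placeholder names V1..V13.
noHeader <- read.csv(text = sampleText, header = FALSE, na.strings = "?")

nrow(withHeader)  # 1
nrow(noHeader)    # 2
```

Applied to the real URL, the same call with header = FALSE would be expected to return all 108 rows, after which the columns can be renamed from the data dictionary.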

The complete details about this dataset, including the column names and contributors, are available at the URL below:

ftp://ftp.ics.uci.edu/pub/machine-learning-databases/bridges/bridges.names

textStream <- readLines('ftp://ftp.ics.uci.edu/pub/machine-learning-databases/bridges/bridges.names')
# Trim leading and trailing whitespace, replace tabs with spaces, and print without quotation marks
print(gsub("\t"," ",trimws(textStream)), quote=FALSE)
##   [1] 1. Title: Pittsburgh bridges                                                
##   [2]                                                                             
##   [3] 2. Sources:                                                                 
##   [4] -- Yoram Reich & Steven J. Fenves                                           
##   [5] Department of Civil Engineering                                             
##   [6] and                                                                         
##   [7] Engineering Design Research Center                                          
##   [8] Carnegie Mellon University                                                  
##   [9] Pittsburgh, PA 15213                                                        
##  [10]                                                                             
##  [11] Compiled from various sources.                                              
##  [12]                                                                             
##  [13] -- Donor: Yoram Reich (yoram.reich@cs.cmu.edu)                              
##  [14] -- Date: 1 August 1990                                                      
##  [15]                                                                             
##  [16] 3. Past Usage:                                                              
##  [17]                                                                             
##  [18] -- Reich & Fenves (1989). Incremental Learning for Capturing Design         
##  [19] Expertise. Technical Report: EDRC 12-34-89, Engineering Design              
##  [20] Research Center, Carnegie Mellon University, Pittsburgh, PA.                
##  [21] -- Qualitative results and runs with original ordering of examples.         
##  [22] using COBWEB.                                                               
##  [23]                                                                             
##  [24] -- Reich (1989). Converging to ``Ideal'' Design Knowledge by Learning,      
##  [25] Proceedings of the First International Workshop on Formal Methods in        
##  [26] Engineering Design, pp: 330-349, Colorado Springs, CO, January 1990.        
##  [27] -- Describes a new design method with Bridger (variant of COBWEB) using     
##  [28] this domain. (Also an EDRC report: 12-35-89)                                
##  [29]                                                                             
##  [30] -- Reich (1989) Combining Nominal and Continuous Properties in an           
##  [31] Incremental Learning System for Design. Technical Report: EDRC 12-33-89.    
##  [32] -- Comparison of performance of Bridger when running on both versions       
##  [33] (V1 and V2) of the database                                                 
##  [34]                                                                             
##  [35] -- Reich (1989) Incremental Concept Formation with Mixed Property Types     
##  [36] Unpublished Manuscript.                                                     
##  [37] -- Results using 10 random 10-fold cross-validation test with Bridger       
##  [38] (relative error rate):                                                      
##  [39] Version V1 of the database:                                                 
##  [40] MATERIAL 18.4%, REL-L 38.7%, SPAN 42.7%, T-OR-D 14.7%, TYPE 47.6%.          
##  [41] Version V2 of the database:                                                 
##  [42] MATERIAL 24.2%, REL-L 41.7%, SPAN 39.9%, T-OR-D 14.7%, TYPE 56.5%.          
##  [43]                                                                             
##  [44] -- Quinlan (1989) Personal communication.                                   
##  [45] -- Results of a 10-fold cross-validation test with C4.5, and with           
##  [46] a separate decision tree for each design property obtained the              
##  [47] following error rates on version V1 of the database:                        
##  [48] MATERIAL 15%, REL-L 32%, SPAN 32%, T-OR-D 15%, TYPE 44%.                    
##  [49]                                                                             
##  [50] 4. Number of instances: 108                                                 
##  [51]                                                                             
##  [52] 5. Relevant Information:                                                    
##  [53]                                                                             
##  [54] There are two versions to the database:                                     
##  [55] V1 contains the original examples and                                       
##  [56] V2 contains descriptions after discretizing numeric properties.             
##  [57]                                                                             
##  [58] There are no ``classes'' in the domain. Rather this is a DESIGN domain where
##  [59] 5 properties (design description) need to be predicted based on 7           
##  [60] specification properties.                                                   
##  [61]                                                                             
##  [62] 6. Number of Attributes: 13: 7 specifications, 5 design description, and 1  
##  [63] identifier (not used for the classification)                                
##  [64]                                                                             
##  [65] 7. Attribute Information:                                                   
##  [66]                                                                             
##  [67] The type field state whether a property is continuous/integer (c)           
##  [68] or nominal (n).                                                             
##  [69] For properties with c,n type, the range of continuous numbers is given      
##  [70] first and the possible values of the nominal follow the semi-colon.         
##  [71]                                                                             
##  [72]                                                                             
##  [73] name     type    possible values  comments                                  
##  [74] ------------------------------------------------------------------------    
##  [75] 1.  IDENTIF - -   identifier of the examples                                
##  [76] 2.  RIVER n A, M, O                                                         
##  [77] 3.  LOCATION n       1 to 52                                                
##  [78] 4.  ERECTED c,n 1818-1986 ; CRAFTS, EMERGING, MATURE, MODERN                
##  [79] 5.  PURPOSE n WALK, AQUEDUCT, RR, HIGHWAY                                   
##  [80] 6.  LENGTH c,n 804-4558 ; SHORT, MEDIUM, LONG                               
##  [81] 7.  LANES c,n 1, 2, 4, 6 ; 1, 2, 4, 6                                       
##  [82] 8.  CLEAR-G n N, G                                                          
##  [83] 9.  T-OR-D n THROUGH, DECK                                                  
##  [84] 10. MATERIAL n WOOD, IRON, STEEL                                            
##  [85] 11. SPAN n SHORT, MEDUIM, LONG                                              
##  [86] 12. REL-L n S, S-F, F                                                       
##  [87] 13. TYPE n WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, CONT-T                   
##  [88]                                                                             
##  [89]                                                                             
##  [90] 8. More complicated attributes:                                             
##  [91]                                                                             
##  [92] One can use a hierarchical structure for the Type property. There are two   
##  [93] options.                                                                    
##  [94]                                                                             
##  [95] option 1 (use examples without modification)                                
##  [96] --------                                                                    
##  [97]                                                                             
##  [98] Type                                                                        
##  [99] /      /      \\     \\                                                     
## [100] /       /       \\      \\                                                  
## [101] wood suspen  arch truss                                                     
## [102] /  |    \\                                                                  
## [103] /    |      \\                                                              
## [104] cantilev  cont-t   simple                                                   
## [105]                                                                             
## [106]                                                                             
## [107] option 2 (requires changes in the Type property - specified bellow)         
## [108] --------                                                                    
## [109]                                                                             
## [110] Type                                                                        
## [111]                                                                             
## [112] /      /        |          \\                                               
## [113] /     /          |          \\                                              
## [114] wood   suspen arch        truss                                             
## [115] / \\         /  |  \\    \\                                                 
## [116] /     \\     /    |   \\ \\                                                 
## [117] tied-a    not-tied  cantilev cont-t simple arch-t                           
## [118]                                                                             
## [119]                                                                             
## [120] Change the Type  property of the following examples (in both V1 and V2):    
## [121] E28   ->  arch-t                                                            
## [122] E91,E90,E84,E83,E73  -> tied-a                                              
## [123] E97,E78,E77,E75,E66,E64,E43  -> not-tied                                    
## [124]                                                                             
## [125]                                                                             
## [126] 9. Missing Attribute Values:                                                
## [127] Attribute #:  # instances with missing values:                              
## [128] 2    1                                                                      
## [129] 6   27                                                                      
## [130] 7   16                                                                      
## [131] 8    2                                                                      
## [132] 9    6                                                                      
## [133] 10    2                                                                     
## [134] 11   16                                                                     
## [135] 12    5                                                                     
## [136] 13    3
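
The attribute names in section 7 of the dictionary could also be pulled out programmatically rather than typed by hand. This is only a sketch: `dictLines` holds a few attribute lines copied from the output above, and `pattern`, `attrLines` and `attrNames` are illustrative names.

```r
# A few attribute lines copied from the bridges.names output above.
dictLines <- c(
  "1.  IDENTIF - -   identifier of the examples",
  "2.  RIVER n A, M, O",
  "8.  CLEAR-G n N, G",
  "13. TYPE n WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, CONT-T"
)

# Match lines that start with "<number>." and capture the first word after it.
pattern <- "^[0-9]+\\.[[:space:]]+([^[:space:]]+).*$"
attrLines <- grep(pattern, dictLines, value = TRUE)
attrNames <- sub(pattern, "\\1", attrLines)
print(attrNames)
```

On the full file this pattern would also match numbered section headings such as "1. Title:", so the input would first need to be restricted to the attribute table (entries [75] to [87] in the printed output).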

From the above information, I can rename the columns as follows:

names(pittsburgBridges) <- c('IDENTIF','RIVER', 'LOCATION','ERECTED','PURPOSE','LENGTH','LANES','CLEAR-G','T-OR-D','MATERIAL','SPAN','REL-L','TYPE')
head(pittsburgBridges)
##   IDENTIF RIVER LOCATION ERECTED  PURPOSE LENGTH LANES CLEAR-G  T-OR-D
## 1      E2     A       25    1819  HIGHWAY   1037     2       N THROUGH
## 2      E3     A       39    1829 AQUEDUCT     NA     1       N THROUGH
## 3      E5     A       29    1837  HIGHWAY   1000     2       N THROUGH
## 4      E6     M       23    1838  HIGHWAY     NA     2       N THROUGH
## 5      E7     A       27    1840  HIGHWAY    990     2       N THROUGH
## 6      E8     A       28    1844 AQUEDUCT   1000     1       N THROUGH
##   MATERIAL   SPAN REL-L   TYPE
## 1     WOOD  SHORT     S   WOOD
## 2     WOOD   <NA>     S   WOOD
## 3     WOOD  SHORT     S   WOOD
## 4     WOOD   <NA>     S   WOOD
## 5     WOOD MEDIUM     S   WOOD
## 6     IRON  SHORT     S SUSPEN
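
Names such as CLEAR-G, T-OR-D and REL-L contain hyphens, so they are not syntactic R names and cannot be used bare after `$`. Below is a small sketch of how such columns can be accessed, and of one optional way to derive syntactic names. The data frame `bridges` is a two-row stand-in, not the real data.

```r
# Two-row stand-in data frame with hyphenated column names.
bridges <- data.frame(a = c("N", "G"), b = c("THROUGH", "DECK"))
names(bridges) <- c("CLEAR-G", "T-OR-D")

# Backticks or [[ ]] are needed because "-" is not valid in a bare R name.
clearG <- bridges$`CLEAR-G`
tOrD <- bridges[["T-OR-D"]]

# Optional: make.names() produces syntactic alternatives by turning "-" into ".".
syntacticNames <- make.names(names(bridges))
print(syntacticNames)
```

Keeping the dictionary spelling preserves a direct match with bridges.names, while the dotted versions are easier to type in later code; either choice works.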

A subset of the columns can be used to create a new dataframe as follows:

newBridges <- pittsburgBridges[, c(1,2,3,4,7,9)]
head(newBridges)
##   IDENTIF RIVER LOCATION ERECTED LANES  T-OR-D
## 1      E2     A       25    1819     2 THROUGH
## 2      E3     A       39    1829     1 THROUGH
## 3      E5     A       29    1837     2 THROUGH
## 4      E6     M       23    1838     2 THROUGH
## 5      E7     A       27    1840     2 THROUGH
## 6      E8     A       28    1844     1 THROUGH
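
The same subset can be selected by column name instead of by position, which keeps working if the column order changes. A minimal sketch on a two-row stand-in data frame follows (values taken from the first two rows shown above; `toy`, `keep` and `newToy` are illustrative names).

```r
# Two-row stand-in with the same column names as the renamed data frame.
toy <- data.frame(
  IDENTIF  = c("E2", "E3"),
  RIVER    = c("A", "A"),
  LOCATION = c(25, 39),
  ERECTED  = c(1819, 1829),
  PURPOSE  = c("HIGHWAY", "AQUEDUCT"),
  LANES    = c(2, 1),
  `T-OR-D` = c("THROUGH", "THROUGH"),
  check.names = FALSE  # keep the hyphenated name as written
)

# Select columns by name rather than by numeric index.
keep <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "LANES", "T-OR-D")
newToy <- toy[, keep]
print(names(newToy))
```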