The link given in the Hands-on Lab PDF did not work, so I searched online and found the dataset at the link below: https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1
pittsburgBridges <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/bridges/bridges.data.version1', header = TRUE, na.strings = "?")
summary(pittsburgBridges)
## E1 M X3 X1818 HIGHWAY
## E10 : 1 A:49 Min. : 1.00 Min. :1819 AQUEDUCT: 4
## E100 : 1 M:40 1st Qu.:16.00 1st Qu.:1884 HIGHWAY :70
## E101 : 1 O:15 Median :27.50 Median :1903 RR :32
## E102 : 1 Y: 3 Mean :26.20 Mean :1906 WALK : 1
## E103 : 1 3rd Qu.:37.75 3rd Qu.:1928
## E105 : 1 Max. :52.00 Max. :1986
## (Other):101 NA's :1
## X. X2 N THROUGH WOOD
## Min. : 804 Min. :1.000 G :80 DECK :15 IRON :11
## 1st Qu.:1000 1st Qu.:2.000 N :25 THROUGH:86 STEEL:79
## Median :1300 Median :2.000 NA's: 2 NA's : 6 WOOD :15
## Mean :1567 Mean :2.637 NA's : 2
## 3rd Qu.:2000 3rd Qu.:4.000
## Max. :4558 Max. :6.000
## NA's :26 NA's :16
## SHORT S WOOD.1
## LONG :30 F :58 SIMPLE-T:44
## MEDIUM:53 S :29 WOOD :15
## SHORT : 8 S-F :15 ARCH :13
## NA's :16 NA's: 5 CANTILEV:11
## SUSPEN :11
## (Other) :11
## NA's : 2
head(pittsburgBridges)
## E1 M X3 X1818 HIGHWAY X. X2 N THROUGH WOOD SHORT S WOOD.1
## 1 E2 A 25 1819 HIGHWAY 1037 2 N THROUGH WOOD SHORT S WOOD
## 2 E3 A 39 1829 AQUEDUCT NA 1 N THROUGH WOOD <NA> S WOOD
## 3 E5 A 29 1837 HIGHWAY 1000 2 N THROUGH WOOD SHORT S WOOD
## 4 E6 M 23 1838 HIGHWAY NA 2 N THROUGH WOOD <NA> S WOOD
## 5 E7 A 27 1840 HIGHWAY 990 2 N THROUGH WOOD MEDIUM S WOOD
## 6 E8 A 28 1844 AQUEDUCT 1000 1 N THROUGH IRON SHORT S SUSPEN
dim(pittsburgBridges)
## [1] 107 13
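A note on the output above: the column names (`E1`, `M`, `X3`, `X1818`, `HIGHWAY`, ...) look like the values of a bridge record rather than real headers, and `dim()` reports 107 rows. This suggests the data file has no header row, so `header=TRUE` used the first record as column names. The sketch below illustrates the difference on a two-line inline sample instead of the real file. The first sample row is reconstructed from the column names printed above and the second is copied from the `head()` output; the names `sample_text`, `with_header`, and `no_header` are only illustrative.

```r
# Two records in the same comma-separated layout as the data file.
# Row 1 is reconstructed from the column names shown above; row 2 is from head().
sample_text <- "E1,M,3,1818,HIGHWAY,?,2,N,THROUGH,WOOD,SHORT,S,WOOD
E2,A,25,1819,HIGHWAY,1037,2,N,THROUGH,WOOD,SHORT,S,WOOD"

# header = TRUE treats the first record as column names and drops it from the data
with_header <- read.csv(text = sample_text, header = TRUE, na.strings = "?")

# header = FALSE keeps every record; columns get default names V1, V2, ...
no_header <- read.csv(text = sample_text, header = FALSE, na.strings = "?")

# Compare how many records survive in each case
print(nrow(with_header))
print(nrow(no_header))
```

If the same holds for the full file, reading it with `header = FALSE` should keep the `E1` record as data.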
The attribute descriptions for the dataset are in the accompanying names file: ftp://ftp.ics.uci.edu/pub/machine-learning-databases/bridges/bridges.names
textStream <- readLines('ftp://ftp.ics.uci.edu/pub/machine-learning-databases/bridges/bridges.names')
# Trim whitespace at the beginning and end of each line, replace tabs with spaces, and print without quotation marks
print(gsub("\t"," ",trimws(textStream)), quote=FALSE)
## [1] 1. Title: Pittsburgh bridges
## [2]
## [3] 2. Sources:
## [4] -- Yoram Reich & Steven J. Fenves
## [5] Department of Civil Engineering
## [6] and
## [7] Engineering Design Research Center
## [8] Carnegie Mellon University
## [9] Pittsburgh, PA 15213
## [10]
## [11] Compiled from various sources.
## [12]
## [13] -- Donor: Yoram Reich (yoram.reich@cs.cmu.edu)
## [14] -- Date: 1 August 1990
## [15]
## [16] 3. Past Usage:
## [17]
## [18] -- Reich & Fenves (1989). Incremental Learning for Capturing Design
## [19] Expertise. Technical Report: EDRC 12-34-89, Engineering Design
## [20] Research Center, Carnegie Mellon University, Pittsburgh, PA.
## [21] -- Qualitative results and runs with original ordering of examples.
## [22] using COBWEB.
## [23]
## [24] -- Reich (1989). Converging to ``Ideal'' Design Knowledge by Learning,
## [25] Proceedings of the First International Workshop on Formal Methods in
## [26] Engineering Design, pp: 330-349, Colorado Springs, CO, January 1990.
## [27] -- Describes a new design method with Bridger (variant of COBWEB) using
## [28] this domain. (Also an EDRC report: 12-35-89)
## [29]
## [30] -- Reich (1989) Combining Nominal and Continuous Properties in an
## [31] Incremental Learning System for Design. Technical Report: EDRC 12-33-89.
## [32] -- Comparison of performance of Bridger when running on both versions
## [33] (V1 and V2) of the database
## [34]
## [35] -- Reich (1989) Incremental Concept Formation with Mixed Property Types
## [36] Unpublished Manuscript.
## [37] -- Results using 10 random 10-fold cross-validation test with Bridger
## [38] (relative error rate):
## [39] Version V1 of the database:
## [40] MATERIAL 18.4%, REL-L 38.7%, SPAN 42.7%, T-OR-D 14.7%, TYPE 47.6%.
## [41] Version V2 of the database:
## [42] MATERIAL 24.2%, REL-L 41.7%, SPAN 39.9%, T-OR-D 14.7%, TYPE 56.5%.
## [43]
## [44] -- Quinlan (1989) Personal communication.
## [45] -- Results of a 10-fold cross-validation test with C4.5, and with
## [46] a separate decision tree for each design property obtained the
## [47] following error rates on version V1 of the database:
## [48] MATERIAL 15%, REL-L 32%, SPAN 32%, T-OR-D 15%, TYPE 44%.
## [49]
## [50] 4. Number of instances: 108
## [51]
## [52] 5. Relevant Information:
## [53]
## [54] There are two versions to the database:
## [55] V1 contains the original examples and
## [56] V2 contains descriptions after discretizing numeric properties.
## [57]
## [58] There are no ``classes'' in the domain. Rather this is a DESIGN domain where
## [59] 5 properties (design description) need to be predicted based on 7
## [60] specification properties.
## [61]
## [62] 6. Number of Attributes: 13: 7 specifications, 5 design description, and 1
## [63] identifier (not used for the classification)
## [64]
## [65] 7. Attribute Information:
## [66]
## [67] The type field state whether a property is continuous/integer (c)
## [68] or nominal (n).
## [69] For properties with c,n type, the range of continuous numbers is given
## [70] first and the possible values of the nominal follow the semi-colon.
## [71]
## [72]
## [73] name type possible values comments
## [74] ------------------------------------------------------------------------
## [75] 1. IDENTIF - - identifier of the examples
## [76] 2. RIVER n A, M, O
## [77] 3. LOCATION n 1 to 52
## [78] 4. ERECTED c,n 1818-1986 ; CRAFTS, EMERGING, MATURE, MODERN
## [79] 5. PURPOSE n WALK, AQUEDUCT, RR, HIGHWAY
## [80] 6. LENGTH c,n 804-4558 ; SHORT, MEDIUM, LONG
## [81] 7. LANES c,n 1, 2, 4, 6 ; 1, 2, 4, 6
## [82] 8. CLEAR-G n N, G
## [83] 9. T-OR-D n THROUGH, DECK
## [84] 10. MATERIAL n WOOD, IRON, STEEL
## [85] 11. SPAN n SHORT, MEDUIM, LONG
## [86] 12. REL-L n S, S-F, F
## [87] 13. TYPE n WOOD, SUSPEN, SIMPLE-T, ARCH, CANTILEV, CONT-T
## [88]
## [89]
## [90] 8. More complicated attributes:
## [91]
## [92] One can use a hierarchical structure for the Type property. There are two
## [93] options.
## [94]
## [95] option 1 (use examples without modification)
## [96] --------
## [97]
## [98] Type
## [99] / / \\ \\
## [100] / / \\ \\
## [101] wood suspen arch truss
## [102] / | \\
## [103] / | \\
## [104] cantilev cont-t simple
## [105]
## [106]
## [107] option 2 (requires changes in the Type property - specified bellow)
## [108] --------
## [109]
## [110] Type
## [111]
## [112] / / | \\
## [113] / / | \\
## [114] wood suspen arch truss
## [115] / \\ / | \\ \\
## [116] / \\ / | \\ \\
## [117] tied-a not-tied cantilev cont-t simple arch-t
## [118]
## [119]
## [120] Change the Type property of the following examples (in both V1 and V2):
## [121] E28 -> arch-t
## [122] E91,E90,E84,E83,E73 -> tied-a
## [123] E97,E78,E77,E75,E66,E64,E43 -> not-tied
## [124]
## [125]
## [126] 9. Missing Attribute Values:
## [127] Attribute #: # instances with missing values:
## [128] 2 1
## [129] 6 27
## [130] 7 16
## [131] 8 2
## [132] 9 6
## [133] 10 2
## [134] 11 16
## [135] 12 5
## [136] 13 3
names(pittsburgBridges) <- c('IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES', 'CLEAR-G', 'T-OR-D', 'MATERIAL', 'SPAN', 'REL-L', 'TYPE')
head(pittsburgBridges)
## IDENTIF RIVER LOCATION ERECTED PURPOSE LENGTH LANES CLEAR-G T-OR-D
## 1 E2 A 25 1819 HIGHWAY 1037 2 N THROUGH
## 2 E3 A 39 1829 AQUEDUCT NA 1 N THROUGH
## 3 E5 A 29 1837 HIGHWAY 1000 2 N THROUGH
## 4 E6 M 23 1838 HIGHWAY NA 2 N THROUGH
## 5 E7 A 27 1840 HIGHWAY 990 2 N THROUGH
## 6 E8 A 28 1844 AQUEDUCT 1000 1 N THROUGH
## MATERIAL SPAN REL-L TYPE
## 1 WOOD SHORT S WOOD
## 2 WOOD <NA> S WOOD
## 3 WOOD SHORT S WOOD
## 4 WOOD <NA> S WOOD
## 5 WOOD MEDIUM S WOOD
## 6 IRON SHORT S SUSPEN
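Names such as `CLEAR-G`, `T-OR-D`, and `REL-L` contain hyphens, so they are not syntactic R names and cannot be used directly after `$`. The sketch below shows two ways of accessing such columns and one way of converting the names, using a tiny illustrative data frame (`sample_bridges` is a made-up name, and only two columns are included).

```r
# Tiny illustrative data frame with two of the hyphenated column names
sample_bridges <- data.frame(
  a = c("N", "N"),
  b = c("THROUGH", "THROUGH"),
  stringsAsFactors = FALSE
)
names(sample_bridges) <- c("CLEAR-G", "T-OR-D")

# Non-syntactic names need backticks when used with $
clear_g <- sample_bridges$`CLEAR-G`

# With [[ ]] the name is passed as an ordinary string
t_or_d <- sample_bridges[["T-OR-D"]]

# make.names() converts names to syntactic ones (hyphens become dots)
syntactic <- make.names(names(sample_bridges))
print(syntactic)
```

Keeping the documented names, as done here, matches the names file; converting them with `make.names()` is an alternative if backticks become inconvenient.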
newBridges <- pittsburgBridges[, c(1,2,3,4,7,9)]
head(newBridges)
## IDENTIF RIVER LOCATION ERECTED LANES T-OR-D
## 1 E2 A 25 1819 2 THROUGH
## 2 E3 A 39 1829 1 THROUGH
## 3 E5 A 29 1837 2 THROUGH
## 4 E6 M 23 1838 2 THROUGH
## 5 E7 A 27 1840 2 THROUGH
## 6 E8 A 28 1844 1 THROUGH
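Selecting columns by position works, but it depends on the column order staying the same. As an alternative, columns can be selected by name. The sketch below compares both approaches on a small sample data frame containing only the first seven columns (two rows taken from the output above); `sample_bridges`, `keep`, `by_name`, and `by_index` are illustrative names.

```r
# Small illustrative data frame with the first seven columns of the dataset
sample_bridges <- data.frame(
  IDENTIF  = c("E2", "E3"),
  RIVER    = c("A", "A"),
  LOCATION = c(25, 39),
  ERECTED  = c(1819, 1829),
  PURPOSE  = c("HIGHWAY", "AQUEDUCT"),
  LENGTH   = c(1037, NA),
  LANES    = c(2, 1),
  stringsAsFactors = FALSE
)

# Select by name: the intent is visible and column order does not matter
keep <- c("IDENTIF", "RIVER", "LOCATION", "ERECTED", "LANES")
by_name <- sample_bridges[, keep]

# Select by position: equivalent here, but breaks silently if columns move
by_index <- sample_bridges[, c(1, 2, 3, 4, 7)]

# Check that both selections give the same result
same <- identical(by_name, by_index)
print(same)
```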