First we have to install the package ggplot2:
#install.packages("ggplot2")
Next, we load the following packages: ggplot2 to generate scatterplots, plyr (loaded here, although the renaming below uses base R's colnames()), and datasets so that we can obtain the mtcars dataset:
library(ggplot2)
library(plyr)
library(datasets)
Next, we change the names of the columns as follows:
colnames(mtcars) <- c("Mileage (miles/gallon)",
"Number of Cylinders",
"Displacement (cubic inches)",
"Gross Horsepower",
"Rear Axle Ratio",
"Weight (lb/1000)",
"Quarter Mile Time (Seconds)",
"Engine Shape (V or Straight)",
"Manual or Automatic Transmission",
"Number of Forward Gears",
"Number of Carburettors")
mtcars
## Mileage (miles/gallon) Number of Cylinders
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Datsun 710 22.8 4
## Hornet 4 Drive 21.4 6
## Hornet Sportabout 18.7 8
## Valiant 18.1 6
## Duster 360 14.3 8
## Merc 240D 24.4 4
## Merc 230 22.8 4
## Merc 280 19.2 6
## Merc 280C 17.8 6
## Merc 450SE 16.4 8
## Merc 450SL 17.3 8
## Merc 450SLC 15.2 8
## Cadillac Fleetwood 10.4 8
## Lincoln Continental 10.4 8
## Chrysler Imperial 14.7 8
## Fiat 128 32.4 4
## Honda Civic 30.4 4
## Toyota Corolla 33.9 4
## Toyota Corona 21.5 4
## Dodge Challenger 15.5 8
## AMC Javelin 15.2 8
## Camaro Z28 13.3 8
## Pontiac Firebird 19.2 8
## Fiat X1-9 27.3 4
## Porsche 914-2 26.0 4
## Lotus Europa 30.4 4
## Ford Pantera L 15.8 8
## Ferrari Dino 19.7 6
## Maserati Bora 15.0 8
## Volvo 142E 21.4 4
## Displacement (cubic inches) Gross Horsepower
## Mazda RX4 160.0 110
## Mazda RX4 Wag 160.0 110
## Datsun 710 108.0 93
## Hornet 4 Drive 258.0 110
## Hornet Sportabout 360.0 175
## Valiant 225.0 105
## Duster 360 360.0 245
## Merc 240D 146.7 62
## Merc 230 140.8 95
## Merc 280 167.6 123
## Merc 280C 167.6 123
## Merc 450SE 275.8 180
## Merc 450SL 275.8 180
## Merc 450SLC 275.8 180
## Cadillac Fleetwood 472.0 205
## Lincoln Continental 460.0 215
## Chrysler Imperial 440.0 230
## Fiat 128 78.7 66
## Honda Civic 75.7 52
## Toyota Corolla 71.1 65
## Toyota Corona 120.1 97
## Dodge Challenger 318.0 150
## AMC Javelin 304.0 150
## Camaro Z28 350.0 245
## Pontiac Firebird 400.0 175
## Fiat X1-9 79.0 66
## Porsche 914-2 120.3 91
## Lotus Europa 95.1 113
## Ford Pantera L 351.0 264
## Ferrari Dino 145.0 175
## Maserati Bora 301.0 335
## Volvo 142E 121.0 109
## Rear Axle Ratio Weight (lb/1000)
## Mazda RX4 3.90 2.620
## Mazda RX4 Wag 3.90 2.875
## Datsun 710 3.85 2.320
## Hornet 4 Drive 3.08 3.215
## Hornet Sportabout 3.15 3.440
## Valiant 2.76 3.460
## Duster 360 3.21 3.570
## Merc 240D 3.69 3.190
## Merc 230 3.92 3.150
## Merc 280 3.92 3.440
## Merc 280C 3.92 3.440
## Merc 450SE 3.07 4.070
## Merc 450SL 3.07 3.730
## Merc 450SLC 3.07 3.780
## Cadillac Fleetwood 2.93 5.250
## Lincoln Continental 3.00 5.424
## Chrysler Imperial 3.23 5.345
## Fiat 128 4.08 2.200
## Honda Civic 4.93 1.615
## Toyota Corolla 4.22 1.835
## Toyota Corona 3.70 2.465
## Dodge Challenger 2.76 3.520
## AMC Javelin 3.15 3.435
## Camaro Z28 3.73 3.840
## Pontiac Firebird 3.08 3.845
## Fiat X1-9 4.08 1.935
## Porsche 914-2 4.43 2.140
## Lotus Europa 3.77 1.513
## Ford Pantera L 4.22 3.170
## Ferrari Dino 3.62 2.770
## Maserati Bora 3.54 3.570
## Volvo 142E 4.11 2.780
## Quarter Mile Time (Seconds)
## Mazda RX4 16.46
## Mazda RX4 Wag 17.02
## Datsun 710 18.61
## Hornet 4 Drive 19.44
## Hornet Sportabout 17.02
## Valiant 20.22
## Duster 360 15.84
## Merc 240D 20.00
## Merc 230 22.90
## Merc 280 18.30
## Merc 280C 18.90
## Merc 450SE 17.40
## Merc 450SL 17.60
## Merc 450SLC 18.00
## Cadillac Fleetwood 17.98
## Lincoln Continental 17.82
## Chrysler Imperial 17.42
## Fiat 128 19.47
## Honda Civic 18.52
## Toyota Corolla 19.90
## Toyota Corona 20.01
## Dodge Challenger 16.87
## AMC Javelin 17.30
## Camaro Z28 15.41
## Pontiac Firebird 17.05
## Fiat X1-9 18.90
## Porsche 914-2 16.70
## Lotus Europa 16.90
## Ford Pantera L 14.50
## Ferrari Dino 15.50
## Maserati Bora 14.60
## Volvo 142E 18.60
## Engine Shape (V or Straight)
## Mazda RX4 0
## Mazda RX4 Wag 0
## Datsun 710 1
## Hornet 4 Drive 1
## Hornet Sportabout 0
## Valiant 1
## Duster 360 0
## Merc 240D 1
## Merc 230 1
## Merc 280 1
## Merc 280C 1
## Merc 450SE 0
## Merc 450SL 0
## Merc 450SLC 0
## Cadillac Fleetwood 0
## Lincoln Continental 0
## Chrysler Imperial 0
## Fiat 128 1
## Honda Civic 1
## Toyota Corolla 1
## Toyota Corona 1
## Dodge Challenger 0
## AMC Javelin 0
## Camaro Z28 0
## Pontiac Firebird 0
## Fiat X1-9 1
## Porsche 914-2 0
## Lotus Europa 1
## Ford Pantera L 0
## Ferrari Dino 0
## Maserati Bora 0
## Volvo 142E 1
## Manual or Automatic Transmission
## Mazda RX4 1
## Mazda RX4 Wag 1
## Datsun 710 1
## Hornet 4 Drive 0
## Hornet Sportabout 0
## Valiant 0
## Duster 360 0
## Merc 240D 0
## Merc 230 0
## Merc 280 0
## Merc 280C 0
## Merc 450SE 0
## Merc 450SL 0
## Merc 450SLC 0
## Cadillac Fleetwood 0
## Lincoln Continental 0
## Chrysler Imperial 0
## Fiat 128 1
## Honda Civic 1
## Toyota Corolla 1
## Toyota Corona 0
## Dodge Challenger 0
## AMC Javelin 0
## Camaro Z28 0
## Pontiac Firebird 0
## Fiat X1-9 1
## Porsche 914-2 1
## Lotus Europa 1
## Ford Pantera L 1
## Ferrari Dino 1
## Maserati Bora 1
## Volvo 142E 1
## Number of Forward Gears Number of Carburettors
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1
## Duster 360 3 4
## Merc 240D 4 2
## Merc 230 4 2
## Merc 280 4 4
## Merc 280C 4 4
## Merc 450SE 3 3
## Merc 450SL 3 3
## Merc 450SLC 3 3
## Cadillac Fleetwood 3 4
## Lincoln Continental 3 4
## Chrysler Imperial 3 4
## Fiat 128 4 1
## Honda Civic 4 2
## Toyota Corolla 4 1
## Toyota Corona 3 1
## Dodge Challenger 3 2
## AMC Javelin 3 2
## Camaro Z28 3 4
## Pontiac Firebird 3 2
## Fiat X1-9 4 1
## Porsche 914-2 5 2
## Lotus Europa 5 2
## Ford Pantera L 5 4
## Ferrari Dino 5 6
## Maserati Bora 5 8
## Volvo 142E 4 2
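Assigning all eleven names by position works, but it silently mislabels columns if their order ever differs from what is expected. The following is a minimal sketch of a name-based alternative using base R only. The copy `cars` and the lookup vector `new_names` are illustrative names that do not appear in the original analysis, and only four of the columns are renamed to keep the example short.

```r
# Work on a copy taken straight from the datasets package, so the result
# does not depend on any earlier changes to mtcars in the session.
cars <- datasets::mtcars

# Hypothetical lookup: original column name -> descriptive label.
new_names <- c(mpg  = "Mileage (miles/gallon)",
               hp   = "Gross Horsepower",
               drat = "Rear Axle Ratio",
               qsec = "Quarter Mile Time (Seconds)")

# Rename only the columns listed in the lookup, matching them by name.
hits <- match(names(new_names), names(cars))
names(cars)[hits] <- new_names

names(cars)
```

Columns that are not listed in the lookup, such as `cyl`, keep their original names.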
Next, we replace the numeric 0/1 codes in two of the columns with descriptive labels, which turns those columns from numeric into character data:
mtcars$`Engine Shape (V or Straight)` <- gsub("0", "V",
mtcars$`Engine Shape (V or Straight)`)
mtcars$`Engine Shape (V or Straight)` <- gsub("1", "Straight",
mtcars$`Engine Shape (V or Straight)`)
mtcars$`Manual or Automatic Transmission` <- gsub("0", "Automatic",
mtcars$`Manual or Automatic Transmission`)
mtcars$`Manual or Automatic Transmission` <- gsub("1", "Manual",
mtcars$`Manual or Automatic Transmission`)
mtcars
## Mileage (miles/gallon) Number of Cylinders
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Datsun 710 22.8 4
## Hornet 4 Drive 21.4 6
## Hornet Sportabout 18.7 8
## Valiant 18.1 6
## Duster 360 14.3 8
## Merc 240D 24.4 4
## Merc 230 22.8 4
## Merc 280 19.2 6
## Merc 280C 17.8 6
## Merc 450SE 16.4 8
## Merc 450SL 17.3 8
## Merc 450SLC 15.2 8
## Cadillac Fleetwood 10.4 8
## Lincoln Continental 10.4 8
## Chrysler Imperial 14.7 8
## Fiat 128 32.4 4
## Honda Civic 30.4 4
## Toyota Corolla 33.9 4
## Toyota Corona 21.5 4
## Dodge Challenger 15.5 8
## AMC Javelin 15.2 8
## Camaro Z28 13.3 8
## Pontiac Firebird 19.2 8
## Fiat X1-9 27.3 4
## Porsche 914-2 26.0 4
## Lotus Europa 30.4 4
## Ford Pantera L 15.8 8
## Ferrari Dino 19.7 6
## Maserati Bora 15.0 8
## Volvo 142E 21.4 4
## Displacement (cubic inches) Gross Horsepower
## Mazda RX4 160.0 110
## Mazda RX4 Wag 160.0 110
## Datsun 710 108.0 93
## Hornet 4 Drive 258.0 110
## Hornet Sportabout 360.0 175
## Valiant 225.0 105
## Duster 360 360.0 245
## Merc 240D 146.7 62
## Merc 230 140.8 95
## Merc 280 167.6 123
## Merc 280C 167.6 123
## Merc 450SE 275.8 180
## Merc 450SL 275.8 180
## Merc 450SLC 275.8 180
## Cadillac Fleetwood 472.0 205
## Lincoln Continental 460.0 215
## Chrysler Imperial 440.0 230
## Fiat 128 78.7 66
## Honda Civic 75.7 52
## Toyota Corolla 71.1 65
## Toyota Corona 120.1 97
## Dodge Challenger 318.0 150
## AMC Javelin 304.0 150
## Camaro Z28 350.0 245
## Pontiac Firebird 400.0 175
## Fiat X1-9 79.0 66
## Porsche 914-2 120.3 91
## Lotus Europa 95.1 113
## Ford Pantera L 351.0 264
## Ferrari Dino 145.0 175
## Maserati Bora 301.0 335
## Volvo 142E 121.0 109
## Rear Axle Ratio Weight (lb/1000)
## Mazda RX4 3.90 2.620
## Mazda RX4 Wag 3.90 2.875
## Datsun 710 3.85 2.320
## Hornet 4 Drive 3.08 3.215
## Hornet Sportabout 3.15 3.440
## Valiant 2.76 3.460
## Duster 360 3.21 3.570
## Merc 240D 3.69 3.190
## Merc 230 3.92 3.150
## Merc 280 3.92 3.440
## Merc 280C 3.92 3.440
## Merc 450SE 3.07 4.070
## Merc 450SL 3.07 3.730
## Merc 450SLC 3.07 3.780
## Cadillac Fleetwood 2.93 5.250
## Lincoln Continental 3.00 5.424
## Chrysler Imperial 3.23 5.345
## Fiat 128 4.08 2.200
## Honda Civic 4.93 1.615
## Toyota Corolla 4.22 1.835
## Toyota Corona 3.70 2.465
## Dodge Challenger 2.76 3.520
## AMC Javelin 3.15 3.435
## Camaro Z28 3.73 3.840
## Pontiac Firebird 3.08 3.845
## Fiat X1-9 4.08 1.935
## Porsche 914-2 4.43 2.140
## Lotus Europa 3.77 1.513
## Ford Pantera L 4.22 3.170
## Ferrari Dino 3.62 2.770
## Maserati Bora 3.54 3.570
## Volvo 142E 4.11 2.780
## Quarter Mile Time (Seconds)
## Mazda RX4 16.46
## Mazda RX4 Wag 17.02
## Datsun 710 18.61
## Hornet 4 Drive 19.44
## Hornet Sportabout 17.02
## Valiant 20.22
## Duster 360 15.84
## Merc 240D 20.00
## Merc 230 22.90
## Merc 280 18.30
## Merc 280C 18.90
## Merc 450SE 17.40
## Merc 450SL 17.60
## Merc 450SLC 18.00
## Cadillac Fleetwood 17.98
## Lincoln Continental 17.82
## Chrysler Imperial 17.42
## Fiat 128 19.47
## Honda Civic 18.52
## Toyota Corolla 19.90
## Toyota Corona 20.01
## Dodge Challenger 16.87
## AMC Javelin 17.30
## Camaro Z28 15.41
## Pontiac Firebird 17.05
## Fiat X1-9 18.90
## Porsche 914-2 16.70
## Lotus Europa 16.90
## Ford Pantera L 14.50
## Ferrari Dino 15.50
## Maserati Bora 14.60
## Volvo 142E 18.60
## Engine Shape (V or Straight)
## Mazda RX4 V
## Mazda RX4 Wag V
## Datsun 710 Straight
## Hornet 4 Drive Straight
## Hornet Sportabout V
## Valiant Straight
## Duster 360 V
## Merc 240D Straight
## Merc 230 Straight
## Merc 280 Straight
## Merc 280C Straight
## Merc 450SE V
## Merc 450SL V
## Merc 450SLC V
## Cadillac Fleetwood V
## Lincoln Continental V
## Chrysler Imperial V
## Fiat 128 Straight
## Honda Civic Straight
## Toyota Corolla Straight
## Toyota Corona Straight
## Dodge Challenger V
## AMC Javelin V
## Camaro Z28 V
## Pontiac Firebird V
## Fiat X1-9 Straight
## Porsche 914-2 V
## Lotus Europa Straight
## Ford Pantera L V
## Ferrari Dino V
## Maserati Bora V
## Volvo 142E Straight
## Manual or Automatic Transmission
## Mazda RX4 Manual
## Mazda RX4 Wag Manual
## Datsun 710 Manual
## Hornet 4 Drive Automatic
## Hornet Sportabout Automatic
## Valiant Automatic
## Duster 360 Automatic
## Merc 240D Automatic
## Merc 230 Automatic
## Merc 280 Automatic
## Merc 280C Automatic
## Merc 450SE Automatic
## Merc 450SL Automatic
## Merc 450SLC Automatic
## Cadillac Fleetwood Automatic
## Lincoln Continental Automatic
## Chrysler Imperial Automatic
## Fiat 128 Manual
## Honda Civic Manual
## Toyota Corolla Manual
## Toyota Corona Automatic
## Dodge Challenger Automatic
## AMC Javelin Automatic
## Camaro Z28 Automatic
## Pontiac Firebird Automatic
## Fiat X1-9 Manual
## Porsche 914-2 Manual
## Lotus Europa Manual
## Ford Pantera L Manual
## Ferrari Dino Manual
## Maserati Bora Manual
## Volvo 142E Manual
## Number of Forward Gears Number of Carburettors
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1
## Duster 360 3 4
## Merc 240D 4 2
## Merc 230 4 2
## Merc 280 4 4
## Merc 280C 4 4
## Merc 450SE 3 3
## Merc 450SL 3 3
## Merc 450SLC 3 3
## Cadillac Fleetwood 3 4
## Lincoln Continental 3 4
## Chrysler Imperial 3 4
## Fiat 128 4 1
## Honda Civic 4 2
## Toyota Corolla 4 1
## Toyota Corona 3 1
## Dodge Challenger 3 2
## AMC Javelin 3 2
## Camaro Z28 3 4
## Pontiac Firebird 3 2
## Fiat X1-9 4 1
## Porsche 914-2 5 2
## Lotus Europa 5 2
## Ford Pantera L 5 4
## Ferrari Dino 5 6
## Maserati Bora 5 8
## Volvo 142E 4 2
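The gsub() calls above work because each code is a single digit, but gsub() replaces every matching character, so a value such as 10 would become "StraightV". A common alternative is factor(), which maps each code to a label in one step. The sketch below assumes the original column names `vs` and `am` and works on a copy called `cars`, which is an illustrative name.

```r
# Copy with the original column names (vs = engine shape, am = transmission).
cars <- datasets::mtcars

# factor() maps each numeric code to its label and keeps the column
# categorical instead of converting it to plain character strings.
cars$vs <- factor(cars$vs, levels = c(0, 1), labels = c("V", "Straight"))
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Count how many models fall into each category.
table(cars$vs)
table(cars$am)
```

With factors, summary() reports counts per category rather than only the length and class of the column.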
Next, we inspect the internal structure of the data frame and its summary statistics:
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ Mileage (miles/gallon) : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ Number of Cylinders : num 6 6 4 6 8 6 8 4 4 6 ...
## $ Displacement (cubic inches) : num 160 160 108 258 360 ...
## $ Gross Horsepower : num 110 110 93 110 175 105 245 62 95 123 ...
## $ Rear Axle Ratio : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ Weight (lb/1000) : num 2.62 2.88 2.32 3.21 3.44 ...
## $ Quarter Mile Time (Seconds) : num 16.5 17 18.6 19.4 17 ...
## $ Engine Shape (V or Straight) : chr "V" "V" "Straight" "Straight" ...
## $ Manual or Automatic Transmission: chr "Manual" "Manual" "Manual" "Automatic" ...
## $ Number of Forward Gears : num 4 4 4 3 3 3 3 4 4 4 ...
## $ Number of Carburettors : num 4 4 1 1 2 1 4 2 2 4 ...
summary(mtcars)
## Mileage (miles/gallon) Number of Cylinders Displacement (cubic inches)
## Min. :10.40 Min. :4.000 Min. : 71.1
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
## Median :19.20 Median :6.000 Median :196.3
## Mean :20.09 Mean :6.188 Mean :230.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
## Max. :33.90 Max. :8.000 Max. :472.0
## Gross Horsepower Rear Axle Ratio Weight (lb/1000)
## Min. : 52.0 Min. :2.760 Min. :1.513
## 1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
## Median :123.0 Median :3.695 Median :3.325
## Mean :146.7 Mean :3.597 Mean :3.217
## 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
## Max. :335.0 Max. :4.930 Max. :5.424
## Quarter Mile Time (Seconds) Engine Shape (V or Straight)
## Min. :14.50 Length:32
## 1st Qu.:16.89 Class :character
## Median :17.71 Mode :character
## Mean :17.85
## 3rd Qu.:18.90
## Max. :22.90
## Manual or Automatic Transmission Number of Forward Gears
## Length:32 Min. :3.000
## Class :character 1st Qu.:3.000
## Mode :character Median :4.000
## Mean :3.688
## 3rd Qu.:4.000
## Max. :5.000
## Number of Carburettors
## Min. :1.000
## 1st Qu.:2.000
## Median :2.000
## Mean :2.812
## 3rd Qu.:4.000
## Max. :8.000
Next, we estimate the amount of time required to travel a full mile by multiplying the quarter-mile time by four, which assumes that every quarter mile takes as long as the first:
mtcars$`Full Mile Time (Seconds)` <- (mtcars$`Quarter Mile Time (Seconds)`)*4
The result is a new column consisting of quantitative data:
mtcars$`Full Mile Time (Seconds)`
## [1] 65.84 68.08 74.44 77.76 68.08 80.88 63.36 80.00 91.60 73.20 75.60
## [12] 69.60 70.40 72.00 71.92 71.28 69.68 77.88 74.08 79.60 80.04 67.48
## [23] 69.20 61.64 68.20 75.60 66.80 67.60 58.00 62.00 58.40 74.40
Next, we take the inverse of the quantity previously calculated to obtain the average speed of the vehicle over that distance (this is an average, not a maximum, because it is derived from a total time):
mtcars$`Speed (Miles Per Second)` <- 1/(mtcars$`Full Mile Time (Seconds)`)
Here also, we obtain a new column consisting of quantitative data:
mtcars$`Speed (Miles Per Second)`
## [1] 0.01518834 0.01468860 0.01343364 0.01286008 0.01468860 0.01236400
## [7] 0.01578283 0.01250000 0.01091703 0.01366120 0.01322751 0.01436782
## [13] 0.01420455 0.01388889 0.01390434 0.01402918 0.01435132 0.01284027
## [19] 0.01349892 0.01256281 0.01249375 0.01481921 0.01445087 0.01622323
## [25] 0.01466276 0.01322751 0.01497006 0.01479290 0.01724138 0.01612903
## [31] 0.01712329 0.01344086
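Speeds in miles per second are hard to read, so a conversion to miles per hour can help as a sanity check. This is a small sketch under the same assumption as above (average speed over the quarter mile); the variable names are illustrative and the original column name `qsec` is used.

```r
# Quarter-mile times in seconds, taken from the unmodified dataset.
qsec <- datasets::mtcars$qsec

# Average speed = distance / time; a quarter mile is 0.25 miles.
speed_mps <- 0.25 / qsec        # miles per second, same as 1 / (4 * qsec)
speed_mph <- speed_mps * 3600   # 3600 seconds per hour

head(round(speed_mph, 1))
```

Dividing 0.25 by the quarter-mile time gives the same values as the column computed above, since 1/(4t) = 0.25/t.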
Next, we calculate the number of revolutions per minute from the speed and the rear axle ratio, using 2112 as the conversion factor (its derivation is not shown here):
mtcars$`Revolutions Per Minute` <- (mtcars$`Speed (Miles Per Second)`)*2112*(mtcars$`Rear Axle Ratio`)
Like the previous two columns, this is also a column consisting of quantitative data:
mtcars$`Revolutions Per Minute`
## [1] 125.10328 120.98707 109.23160 83.65432 97.72033 72.07122 107.00000
## [8] 97.41600 90.38253 113.10164 109.51111 93.15862 92.10000 90.05333
## [15] 86.04227 88.88889 97.90126 110.64407 140.55292 111.96784 97.63118
## [22] 86.38293 96.13873 127.80273 95.38065 113.98095 140.06228 117.78462
## [29] 153.66621 123.31355 128.02192 116.67097
Next, we calculate torque as follows:
mtcars$`Torque` <-
((mtcars$`Gross Horsepower`)*5252)/(mtcars$`Revolutions Per Minute`)
Again, we have another column consisting of quantitative data:
mtcars$`Torque`
## [1] 4617.944 4775.056 4471.563 6906.039 9405.412 7651.598 12025.607
## [8] 3342.613 5520.314 5711.641 5898.908 10147.853 10264.495 10497.779
## [15] 12513.152 12703.275 12338.554 3132.857 1943.069 3048.911 5218.046
## [22] 9119.858 8194.408 10068.173 9636.127 3041.140 3412.282 5038.655
## [29] 9022.986 7453.358 13743.115 4906.688
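The four derived quantities form a chain in which each one depends on the one before it. As a sketch, the chain can be collected into a single function. The helper name `derive_quantities` and its argument names are hypothetical, and the constants 2112 and 5252 are simply the ones used in the steps above.

```r
# Hypothetical helper: derive the four quantities from quarter-mile time
# (seconds), rear axle ratio, and gross horsepower.
derive_quantities <- function(qsec, drat, hp) {
  full_mile <- qsec * 4              # estimated full-mile time in seconds
  speed     <- 1 / full_mile         # average speed in miles per second
  rpm       <- speed * 2112 * drat   # revolutions, as computed above
  torque    <- hp * 5252 / rpm       # torque from horsepower and rpm
  data.frame(full_mile, speed, rpm, torque)
}

cars    <- datasets::mtcars
derived <- derive_quantities(cars$qsec, cars$drat, cars$hp)
rownames(derived) <- rownames(cars)

head(derived, 3)
```

Keeping the formulas in one place makes it easier to change a constant or an assumption and see how every downstream column responds.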
Next, we take two subsets of the newly updated dataset:
cardata1 <- mtcars[c(15,4)]
cardata1
## Torque Gross Horsepower
## Mazda RX4 4617.944 110
## Mazda RX4 Wag 4775.056 110
## Datsun 710 4471.563 93
## Hornet 4 Drive 6906.039 110
## Hornet Sportabout 9405.412 175
## Valiant 7651.598 105
## Duster 360 12025.607 245
## Merc 240D 3342.613 62
## Merc 230 5520.314 95
## Merc 280 5711.641 123
## Merc 280C 5898.908 123
## Merc 450SE 10147.853 180
## Merc 450SL 10264.495 180
## Merc 450SLC 10497.779 180
## Cadillac Fleetwood 12513.152 205
## Lincoln Continental 12703.275 215
## Chrysler Imperial 12338.554 230
## Fiat 128 3132.857 66
## Honda Civic 1943.069 52
## Toyota Corolla 3048.911 65
## Toyota Corona 5218.046 97
## Dodge Challenger 9119.858 150
## AMC Javelin 8194.408 150
## Camaro Z28 10068.173 245
## Pontiac Firebird 9636.127 175
## Fiat X1-9 3041.140 66
## Porsche 914-2 3412.282 91
## Lotus Europa 5038.655 113
## Ford Pantera L 9022.986 264
## Ferrari Dino 7453.358 175
## Maserati Bora 13743.115 335
## Volvo 142E 4906.688 109
cardata2 <- mtcars[c(14,4)]
cardata2
## Revolutions Per Minute Gross Horsepower
## Mazda RX4 125.10328 110
## Mazda RX4 Wag 120.98707 110
## Datsun 710 109.23160 93
## Hornet 4 Drive 83.65432 110
## Hornet Sportabout 97.72033 175
## Valiant 72.07122 105
## Duster 360 107.00000 245
## Merc 240D 97.41600 62
## Merc 230 90.38253 95
## Merc 280 113.10164 123
## Merc 280C 109.51111 123
## Merc 450SE 93.15862 180
## Merc 450SL 92.10000 180
## Merc 450SLC 90.05333 180
## Cadillac Fleetwood 86.04227 205
## Lincoln Continental 88.88889 215
## Chrysler Imperial 97.90126 230
## Fiat 128 110.64407 66
## Honda Civic 140.55292 52
## Toyota Corolla 111.96784 65
## Toyota Corona 97.63118 97
## Dodge Challenger 86.38293 150
## AMC Javelin 96.13873 150
## Camaro Z28 127.80273 245
## Pontiac Firebird 95.38065 175
## Fiat X1-9 113.98095 66
## Porsche 914-2 140.06228 91
## Lotus Europa 117.78462 113
## Ford Pantera L 153.66621 264
## Ferrari Dino 123.31355 175
## Maserati Bora 128.02192 335
## Volvo 142E 116.67097 109
We can also use data science to determine whether or not torque and gross horsepower are dependent on each other. Two events are independent only if the probability that both occur equals the product of their individual probabilities. Let us calculate the probability that a randomly selected model has a torque of at least 5,000 and a gross horsepower of at least 200:
#From this, the probability that a randomly selected model has a torque of at least 5,000 is 22/32. We determine the answer by dividing the number of observations in this subset by the total number of observations in the original set.
c1 <- cardata1[cardata1$`Torque` >= 5000, c("Torque", "Gross Horsepower")]
str(c1)
## 'data.frame': 22 obs. of 2 variables:
## $ Torque : num 6906 9405 7652 12026 5520 ...
## $ Gross Horsepower: num 110 175 105 245 95 123 123 180 180 180 ...
#From this, the probability that a randomly selected model has a gross horsepower of at least 200 is 7/32. We determine the answer the same way we determined it previously.
c2 <- cardata1[cardata1$`Gross Horsepower` >= 200, c("Torque", "Gross Horsepower")]
str(c2)
## 'data.frame': 7 obs. of 2 variables:
## $ Torque : num 12026 12513 12703 12339 10068 ...
## $ Gross Horsepower: num 245 205 215 230 245 264 335
#After multiplying the two probabilities (22/32 and 7/32) we get an answer of 0.15.
c3 <- cardata1[cardata1$`Torque` >= 5000 & cardata1$`Gross Horsepower` >= 200, c("Torque", "Gross Horsepower")]
str(c3)
## 'data.frame': 7 obs. of 2 variables:
## $ Torque : num 12026 12513 12703 12339 10068 ...
## $ Gross Horsepower: num 245 205 215 230 245 264 335
#From this, the probability that a randomly selected model has both a gross horsepower of at least 200 and a torque of at least 5,000 is 7/32, or 0.22. Since this differs from the product of the two individual probabilities (0.15), the data suggest that torque and gross horsepower are dependent on each other. In fact, every model with at least 200 horsepower also has a torque of at least 5,000.
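The same comparison can be made without building three subsets, because the mean of a logical vector is the proportion of TRUE values. The sketch below recomputes torque from the original columns so that it runs on its own; the names `a`, `b`, `p_a`, `p_b`, and `p_ab` are illustrative.

```r
cars   <- datasets::mtcars
rpm    <- (1 / (cars$qsec * 4)) * 2112 * cars$drat
torque <- cars$hp * 5252 / rpm

a <- torque >= 5000    # event A: torque of at least 5,000
b <- cars$hp >= 200    # event B: gross horsepower of at least 200

p_a  <- mean(a)        # 22/32
p_b  <- mean(b)        # 7/32
p_ab <- mean(a & b)    # 7/32

# Under independence the two numbers below would be equal.
round(c(product = p_a * p_b, joint = p_ab), 2)
```

With only 32 models, a gap between the two numbers is suggestive rather than conclusive.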
Let us repeat the same analysis for the second subset:
#P(Revolutions Per Minute >= 110) = 14/32
c4 <- cardata2[cardata2$`Revolutions Per Minute` >= 110, c("Revolutions Per Minute", "Gross Horsepower")]
str(c4)
## 'data.frame': 14 obs. of 2 variables:
## $ Revolutions Per Minute: num 125 121 113 111 141 ...
## $ Gross Horsepower : num 110 110 123 66 52 65 245 66 91 113 ...
#P(Gross Horsepower >= 90) = 27/32
c5 <- cardata2[cardata2$`Gross Horsepower` >= 90, c("Revolutions Per Minute", "Gross Horsepower")]
str(c5)
## 'data.frame': 27 obs. of 2 variables:
## $ Revolutions Per Minute: num 125.1 121 109.2 83.7 97.7 ...
## $ Gross Horsepower : num 110 110 93 110 175 105 245 95 123 123 ...
#Multiplying the two probabilities we get 0.37.
#P(Revolutions Per Minute >= 110 AND Gross Horsepower >= 90) = 10/32 = 0.31
c6 <- cardata2[cardata2$`Gross Horsepower` >= 90 & cardata2$`Revolutions Per Minute` >= 110, c("Revolutions Per Minute", "Gross Horsepower")]
str(c6)
## 'data.frame': 10 obs. of 2 variables:
## $ Revolutions Per Minute: num 125 121 113 128 140 ...
## $ Gross Horsepower : num 110 110 123 245 91 113 264 175 335 109
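#In this case also, the joint probability (10/32, about 0.31) differs from the product of the individual probabilities (0.37), so the data suggest that both quantities are dependent on each other.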
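A two-way table shows all four combinations of the two events at once, which makes the counts behind these probabilities easier to audit. This is a sketch using base R's table() and prop.table(); the dimension names `high_rpm` and `high_hp` are illustrative labels.

```r
cars <- datasets::mtcars
rpm  <- (1 / (cars$qsec * 4)) * 2112 * cars$drat

# Cross-tabulate the two events; each cell is a count of car models.
tab <- table(high_rpm = rpm >= 110, high_hp = cars$hp >= 90)
tab

# The same table as proportions of all 32 models.
prop.table(tab)
```

The TRUE/TRUE cell holds the joint count, while the row and column totals give the two marginal counts.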
We want to find out whether horsepower has a linear correlation with any of the calculated quantities. We answer the question by generating scatterplots, lines of best fit, and loess curves of best fit as follows:
require(ggplot2)
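# Scatterplot of horsepower against torque with a least-squares line.
# Columns are referenced by name inside aes(), since data = cardata1 is supplied.
ggplot(data = cardata1,
       aes(x = Torque,
           y = `Gross Horsepower`)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm") +
  xlab("Torque") +
  ylab("Gross Horsepower") +
  ggtitle("Correlation between Gross Horsepower and Torque")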
# Same scatterplot with the default smoother (loess for small samples).
ggplot(data = cardata1,
       aes(x = Torque,
           y = `Gross Horsepower`)) +
  geom_point(shape = 1) +
  geom_smooth() +
  xlab("Torque") +
  ylab("Gross Horsepower") +
  ggtitle("Correlation between Gross Horsepower and Torque")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
# Scatterplot of horsepower against revolutions per minute with a least-squares line.
ggplot(data = cardata2,
       aes(x = `Revolutions Per Minute`,
           y = `Gross Horsepower`)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm") +
  xlab("Revolutions Per Minute") +
  ylab("Gross Horsepower") +
  ggtitle("Correlation between Gross Horsepower and Revolutions Per Minute")
# Same scatterplot with the default smoother (loess for small samples).
ggplot(data = cardata2,
       aes(x = `Revolutions Per Minute`,
           y = `Gross Horsepower`)) +
  geom_point(shape = 1) +
  geom_smooth() +
  xlab("Revolutions Per Minute") +
  ylab("Gross Horsepower") +
  ggtitle("Correlation between Gross Horsepower and Revolutions Per Minute")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Then, we calculate the correlation coefficients for both sets of data as follows:
cor(cardata1$`Gross Horsepower`, cardata1$`Torque`)
## [1] 0.9100991
cor(cardata2$`Gross Horsepower`, cardata2$`Revolutions Per Minute`)
## [1] 0.06119933
#Horsepower has a strong linear correlation with torque but not with the number of revolutions per minute. The correlation coefficient with torque is 0.91, which is close to 1, so the line of best fit describes that scatterplot well. Part of this relationship is built in, because torque was computed directly from horsepower. The correlation coefficient with the number of revolutions per minute is 0.06, which is close to 0, so there is almost no linear relationship and a line of best fit explains very little of the variation. A loess smoothed curve can still be fitted to both relations.
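One way to quantify how much a line of best fit explains is the R-squared of a simple linear regression, which equals the squared correlation coefficient. The sketch below uses lm() from base R and recomputes the derived columns so that it runs on its own; the data frame `d` and the other variable names are illustrative.

```r
cars   <- datasets::mtcars
rpm    <- (1 / (cars$qsec * 4)) * 2112 * cars$drat
torque <- cars$hp * 5252 / rpm
d      <- data.frame(hp = cars$hp, torque = torque, rpm = rpm)

r_torque <- cor(d$hp, d$torque)
r_rpm    <- cor(d$hp, d$rpm)

# Fit one straight line per relation.
fit_torque <- lm(hp ~ torque, data = d)
fit_rpm    <- lm(hp ~ rpm, data = d)

# Share of the variation in horsepower explained by each line.
round(c(r2_torque = summary(fit_torque)$r.squared,
        r2_rpm    = summary(fit_rpm)$r.squared), 3)
```

Squaring the coefficients reported above gives roughly 0.83 for torque and well under 0.01 for revolutions per minute, which matches the visual impression from the scatterplots.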
We generate frequency histograms for Gross Horsepower, Torque, and Revolutions Per Minute as follows:
require(ggplot2)
qplot(cardata1$`Gross Horsepower`, data = cardata1, xlab = 'Gross Horsepower')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(cardata1$`Torque`, data = cardata1, xlab = 'Torque')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(cardata2$`Revolutions Per Minute`, data = cardata2, xlab = 'Revolutions Per Minute')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#Judging by the histograms, none of the above distributions appears to be normal, although with only 32 observations the shapes are rough.
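A histogram of 32 values gives only a rough impression of shape, so a formal check may be a useful complement. The sketch below applies the Shapiro-Wilk test, available as shapiro.test() in base R's stats package, to each of the three variables. Choosing this particular test is an assumption of the example, not something taken from the analysis above, and the variable names are illustrative.

```r
cars   <- datasets::mtcars
rpm    <- (1 / (cars$qsec * 4)) * 2112 * cars$drat
torque <- cars$hp * 5252 / rpm

# Shapiro-Wilk test: a small p-value is evidence against normality.
checks   <- list(hp = cars$hp, torque = torque, rpm = rpm)
p_values <- sapply(checks, function(x) shapiro.test(x)$p.value)

p_values
```

A p-value below a chosen threshold such as 0.05 would support the visual impression of non-normality, while a larger value would mean the sample is too small to rule normality out.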
Here are the corresponding standard deviations for the above histograms:
sd(cardata1$`Gross Horsepower`)
## [1] 68.56287
sd(cardata1$`Torque`)
## [1] 3383.843
sd(cardata2$`Revolutions Per Minute`)
## [1] 18.78426
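#Torque has the largest standard deviation (3,383.843), so in absolute terms its values are spread furthest from the mean. Because the three quantities are measured in different units, however, their standard deviations are not directly comparable; a relative measure such as the coefficient of variation (standard deviation divided by the mean) is needed for that.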
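To compare spread across variables with different units, the standard deviation can be divided by the mean. The sketch below defines a small helper for this; the function name `cv` and the other variable names are illustrative, and the derived columns are recomputed so that the block runs on its own.

```r
cars   <- datasets::mtcars
rpm    <- (1 / (cars$qsec * 4)) * 2112 * cars$drat
torque <- cars$hp * 5252 / rpm

# Coefficient of variation: standard deviation relative to the mean.
cv <- function(x) sd(x) / mean(x)

cv_values <- sapply(list(hp = cars$hp, torque = torque, rpm = rpm), cv)
round(cv_values, 3)
```

Because the coefficient of variation has no units, the three values can be ranked directly, which the raw standard deviations do not allow.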