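# Set the working directory to the folder that contains usedcars.csv (adjust this path for your machine)
setwd("C:/Users/Owner/Desktop/MachineLearningR_sampleData")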
used_cars <- read.csv("usedcars.csv", stringsAsFactors = TRUE)
head(used_cars)
## year model price mileage color transmission
## 1 2011 SEL 21992 7413 Yellow AUTO
## 2 2011 SEL 20995 10926 Gray AUTO
## 3 2011 SEL 19995 7351 Silver AUTO
## 4 2011 SEL 17809 11613 Gray AUTO
## 5 2012 SE 17500 8367 White AUTO
## 6 2010 SEL 17495 25125 Silver AUTO
summary(used_cars)
## year model price mileage color
## Min. :2000 SE :78 Min. : 3800 Min. : 4867 Black :35
## 1st Qu.:2008 SEL:23 1st Qu.:10995 1st Qu.: 27200 Silver :32
## Median :2009 SES:49 Median :13592 Median : 36385 Red :25
## Mean :2009 Mean :12962 Mean : 44261 Blue :17
## 3rd Qu.:2010 3rd Qu.:14904 3rd Qu.: 55125 Gray :16
## Max. :2012 Max. :21992 Max. :151479 White :16
## (Other): 9
## transmission
## AUTO :128
## MANUAL: 22
##
##
##
##
##
table(used_cars$color)
##
## Black Blue Gold Gray Green Red Silver White Yellow
## 35 17 1 16 5 25 32 16 3
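The counts above can also be expressed as shares of the 150 cars with `prop.table()`. This sketch does not assume access to `usedcars.csv`. Instead it rebuilds a hypothetical stand-in vector `colors` from the printed counts. With the real data you would call `prop.table(table(used_cars$color))` directly.

```r
# Hypothetical stand-in for used_cars$color, rebuilt from the counts printed above
color_counts <- c(Black = 35, Blue = 17, Gold = 1, Gray = 16, Green = 5,
                  Red = 25, Silver = 32, White = 16, Yellow = 3)
colors <- factor(rep(names(color_counts), times = color_counts))

color_table <- table(colors)
# Share of each color among all cars, rounded to 3 decimals
color_props <- round(prop.table(color_table), 3)
print(color_props)
```

Black is the most common color at about 23% of the cars, while Gold appears only once.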
There are 9 different colors in the dataset. We are interested in whether or not a car's color is conservative. Conservative colors (conColor) are Black, Gray, Silver, and White. Non-conservative colors (notConColor) are Blue, Gold, Green, Red, and Yellow.
# Determine whether or not each used car's color is in the set of Black, Gray, Silver, and White
used_cars$conColor <- used_cars$color %in% c("Black", "Gray", "Silver", "White")
table(used_cars$conColor)
##
## FALSE TRUE
## 51 99
This gives 51 cars with a non-conservative color (FALSE) and 99 cars with a conservative color (TRUE).
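The indicator works because `%in%` returns one TRUE/FALSE value per element, depending on whether that element appears in the lookup set. It also accepts a factor such as `color`, which it compares by level label. The sketch below uses a hypothetical `colors` vector rebuilt from the color counts printed earlier, so it runs without the CSV file. It reproduces the 51/99 split and shows that the mean of a logical vector is the share of TRUE values.

```r
# Hypothetical stand-in for used_cars$color, rebuilt from the color counts above
color_counts <- c(Black = 35, Blue = 17, Gold = 1, Gray = 16, Green = 5,
                  Red = 25, Silver = 32, White = 16, Yellow = 3)
colors <- factor(rep(names(color_counts), times = color_counts))

conservative <- c("Black", "Gray", "Silver", "White")
# %in% gives TRUE for each element found in `conservative`, FALSE otherwise
con_color <- colors %in% conservative

print(table(con_color))   # FALSE 51, TRUE 99
print(mean(con_color))    # share of conservative colors: 99 / 150 = 0.66
```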
library(gmodels)
CrossTable(x = used_cars$model, y = used_cars$conColor)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 150
##
##
## | used_cars$conColor
## used_cars$model | FALSE | TRUE | Row Total |
## ----------------|-----------|-----------|-----------|
## SE | 27 | 51 | 78 |
## | 0.009 | 0.004 | |
## | 0.346 | 0.654 | 0.520 |
## | 0.529 | 0.515 | |
## | 0.180 | 0.340 | |
## ----------------|-----------|-----------|-----------|
## SEL | 7 | 16 | 23 |
## | 0.086 | 0.044 | |
## | 0.304 | 0.696 | 0.153 |
## | 0.137 | 0.162 | |
## | 0.047 | 0.107 | |
## ----------------|-----------|-----------|-----------|
## SES | 17 | 32 | 49 |
## | 0.007 | 0.004 | |
## | 0.347 | 0.653 | 0.327 |
## | 0.333 | 0.323 | |
## | 0.113 | 0.213 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 51 | 99 | 150 |
## | 0.340 | 0.660 | |
## ----------------|-----------|-----------|-----------|
##
##
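The share of conservative colors is nearly the same for every model (65.4% for SE, 69.6% for SEL, and 65.3% for SES), and every chi-square contribution is small. There is therefore little evidence that the choice of a conservative color depends on the car's model.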
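This reading can be checked with a Pearson chi-square test of independence. `CrossTable()` can report the test itself when called with `chisq = TRUE`. The sketch below instead uses base R's `chisq.test()` on a 3 x 2 matrix of counts copied from the table above, so it runs without the CSV file. The name `model_color` is illustrative only.

```r
# Counts copied from the CrossTable output: rows = model, columns = conColor
model_color <- matrix(c(27, 7, 17,    # FALSE (not conservative)
                        51, 16, 32),  # TRUE (conservative)
                      nrow = 3,
                      dimnames = list(model = c("SE", "SEL", "SES"),
                                      conColor = c("FALSE", "TRUE")))

# Row proportions: share of each model's cars in each color group
print(round(prop.table(model_color, margin = 1), 3))

# Pearson chi-square test of independence
# (no continuity correction is applied to a 3 x 2 table)
result <- chisq.test(model_color)
print(result)   # X-squared is about 0.154 on 2 df, p-value about 0.93
```

The test statistic is simply the sum of the six chi-square contributions in the table. A p-value this large is consistent with model and color group being independent.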
Machine Learning with R (2nd edition) by Brett Lantz