Get Data

setwd("C:/Users/Owner/Desktop/MachineLearningR_sampleData")
used_cars <- read.csv("usedcars.csv", stringsAsFactors = TRUE)
head(used_cars)
##   year model price mileage  color transmission
## 1 2011   SEL 21992    7413 Yellow         AUTO
## 2 2011   SEL 20995   10926   Gray         AUTO
## 3 2011   SEL 19995    7351 Silver         AUTO
## 4 2011   SEL 17809   11613   Gray         AUTO
## 5 2012    SE 17500    8367  White         AUTO
## 6 2010   SEL 17495   25125 Silver         AUTO
summary(used_cars)
##       year      model        price          mileage           color   
##  Min.   :2000   SE :78   Min.   : 3800   Min.   :  4867   Black  :35  
##  1st Qu.:2008   SEL:23   1st Qu.:10995   1st Qu.: 27200   Silver :32  
##  Median :2009   SES:49   Median :13592   Median : 36385   Red    :25  
##  Mean   :2009            Mean   :12962   Mean   : 44261   Blue   :17  
##  3rd Qu.:2010            3rd Qu.:14904   3rd Qu.: 55125   Gray   :16  
##  Max.   :2012            Max.   :21992   Max.   :151479   White  :16  
##                                                           (Other): 9  
##  transmission
##  AUTO  :128  
##  MANUAL: 22  
##              
##              
##              
##              
## 
table(used_cars$color)
## 
##  Black   Blue   Gold   Gray  Green    Red Silver  White Yellow 
##     35     17      1     16      5     25     32     16      3
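
The counts are easier to compare as percentages. The following is a minimal sketch, with the counts retyped from the table above so that it runs without usedcars.csv; the names color_counts and color_pct are illustrative and not part of the original analysis.

```r
# Color counts copied from the table(used_cars$color) output above
color_counts <- c(Black = 35, Blue = 17, Gold = 1, Gray = 16, Green = 5,
                  Red = 25, Silver = 32, White = 16, Yellow = 3)

# Convert counts to percentages of all 150 cars, rounded to one decimal
color_pct <- round(100 * prop.table(color_counts), 1)

# List the colors from most to least common
sorted_pct <- sort(color_pct, decreasing = TRUE)
print(sorted_pct)
```

With the data frame loaded, the same result should follow from prop.table(table(used_cars$color)).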

Examine the relationship between two nominal variables (car model and car color) using a two-way cross-tabulation (crosstab)

There are 9 different colors in the dataset. We are interested in whether or not the car color is conservative:

  • conservative colors (conColor): Black, Gray, Silver, and White
  • non-conservative colors (notConColor): Blue, Gold, Green, Red, and Yellow

used_cars$conColor <-  used_cars$color %in% c("Black", "Gray", "Silver", "White")
table(used_cars$conColor)
## 
## FALSE  TRUE 
##    51    99

The %in% operator returns TRUE when a car's color is in the set Black, Gray, Silver, and White, and FALSE otherwise. This gives 51 FALSE and 99 TRUE values, so 99 of the 150 cars (66%) are conservatively colored.
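
To make the behavior of %in% concrete, here is a small sketch on a toy vector. The six values are made up for illustration and are not rows from usedcars.csv; the names color, conservative, and con_color are likewise hypothetical.

```r
# Toy factor of car colors (illustrative values only)
color <- factor(c("Black", "Red", "Silver", "Gold", "White", "Blue"))

# The set of colors treated as conservative in this analysis
conservative <- c("Black", "Gray", "Silver", "White")

# %in% tests each element for membership in the set and returns a logical vector
con_color <- color %in% conservative
print(con_color)

# Count the FALSE and TRUE values, as done for used_cars$conColor
print(table(con_color))
```

Because %in% always returns TRUE or FALSE (never NA), the resulting column can be passed to table() or CrossTable() directly.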

Let’s determine whether the proportion of conservatively colored cars varies by model, using a cross-tabulation

library(gmodels)
## Warning: package 'gmodels' was built under R version 3.4.2
CrossTable(x = used_cars$model, y = used_cars$conColor)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  150 
## 
##  
##                 | used_cars$conColor 
## used_cars$model |     FALSE |      TRUE | Row Total | 
## ----------------|-----------|-----------|-----------|
##              SE |        27 |        51 |        78 | 
##                 |     0.009 |     0.004 |           | 
##                 |     0.346 |     0.654 |     0.520 | 
##                 |     0.529 |     0.515 |           | 
##                 |     0.180 |     0.340 |           | 
## ----------------|-----------|-----------|-----------|
##             SEL |         7 |        16 |        23 | 
##                 |     0.086 |     0.044 |           | 
##                 |     0.304 |     0.696 |     0.153 | 
##                 |     0.137 |     0.162 |           | 
##                 |     0.047 |     0.107 |           | 
## ----------------|-----------|-----------|-----------|
##             SES |        17 |        32 |        49 | 
##                 |     0.007 |     0.004 |           | 
##                 |     0.347 |     0.653 |     0.327 | 
##                 |     0.333 |     0.323 |           | 
##                 |     0.113 |     0.213 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |        51 |        99 |       150 | 
##                 |     0.340 |     0.660 |           | 
## ----------------|-----------|-----------|-----------|
## 
## 
  • 65% of the SE cars are colored conservatively
  • 70% of the SEL cars are colored conservatively
  • 65% of the SES cars are colored conservatively

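The proportions are nearly identical across the three models, and every chi-square contribution in the table is small, so there is no evidence of a meaningful difference in the types of colors chosen by model of car.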
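
One way to back this reading with a formal test is a Pearson chi-square test of independence. The sketch below is an addition to the original analysis: it retypes the cell counts from the CrossTable output so it runs without the CSV file, and uses base R's chisq.test(). The name counts is illustrative.

```r
# Cell counts copied from the CrossTable output (rows: model, columns: conColor)
counts <- matrix(c(27, 51,
                    7, 16,
                   17, 32),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(model = c("SE", "SEL", "SES"),
                                 conColor = c("FALSE", "TRUE")))

# Row proportions: share of conservative vs. non-conservative colors per model
print(round(prop.table(counts, margin = 1), 3))

# Pearson chi-square test of independence between model and conColor
test <- chisq.test(counts)
print(test)
```

The test statistic is the sum of the per-cell chi-square contributions shown in the table, about 0.15 on 2 degrees of freedom, which gives a p-value of roughly 0.93. That is consistent with model and color type being independent. CrossTable() can also report this test when called with chisq = TRUE.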

References:

Machine Learning with R (2nd edition) by Brett Lantz