library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 2.0.1 v dplyr 0.7.8
## v tidyr 0.8.2 v stringr 1.3.1
## v readr 1.3.1 v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
#Interpreting Plots
#Categorical versus Categorical
#In the MPG dataset we have manufacturer and transmission type, both of which are categorical variables.
#The first way to compare the relationship between these is with a contingency table.
#Load the pre-computed contingency table (one row per manufacturer)
data <- read.csv("ContingencyTable.csv")
head(data)
## Manufacturer Automatic Manual TOTAL
## 1 Audi 11 7 18
## 2 Chevrolet 16 3 19
## 3 Dodge 30 7 37
## 4 Ford 17 8 25
## 5 Honda 4 5 9
## 6 Hyundai 7 7 14
#A contingency table takes every combination of the two variables and counts how many observations fall into each one.
#To simplify the dataset, we show only automatics and manuals, not all the different transmission types. We see that Audi has 11 cars that are automatic and 7 that are manual.
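#A minimal sketch of how such a table could be built from the raw data, assuming ContingencyTable.csv was derived from the mpg data that ships with ggplot2 (the file itself does not say so).
#The names mpg_raw, trans_simple and mpg_counts are made up for this illustration. Manufacturer names in mpg are lowercase, so the row labels differ slightly from the CSV.
mpg_raw <- ggplot2::mpg
#Collapse detailed labels such as "auto(l5)" and "manual(m6)" into two groups
trans_simple <- ifelse(grepl("^auto", mpg_raw$trans), "Automatic", "Manual")
#Count the observations for each manufacturer/transmission combination
mpg_counts <- table(Manufacturer = mpg_raw$manufacturer, Transmission = trans_simple)
head(mpg_counts)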
#What is the relationship between manufacturer and transmission?
#Maybe one manufacturer produces more manuals than the others.
#To investigate that question, we need a conditional table.
#The first step is to calculate the marginals, which here is a technical way of saying row sums.
#Imagine totalling up all the cars in each row of the table and adding this as an extra column at the end, like the TOTAL column in the table above.
#So we see that Audi has 18 cars, of which 11 are automatic and 7 are manual.
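#The row sums can be sketched in code as follows. Only the six rows printed by head() are retyped here, so this is an illustration rather than the full table, and first_six is a made-up name.
first_six <- data.frame(
  Manufacturer = c("Audi", "Chevrolet", "Dodge", "Ford", "Honda", "Hyundai"),
  Automatic = c(11, 16, 30, 17, 4, 7),
  Manual = c(7, 3, 7, 8, 5, 7)
)
#Marginal for each manufacturer: automatic count plus manual count
first_six$TOTAL <- rowSums(first_six[, c("Automatic", "Manual")])
first_six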
#The conditional table then shows that 61% of Audi's cars are automatic and 39% are manual.
#We calculate this by dividing the number of Audi's cars that are automatic, 11, by the total number of Audi cars, 18, which gives about 0.61.
#In other words, we divide each entry in the contingency table by the marginal of its row.
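#A short sketch of that division, again using only the six rows printed by head(). The names counts6 and conditional are made up for this illustration, and prop.table() from base R is one possible way to do it, not necessarily how the original table was produced.
counts6 <- matrix(
  c(11, 16, 30, 17, 4, 7,   #Automatic column
    7, 3, 7, 8, 5, 7),      #Manual column
  ncol = 2,
  dimnames = list(
    Manufacturer = c("Audi", "Chevrolet", "Dodge", "Ford", "Honda", "Hyundai"),
    Transmission = c("Automatic", "Manual")
  )
)
#margin = 1 divides every entry by the total of its own row
conditional <- prop.table(counts6, margin = 1)
round(conditional, 2)
#Each row of the result sums to 1, because every count is divided by its own row total.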