library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  2.0.1     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.3.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
#Interpreting Plots
#Categorical versus Categorical
#In the mpg dataset (shipped with ggplot2) we have manufacturer and transmission type, both of which are categorical variables.
#The first way to examine the relationship between them is with a contingency table.
data <- read.csv("ContingencyTable.csv")
head(data)
##   Manufacturer Automatic Manual TOTAL
## 1         Audi        11      7    18
## 2    Chevrolet        16      3    19
## 3        Dodge        30      7    37
## 4         Ford        17      8    25
## 5        Honda         4      5     9
## 6      Hyundai         7      7    14
#A contingency table takes every combination of the two variables and counts how many observations fall into each one.

#To simplify the dataset, we show only automatics and manuals rather than all the different transmission types. Audi, for example, has 11 automatic cars and 7 manual cars.
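#A minimal sketch of how a table like this could be built directly from ggplot2's mpg data,
#assuming ContingencyTable.csv was derived from it (the file's origin is not stated here).
#The names trans_simple and counts are illustrative only. Note that mpg stores manufacturer
#names in lowercase ("audi"), unlike the CSV.
trans_simple <- ifelse(grepl("^auto", ggplot2::mpg$trans), "Automatic", "Manual")  #collapse e.g. "auto(l5)" and "manual(m6)"
counts <- table(Manufacturer = ggplot2::mpg$manufacturer, Transmission = trans_simple)
counts["audi", ]  #automatic and manual counts for Audi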
#What is the relationship between manufacturer and transmission?
#Perhaps one manufacturer produces a larger share of manuals than the others.
#To investigate that question, we need a conditional table.
#The first step is to calculate the marginals, which here is a technical way of saying row sums.
#Imagine totalling up all the cars in each row of the table and adding the result as an extra column at the end, like the TOTAL column in the table above.
#So we see that Audi has 18 cars, of which 11 are automatics and 7 are manuals.
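#The row-sum step can be sketched in base R. The six rows below are typed in from the
#head(data) output above; the names counts6 and marginals are illustrative only.
counts6 <- matrix(c(11, 7, 16, 3, 30, 7, 17, 8, 4, 5, 7, 7), ncol = 2, byrow = TRUE,
                  dimnames = list(c("Audi", "Chevrolet", "Dodge", "Ford", "Honda", "Hyundai"),
                                  c("Automatic", "Manual")))
marginals <- rowSums(counts6)        #one total per manufacturer
cbind(counts6, TOTAL = marginals)    #append the totals as an extra column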

#The conditional table then shows that 61% of Audi's cars are automatic and 39% are manual.
#We calculate this by dividing the number of Audi cars that are automatic, 11, by the total number of Audi cars, 18 (11 / 18 is about 0.61).
#In other words, we divide each entry in the contingency table by the marginal of its row.
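#A sketch of the conditional table for the six manufacturers shown in head(data), using the
#CSV's column names. The names six_rows and conditional are illustrative only.
six_rows <- data.frame(
  Manufacturer = c("Audi", "Chevrolet", "Dodge", "Ford", "Honda", "Hyundai"),
  Automatic = c(11, 16, 30, 17, 4, 7),
  Manual = c(7, 3, 7, 8, 5, 7)
)
six_rows$TOTAL <- six_rows$Automatic + six_rows$Manual  #row marginals
conditional <- data.frame(
  Manufacturer = six_rows$Manufacturer,
  Automatic = round(six_rows$Automatic / six_rows$TOTAL, 2),  #share of automatics per manufacturer
  Manual = round(six_rows$Manual / six_rows$TOTAL, 2)         #share of manuals per manufacturer
)
conditional
#For a matrix or table of counts, prop.table(x, margin = 1) performs the same row-wise division.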