We’re interested in how one variable differs across categories of another.
Boxplots: great for comparing the distribution of a quantitative variable across categories.
Which class has the highest median highway MPG?
Which has the widest IQR?
Are there outliers?
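The quantities these questions ask about can be computed directly. Below is a minimal sketch in Python, standard library only. The slides' own output comes from R, and the MPG values here are made up for illustration, not the real data. For each group it finds the median and the interquartile range (IQR), then flags points more than 1.5 × IQR beyond the quartiles, the conventional rule for drawing boxplot outliers. `method="inclusive"` interpolates quartiles the same way as R's default `quantile()` (type 7). Other quartile conventions can place the box edges slightly differently.

```python
from statistics import median, quantiles

# HYPOTHETICAL highway-MPG values, made up for illustration (not the real data).
hwy_by_class = {
    "compact": [26, 27, 28, 29, 29, 30, 31, 44],
    "midsize": [24, 26, 26, 27, 28, 28, 29],
    "pickup": [11, 15, 16, 17, 17, 18, 19, 20],
}


def box_summary(values, coef=1.5):
    """Return the median, quartiles, IQR and outliers for one group."""
    # method="inclusive" interpolates quartiles like R's default quantile() (type 7)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points more than coef * IQR beyond the quartiles are flagged as outliers
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}


summaries = {cls: box_summary(vals) for cls, vals in hwy_by_class.items()}

# Print groups from highest to lowest median
for cls, s in sorted(summaries.items(), key=lambda kv: kv[1]["median"], reverse=True):
    print(f"{cls:<8} median={s['median']:.1f}  IQR={s['iqr']:.2f}  outliers={s['outliers']}")
```

The median answers "which class is typically highest," and the IQR answers "which class is most spread out." The outlier list identifies the points a boxplot would draw as individual dots.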
Crosstabs: compare two categorical variables.
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|-------------------------|

Total Observations in Table:  234

             | suv_df$is_suv
  suv_df$drv |   Not SUV |       SUV | Row Total |
-------------|-----------|-----------|-----------|
           4 |        52 |        51 |       103 |
             |     7.425 |    20.598 |           |
             |     0.505 |     0.495 |     0.440 |
-------------|-----------|-----------|-----------|
           f |       106 |         0 |       106 |
             |    10.124 |    28.085 |           |
             |     1.000 |     0.000 |     0.453 |
-------------|-----------|-----------|-----------|
           r |        14 |        11 |        25 |
             |     1.042 |     2.891 |           |
             |     0.560 |     0.440 |     0.107 |
-------------|-----------|-----------|-----------|
Column Total |       172 |        62 |       234 |
-------------|-----------|-----------|-----------|
Among front-wheel drive vehicles, what % are SUVs?
Among rear-wheel drive vehicles?
Is SUV-ness associated with drive type?
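Every number in the cells of the crosstab above can be reproduced from the counts alone. The sketch below uses Python's standard library rather than R. It copies the counts from the table and computes three things:

- each row proportion (N / Row Total);
- each cell's chi-square contribution, (observed − expected)² / expected, where expected = row total × column total / N;
- the chi-square statistic, which is the sum of those contributions.

The closed-form p-value exp(−x / 2) holds only for 2 degrees of freedom, which is what a 3 × 2 table has. Other table sizes need a chi-square distribution function, such as R's `pchisq()`.

```python
from math import exp

# Cell counts copied from the crosstab above: rows are drv, columns are is_suv.
counts = {
    "4": {"Not SUV": 52, "SUV": 51},
    "f": {"Not SUV": 106, "SUV": 0},
    "r": {"Not SUV": 14, "SUV": 11},
}
cols = ["Not SUV", "SUV"]

row_totals = {r: sum(cells.values()) for r, cells in counts.items()}
col_totals = {c: sum(cells[c] for cells in counts.values()) for c in cols}
n = sum(row_totals.values())  # total observations in the table

row_prop, contrib = {}, {}
for r, cells in counts.items():
    for c in cols:
        observed = cells[c]
        # Expected count if drive type and SUV status were independent
        expected = row_totals[r] * col_totals[c] / n
        row_prop[r, c] = observed / row_totals[r]                # N / Row Total
        contrib[r, c] = (observed - expected) ** 2 / expected  # chi-square contribution

chi_sq = sum(contrib.values())
df = (len(counts) - 1) * (len(cols) - 1)
assert df == 2, "the closed-form p-value below is valid only for df = 2"
# Upper-tail probability of a chi-square distribution with 2 df
p_value = exp(-chi_sq / 2)

for r in counts:
    print(f"drv = {r}: {100 * row_prop[r, 'SUV']:.1f}% SUV")
print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.1e}")
```

The front-wheel-drive row contributes the most to the statistic. The table contains no front-wheel-drive SUVs at all, whereas about 28 would be expected if SUV status were unrelated to drive type.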
Means by group: compare the mean of a quantitative DV across categories of a categorical IV.
# A tibble: 7 × 2
  class      mean_hwy
  <chr>         <dbl>
1 compact        28.3
2 subcompact     28.1
3 midsize        27.3
4 2seater        24.8
5 minivan        22.4
6 suv            18.1
7 pickup         16.9
Which class has the highest average MPG?
Is the difference between compact and midsize vehicles large?
Any surprises?
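Grouping and averaging take only a few lines of code. Below is a minimal sketch in Python with made-up (class, hwy) records, not the data behind the table above. It groups the DV by category of the IV, takes each group's mean, and sorts the groups from highest to lowest, matching the layout of the table.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL (class, hwy) records, made up for illustration (not the real data).
cars = [
    ("compact", 29), ("compact", 27), ("compact", 31),
    ("midsize", 26), ("midsize", 28),
    ("pickup", 17), ("pickup", 15), ("pickup", 19),
    ("suv", 18), ("suv", 20), ("suv", 16),
]

# Group the DV (hwy) by category of the IV (class)
groups = defaultdict(list)
for cls, hwy in cars:
    groups[cls].append(hwy)

# Mean of the DV within each category, sorted from highest to lowest
mean_hwy = sorted(
    ((cls, mean(vals)) for cls, vals in groups.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for cls, m in mean_hwy:
    print(f"{cls:<8} {m:.1f}")
```

Sorting by the mean puts the category with the highest value first. This makes "which class has the highest average" a matter of reading the top row.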
Rule of direction: to describe a relationship, start with the category of the IV that is “more of something.”
This applies across boxplots, crosstabs, and means.
Use boxplots to visualize distribution across groups.
Use crosstabs for two categorical variables.
Use means to summarize quantitative outcomes by group.
Rule of direction helps make clear comparisons.
GVPT201 Scope and Methods for Political Science Research