Scatterplots by Group Using ggplot2

What if you're interested not just in the relationship between two quantitative variables, but in whether the strength or direction of that relationship differs across groups? You can assess this kind of variation visually by drawing one regression line for each group and comparing the lines on the same graph. If the lines and their confidence intervals are very different, we say that the group variable “conditions” the effect of the independent variable, or “interacts” with it. In other words, the relationship between X and Y depends on the group. If the regression lines for the groups are about the same, we say that the group variable does not condition the effect of X on Y.
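One common way to write this idea down, assuming a linear relationship and a group variable G coded 0/1 (both are assumptions made here for illustration, not something the plots require), is a regression with an interaction term:

```latex
Y_i = \beta_0 + \beta_1 X_i + \beta_2 G_i + \beta_3 (X_i \times G_i) + \varepsilon_i
```

Under this sketch the slope of X is \beta_1 in the group with G = 0 and \beta_1 + \beta_3 in the group with G = 1, so "the group variable conditions the effect of X" corresponds to \beta_3 being different from zero.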

Set your working directory to the folder that holds the file GSS.csv, then load the data file into R as a data frame named “x”.

setwd("~/Dropbox/Data General/GSS")  #set your working directory, this is mine
x <- read.csv("GSS.csv")
library(ggplot2)
scatter <- ggplot(x, aes(sei, tvhours, colour = sex))  #This just makes the plot object, which we'll add to next
scatter + geom_smooth()  # adds one smoothed curve per sex; with groups this large, method = "auto" fits a GAM rather than loess
## geom_smooth: method="auto" and size of largest group is >=1000, so using
## gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the
## smoothing method.
## Warning: Removed 20303 rows containing missing values (stat_smooth).
## Warning: Removed 15822 rows containing missing values (stat_smooth).

[Plot: smoothed curves of tvhours against sei, one per sex]
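The warnings above report rows dropped because of missing values, and the message suggests naming the smoothing method yourself. Here is a minimal sketch of both steps. It uses a small made-up data frame called `toy` (hypothetical values, not the GSS) with the same column names, so it runs without GSS.csv.

```r
library(ggplot2)

# Made-up stand-in for the GSS data: same column names, hypothetical values
toy <- data.frame(
  sei     = c(30, 45, NA, 60, 75, 50, NA, 85, 40, 65),
  tvhours = c(4, 3, 2, NA, 1, 3, 5, 1, 4, 2),
  sex     = factor(rep(c("Female", "Male"), 5))
)

# Flag rows where all three plotting variables are observed
keep <- complete.cases(toy[, c("sei", "tvhours", "sex")])
toy_complete <- toy[keep, ]

# Number of rows that would otherwise be dropped with a warning
n_dropped <- sum(!keep)

# Name the smoother explicitly instead of relying on method = "auto"
p <- ggplot(toy_complete, aes(sei, tvhours, colour = sex)) +
  geom_smooth(method = "lm", formula = y ~ x)
```

With the real data, the same pattern should apply with `x` in place of `toy`. Filtering first does not change the fitted curves, since the smoother drops incomplete rows anyway, but it makes the number of cases actually used easy to see.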

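scatter + geom_smooth(method = "lm", alpha = 0.05)  # one linear fit per sex; alpha sets the transparency of the confidence band, not a significance level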
## Warning: Removed 20303 rows containing missing values (stat_smooth).
## Warning: Removed 15822 rows containing missing values (stat_smooth).

[Plot: linear fits of tvhours against sei with confidence bands, one per sex]

We see slight differences across gender groups in the relationship between socioeconomic status (sei) and hours of TV watched per day (tvhours). But the lines are more or less the same, and the confidence intervals overlap, which suggests that the relationship between sei and TV hours is not significantly different for males and females. Keep in mind that this is a visual check, not a formal test. People with higher socioeconomic status tend to watch less television, regardless of gender.
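A visual comparison like this can be backed up with a model that includes an interaction term. The sketch below is one way to do that, not necessarily the approach intended here. It runs on simulated data (the `toy` data frame is generated on the spot and is not the GSS), built so that both groups share the same slope.

```r
# Simulated stand-in data (NOT the GSS), using the tutorial's column names
set.seed(1)
n <- 200
toy <- data.frame(
  sei = runif(n, 20, 90),
  sex = factor(rep(c("Female", "Male"), each = n / 2))
)
# Same negative slope for both groups, plus random noise
toy$tvhours <- 5 - 0.03 * toy$sei + rnorm(n, sd = 1)

# sei * sex expands to sei + sex + sei:sex; the sei:sex coefficient is the
# difference in slopes between the two groups
fit <- lm(tvhours ~ sei * sex, data = toy)
summary(fit)$coefficients

# Compare against a model that forces a common slope for both groups
fit_common <- lm(tvhours ~ sei + sex, data = toy)
anova(fit_common, fit)
```

On the real data you would pass `x` instead of `toy`, assuming `sex` is stored as a factor or character column. A small, non-significant `sei:sexMale` coefficient would agree with the overlapping confidence bands in the plot.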