Let’s check out the BabyNames data frame:
str(BabyNames)
## 'data.frame': 1792091 obs. of 4 variables:
## $ name : chr "Mary" "Anna" "Emma" "Elizabeth" ...
## $ sex : chr "F" "F" "F" "F" ...
## $ count: int 7065 2604 2003 1939 1746 1578 1472 1414 1320 1288 ...
## $ year : int 1880 1880 1880 1880 1880 1880 1880 1880 1880 1880 ...
We will keep only the girls who were named either “Mary” or “Leslie”, and then group the remaining rows by name and (within that) by year:
B2 <- BabyNames %>%
  filter(name %in% c("Mary", "Leslie"), sex == "F") %>%
  group_by(name, year)
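To see what these two verbs do, here is a minimal sketch on a small made-up data frame. The `toy` table and its counts are invented for illustration and are not taken from BabyNames. `filter()` keeps only the rows that satisfy every condition, while `group_by()` leaves the rows unchanged and only records the grouping.

```r
library(dplyr)

# Hypothetical miniature stand-in for BabyNames; the counts are made up.
toy <- data.frame(
  name  = c("Mary", "Leslie", "Leslie", "Anna", "Mary", "Leslie"),
  sex   = c("F", "F", "M", "F", "F", "F"),
  count = c(100L, 5L, 40L, 30L, 90L, 7L),
  year  = c(1880L, 1880L, 1880L, 1880L, 1881L, 1881L),
  stringsAsFactors = FALSE
)

# Conditions separated by commas are combined with "and".
toy2 <- toy %>%
  filter(name %in% c("Mary", "Leslie"), sex == "F") %>%
  group_by(name, year)

nrow(toy2)        # 4 rows remain: Anna and the male Leslie are dropped
group_vars(toy2)  # "name" "year"
n_groups(toy2)    # 4, one group per name-year combination
```

Because every name-year combination occurs only once after filtering, each group holds a single row, which matches the group count of 268 reported for the 268 rows of B2.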
Let’s see what the first few rows look like:
B2
## Source: local data frame [268 x 4]
## Groups: name, year [268]
##
## name sex count year
## (chr) (chr) (int) (int)
## 1 Mary F 7065 1880
## 2 Leslie F 8 1880
## 3 Mary F 6919 1881
## 4 Leslie F 11 1881
## 5 Mary F 8148 1882
## 6 Leslie F 9 1882
## 7 Mary F 8012 1883
## 8 Leslie F 7 1883
## 9 Mary F 9217 1884
## 10 Leslie F 15 1884
## .. ... ... ... ...
Finally, let’s plot the number of babies given each name against the year. We’ll color the points by group: the Marys and the Leslies.
ggplot(data = B2, mapping = aes(x = year, y = count)) +
  geom_point(mapping = aes(color = name))
It looks like Mary became very popular, dipped, became popular again, and then gradually declined. Leslie was never as popular as a girl’s name, and indeed was essentially unused prior to about 1940.
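The plot above maps year to the x-axis, count to the y-axis, and name to color. As a sketch of the same mapping, the example below uses a small made-up data frame (the `toy` values are invented, not BabyNames data) and adds a line layer, which can make a trend over time easier to follow. Moving `color = name` into `ggplot()` is one possible choice, not the only one.

```r
library(ggplot2)

# Hypothetical toy data standing in for B2; the counts are made up.
toy <- data.frame(
  name  = rep(c("Mary", "Leslie"), times = 3),
  year  = rep(c(1880L, 1881L, 1882L), each = 2),
  count = c(100L, 5L, 90L, 7L, 110L, 6L)
)

# Aesthetics set in ggplot() are inherited by every layer,
# so both the points and the lines are colored by name.
p <- ggplot(data = toy, mapping = aes(x = year, y = count, color = name)) +
  geom_point() +
  geom_line()

p
```

Since the color aesthetic is mapped to a discrete variable, `geom_line()` draws one line per name without needing an explicit `group` aesthetic.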