Source file ⇒ lab_4.Rmd
Examine the data you have at hand.
BabyNames consists of four variables: name, sex, count, and year.
Analyze the graphic to figure out what a glyph-ready data table should look like. Mostly, this involves figuring out what variables are represented in the graph. Write down a small example of a glyph-ready data frame that you think could be used to make something in the form of the graphic.
Popularity and year are the variables on our axes, with year on the x-axis and popularity on the y-axis; name distinguishes the lines. Popularity is a new variable: for a given name in a given year, it is the number of births with that name divided by the total number of births that year.
What variable(s) from the raw data table do not appear at all in the graph?
The variable sex does not appear at all in the graph. The variable count does not appear either, but it is used to compute popularity.
What variable(s) in the graph are similar to corresponding variables in the raw data table, but might have been transformed in some way?
Popularity is similar to count, but it is a proportion rather than an integer count.
Consider how the cases differ between the raw input and the glyph-ready table.
Have cases been filtered out?
Have cases been grouped and summarized within groups in any way?
From the BabyNames data frame, we created a new data frame by keeping only the names Sarah, John, and Daniel, grouping the cases by name and year, and summing count within each group to get the yearly count for each name. The result is a glyph-ready table with one row per name and year holding that name's combined count for the year.
Using English, write down a sequence of steps that will accomplish the wrangling from the raw data table to your hypothesized glyph-ready data table.
First, we choose the names whose popularity we want to track (I took a small sample of names from 10 people, and Sarah, John, and Daniel were the popular ones).
Next, we filter out all cases whose name is not one of these three.
We then group the remaining cases by name and year and sum count within each group, which combines the male and female records for each name.
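The steps above can be checked on a few made-up rows in the BabyNames layout. The data frame raw and all of its values are hypothetical, not taken from BabyNames. The sketch shows how the cases change: one row per name, sex, and year in the input becomes one row per name and year, with the female and male counts added together.

```r
library(dplyr)

# Hypothetical toy rows in the BabyNames layout: one case per name, sex, and year.
raw <- data.frame(
  name  = c("Sarah", "Sarah", "John", "John", "Emma"),
  sex   = c("F", "M", "M", "F", "F"),
  count = c(90L, 10L, 150L, 50L, 300L),
  year  = c(1900L, 1900L, 1900L, 1900L, 1900L),
  stringsAsFactors = FALSE
)

glyph_ready <- raw %>%
  filter(name %in% c("Sarah", "John")) %>%               # filter out all other names
  group_by(name, year) %>%                                # one group per name-year pair
  summarise(popularity = sum(count), .groups = "drop")    # add the F and M counts together

glyph_ready
```

The sex column disappears because it is neither a grouping variable nor a summary, which matches the answer that sex does not appear in the graph.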
Using paper and pen, translate your design, step by step, into R.
To get our data in glyph-ready form:
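library(dplyr)  # BabyNames is assumed to be loaded already
popular_names <- BabyNames %>%
  filter(name %in% c("Sarah", "John", "Daniel")) %>%  # keep only the three names of interest
  group_by(name, year) %>%                             # one group per name-year pair
  summarise(popularity = sum(count))                   # combined count across both sexes
popular_names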
## Source: local data frame [402 x 3]
## Groups: name [?]
##
## name year popularity
## (chr) (int) (int)
## 1 Daniel 1880 643
## 2 Daniel 1881 527
## 3 Daniel 1882 594
## 4 Daniel 1883 615
## 5 Daniel 1884 573
## 6 Daniel 1885 535
## 7 Daniel 1886 555
## 8 Daniel 1887 557
## 9 Daniel 1888 566
## 10 Daniel 1889 515
## .. ... ... ...
To plot our data:
library(ggplot2)
popular_names %>%
  ggplot(aes(x = year, y = popularity, group = name)) +
  geom_line(size = 1, alpha = 0.5, aes(color = name)) +  # one semi-transparent line per name
  ylab("Popularity") + xlab("Year") +
  theme(legend.position = "top")                          # put the name legend above the plot
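Popularity was described as a proportion (births with a given name in a given year divided by all births that year), while the popular_names pipeline sums raw counts. Below is a minimal sketch of the proportion version, run on made-up toy rows; toy_names, year_total, and popularity_by_year are hypothetical names. It assumes the yearly total can be approximated by summing count over every name and sex recorded for that year, which is an assumption rather than an official birth total. With the real data, BabyNames would take the place of toy_names.

```r
library(dplyr)

# Hypothetical toy rows with the same four variables as BabyNames;
# every number here is made up for illustration.
toy_names <- data.frame(
  name  = c("Sarah", "Sarah", "John", "John", "Mary", "Sarah", "John", "Mary"),
  sex   = c("F", "M", "M", "F", "F", "F", "M", "F"),
  count = c(90L, 10L, 150L, 50L, 700L, 120L, 180L, 700L),
  year  = c(1900L, 1900L, 1900L, 1900L, 1900L, 1901L, 1901L, 1901L),
  stringsAsFactors = FALSE
)

popularity_by_year <- toy_names %>%
  group_by(year) %>%
  # Assumption: the year's total is the sum of count over all names and sexes.
  # It must be computed BEFORE filtering, or the denominator would shrink.
  mutate(year_total = sum(count)) %>%
  filter(name %in% c("Sarah", "John")) %>%   # keep only the names of interest
  group_by(name, year) %>%
  summarise(popularity = sum(count) / first(year_total), .groups = "drop")

popularity_by_year
```

The `.groups = "drop"` argument requires dplyr 1.0.0 or later; on older versions it can be removed and the result ungrouped with `ungroup()`.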