Lab 4

Popular Names Project

Step 1:

Examine the data you have at hand.

The data frame we are provided with, BabyNames, consists of four variables:
- name
- sex
- count
- year
We can conclude that each case represents the number of babies of a given sex who were given a particular name in a given year.
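One way to check what a case is: confirm that no combination of name, sex, and year repeats. The sketch below uses a made-up data frame, `toy`, in the same shape as BabyNames; the rows and counts are hypothetical, not taken from the real data.

```r
# Hypothetical toy rows in the same shape as BabyNames (counts are made up)
toy <- data.frame(
  name  = c("John", "John", "Sarah"),
  sex   = c("M", "F", "F"),
  count = c(100, 5, 80),
  year  = c(1950, 1950, 1950)
)

str(toy)  # variable names and types

# Index of the first repeated name-sex-year combination, or 0 if there is none
dup <- anyDuplicated(toy[c("name", "sex", "year")])
dup
```

The same two calls could be run on BabyNames itself, assuming it is loaded.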

Step 2:

Analyze the graphic to figure out what a glyph-ready data table should look like. Mostly, this involves figuring out what variables are represented in the graph. Write down a small example of a glyph-ready data frame that you think could be used to make something in the form of the graphic.

A glyph-ready data table would consist of all the variables mapped to aesthetics or attributes that will be visible on the graph. In this project, we will use Popularity and year for our axes. Popularity is a new variable: the total number of births for a given name in a given year, divided by the total number of births in that year.
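As a minimal sketch of this calculation, and of what a small glyph-ready table might look like, the code below computes popularity as a proportion on a made-up data frame. The data frame `toy`, its counts, and the helper column `year_total` are hypothetical; the sketch assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy data in the same shape as BabyNames (counts are made up)
toy <- data.frame(
  name  = c("Sarah", "John", "Daniel", "Sarah", "John", "Daniel"),
  sex   = c("F", "M", "M", "F", "M", "M"),
  count = c(30, 50, 20, 40, 40, 20),
  year  = c(1990, 1990, 1990, 1991, 1991, 1991)
)

glyph_ready <- toy %>%
  group_by(year) %>%
  mutate(year_total = sum(count)) %>%  # total births recorded in each year
  group_by(name, year) %>%             # replaces the grouping by year
  summarise(popularity = sum(count) / first(year_total), .groups = "drop")

glyph_ready  # one row per name and year: name, year, popularity
```

With the real data, the yearly total would have to be computed before filtering down to the three names, otherwise the denominator would only cover those names. The output shown in Step 5 contains summed counts rather than proportions; dividing by the yearly total as in this sketch would put the y-axis on the proportion scale described here.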

What variable(s) from the raw data table do not appear at all in the graph?

sex does not appear at all in the graph. count does not appear directly either, but it is used to generate the Popularity aesthetic.

What variable(s) in the graph are similar to corresponding variables in the raw data table, but might have been transformed in some way?

Popularity is similar to count, but it is a proportion rather than an integer.

Step 3:

Consider how the cases differ between the raw input and the glyph-ready table.

Have cases been filtered out?

Yes. All cases for names other than the three we are looking at (Sarah, John, and Daniel) have been filtered out.

Have cases been grouped and summarized within groups in any way?

Yes. From the BabyNames data frame, we created a new data frame by grouping the cases by name and year and summing count within each group. This gives the yearly totals of babies named Sarah, John, and Daniel, with both sexes combined. The result is a glyph-ready table with one row per name per year.
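To illustrate how grouping by name and year collapses the separate rows for each sex, here is a minimal sketch on a made-up data frame. The rows and counts in `toy` are hypothetical, and the sketch assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy rows: each name is recorded for both sexes (counts are made up)
toy <- data.frame(
  name  = c("John", "John", "Sarah", "Sarah"),
  sex   = c("M", "F", "F", "M"),
  count = c(100, 5, 80, 2),
  year  = c(1950, 1950, 1950, 1950)
)

yearly <- toy %>%
  group_by(name, year) %>%
  summarise(total = sum(count), .groups = "drop")  # one row per name and year

yearly
```

Four input cases become two rows, because the two rows for each name in 1950 are added together.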

Step 4:

Using English, write down a sequence of steps that will accomplish the wrangling from the raw data table to your hypothesized glyph-ready data table.

First, we choose the names whose popularity we want to track (I took a small sample of 10 people, and Sarah, John, and Daniel were the popular ones).
Since only these names are needed, we filter out the cases for all other names.
We then group the remaining cases by name and year.
Finally, we sum the counts within each group to get one total per name per year.

Step 5:

Using paper and pen, translate your design, step by step, into R.

To get our data in glyph-ready form:

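library(dplyr)  # provides %>%, filter(), group_by(), summarise()

# BabyNames is assumed to be loaded already
popular_names <- BabyNames %>%
  filter(name %in% c("Sarah", "John", "Daniel")) %>%  # keep only the three names
  group_by(name, year) %>%
  summarise(popularity = sum(count))  # total per name and year, both sexes combined
popular_names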

## Source: local data frame [402 x 3]
## Groups: name [?]
## 
##      name  year popularity
##     (chr) (int)      (int)
## 1  Daniel  1880        643
## 2  Daniel  1881        527
## 3  Daniel  1882        594
## 4  Daniel  1883        615
## 5  Daniel  1884        573
## 6  Daniel  1885        535
## 7  Daniel  1886        555
## 8  Daniel  1887        557
## 9  Daniel  1888        566
## 10 Daniel  1889        515
## ..    ...   ...        ...

To plot our data:

library(ggplot2)

popular_names %>%
  ggplot(aes(x = year, y = popularity, group = name)) +
  geom_line(size = 1, alpha = 0.5, aes(color = name)) +  # in ggplot2 >= 3.4.0, use linewidth = 1
  ylab("Popularity") + xlab("Year") +
  theme(legend.position = "top")