This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in the document.
BabyNames Data
Let’s take a look at the first few rows of the data frame BabyNames from the package DataComputing.
head(BabyNames, n = 10)
You can get more information on BabyNames using the help() function, like this:
help("BabyNames")
In RStudio, a help file will show up in the Help tab. (Note that when you knit this document, the eval = F code-chunk option prevents the above code from being executed, even though it appears in the knitted version.)
In RStudio you can get a “spreadsheet-look” at BabyNames if you use the View() function:
View(BabyNames)
Our aim is to use some data-wrangling and data-visualization to learn about the popularity, over time, of various names for babies in the United States.
How popular is the name Mary (for girls) over time? To answer this, we’ll first wrangle the data just a bit: we’ll keep only the rows where the name is “Mary” and the sex is “F” for female:
BabyNames %>%
  filter(name == "Mary" & sex == "F") %>%
  head(n = 10)
Now we’ll make a line-graph of the counts over time:
BabyNames %>%
  filter(name == "Mary" & sex == "F") %>%
  ggplot(aes(x = year, y = count)) +
  geom_line() +
  labs(x = "Year", y = "Number Born",
       title = "Mary as a Girl-Name")
This is a caption, used to provide the reader with more information about the figure. You can determine the caption-text by using the fig.cap option in the code chunk.
It was important to restrict to females, because it happens that males can be named Mary, too! The plot below demonstrates this.
BabyNames %>%
  filter(name == "Mary" & sex == "M") %>%
  ggplot(aes(x = year, y = count)) +
  geom_line() +
  labs(x = "Year", y = "Number Born",
       title = "Mary as a Boy-Name")
Mary actually got some traction as a boy-name, for a while.
In the code chunk below, insert the code you need to see how your name has done over the years. Remember to select your sex!
BabyNames %>%
  filter(name == "Homer" & sex == "M") %>%
  ggplot(aes(x = year, y = count)) +
  geom_line() +
  labs(x = "Year", y = "Number Born",
       title = "Homer as a Boy-Name")
The name Homer is dying out.
Let’s look at the name Leslie, which is often found in either sex.
BabyNames %>%
  filter(name == "Leslie") %>%
  ggplot(aes(x = year, y = count)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year", y = "Number Born",
       title = "Leslie, by Sex")
It seems that once Leslie became popular as a girl’s name, it became quite rare as a name for boys!
Think of another name that is not restricted to one sex, and study the popularity of the name as we did for Leslie.
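One way to approach this is to wrap the Leslie pipeline in a small function so that any name can be plugged in. The sketch below is only an illustration: plot_name_by_sex() is a hypothetical helper, not part of DataComputing, and it assumes a data frame with the columns name, sex, year, and count, as in BabyNames. The example call uses the name Bobbie and runs only if the DataComputing package is installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not from DataComputing): line graph of yearly counts
# for one name, with a separate colored line for each sex.
# Assumes `data` has columns name, sex, year, and count.
plot_name_by_sex <- function(data, target_name) {
  data %>%
    filter(name == target_name) %>%
    ggplot(aes(x = year, y = count)) +
    geom_line(aes(color = sex)) +
    labs(x = "Year", y = "Number Born",
         title = paste0(target_name, ", by Sex"))
}

# Example call, run only when the DataComputing package is available.
if (requireNamespace("DataComputing", quietly = TRUE)) {
  data("BabyNames", package = "DataComputing")
  print(plot_name_by_sex(BabyNames, "Bobbie"))
}
```

Replacing "Bobbie" with any other name gives the corresponding graph.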
For both sexes, Bobbie peaked around 1930.
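A statement about a peak read off a graph can also be checked numerically. The following is a minimal sketch, assuming the same column layout as BabyNames (name, sex, year, count). The helper peak_year_by_sex() is hypothetical and introduced here only for illustration; it returns, for each sex, the row with the largest count for the chosen name.

```r
library(dplyr)

# Hypothetical helper (not from DataComputing): for one name, return the
# year with the highest count within each sex.
# Assumes `data` has columns name, sex, year, and count.
peak_year_by_sex <- function(data, target_name) {
  data %>%
    filter(name == target_name) %>%
    group_by(sex) %>%
    slice_max(count, n = 1, with_ties = FALSE) %>%  # one top row per sex
    ungroup() %>%
    select(name, sex, year, count)
}

# Example call, run only when the DataComputing package is available.
if (requireNamespace("DataComputing", quietly = TRUE)) {
  data("BabyNames", package = "DataComputing")
  print(peak_year_by_sex(BabyNames, "Bobbie"))
}
```

The result has one row per sex, so the peak years for girls and boys can be compared directly with what the graph suggests.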
The process of data analysis can be pictured as follows:
The flow of data analysis.
The steps are:
CSC 303 touches on all six phases of the data analysis process. Your work today focused on:
BabyNames data table, selecting names that are of special interest