DS 1870: Module 4 Homework

Data Description

The babynames data set contains the number of babies assigned each first name, including only names assigned to at least 10 babies in a year. The columns in the data are:

sex: The sex of the child assigned at birth (either ‘F’ or ‘M’)
year: The year (1910 to 2023)
name: The first name assigned
count: The number of babies assigned the name for that year and sex combination
prop: The proportion of babies assigned the name for that year and sex combination

Question 1: Is Taylor more likely to be given to female or male babies?

Create a line graph for only the name ‘Taylor’ with two lines: one for female babies and one for male babies. Make sure the line graph displays the proportion, not the count. See Brightspace for how the graph should look.
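A minimal sketch of the filter-then-plot step, assuming the tidyverse packages dplyr and ggplot2 are available. The data frame toy_names is a hypothetical stand-in with made-up values in the same five columns as babynames; it is not the real data. Mapping color = sex is what splits the plot into one line per sex, and prop (not count) goes on the y-axis.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for babynames (made-up values, not real data)
toy_names <- tibble(
  sex   = c("F", "M", "F", "M", "F"),
  year  = c(1990, 1990, 2000, 2000, 2000),
  name  = c("Taylor", "Taylor", "Taylor", "Taylor", "Jordan"),
  count = c(10, 30, 50, 20, 40),
  prop  = c(0.01, 0.03, 0.05, 0.02, 0.04)
)

# Keep only the rows for 'Taylor'
taylor <- toy_names |>
  filter(name == "Taylor")

# One line per sex, with the proportion (not the count) on the y-axis
taylor_plot <- ggplot(taylor, aes(x = year, y = prop, color = sex)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of babies named Taylor", color = "Sex")
```

Evaluating taylor_plot on its own line in a chunk displays the graph.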

Around what year did ‘Taylor’ become more likely to be given to a female baby than to a male baby?
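The graph is enough to answer this by eye, but the crossover can also be checked in code. This is a sketch assuming dplyr and tidyr are available; the taylor data frame below holds made-up values (not real data), and "first year in which the female proportion exceeds the male proportion" is one reasonable reading of the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical 'Taylor' rows (made-up values, not real data)
taylor <- tibble(
  sex  = c("F", "M", "F", "M", "F", "M"),
  year = c(1980, 1980, 1990, 1990, 2000, 2000),
  prop = c(0.001, 0.004, 0.003, 0.004, 0.006, 0.002)
)

# Spread sex into two columns (F and M), keep years where the female
# proportion is larger, and take the earliest such year.
# Inside filter(), F and M refer to the new columns, not the logical constant F.
crossover <- taylor |>
  pivot_wider(names_from = sex, values_from = prop) |>
  filter(F > M) |>
  slice_min(year, n = 1)

crossover$year
```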

Question 2: Who is the most popular Hailey since 1960?

Question 2a: Creating the data set

Create a data set named haileys that contains the rows for female babies given the names Hailey, Hayley, Haley, or Haylee since 1975. Arrange the rows by year in descending order, then by name.

Make sure to display the data frame in the knitted document.
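A sketch of the filter and sort, assuming dplyr is available. toy_names is a hypothetical stand-in with made-up values, and "since 1975" is read here as year >= 1975 (an assumption). desc(year) puts the newest year first; listing name second breaks ties alphabetically within each year.

```r
library(dplyr)

# Hypothetical stand-in for babynames (made-up values, not real data)
toy_names <- tibble(
  sex   = c("F", "F", "F", "F", "M", "F"),
  year  = c(1974, 1990, 1990, 2005, 2005, 2005),
  name  = c("Haley", "Hayley", "Haley", "Hailey", "Hailey", "Haylee"),
  count = c(20, 40, 60, 90, 15, 30),
  prop  = c(0.0001, 0.0002, 0.0003, 0.0005, 0.00001, 0.0002)
)

hailey_spellings <- c("Hailey", "Hayley", "Haley", "Haylee")

# Female babies with one of the four spellings, from 1975 on;
# newest year first, then alphabetical by name within each year
haileys <- toy_names |>
  filter(sex == "F", name %in% hailey_spellings, year >= 1975) |>
  arrange(desc(year), name)

haileys
```

The 1974 row and the male Hailey row are dropped, leaving four rows.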

If done correctly, the graph below should match the one in Brightspace.

Question 2b: Improved line graph

Update the code below by giving data = ... a data frame with one row per name, corresponding to the year in which that name was most popular.
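One way to get one row per name, assuming dplyr is available and that "most popular" means the largest prop (using count instead is an equally plausible reading). toy_haileys and peak_years are hypothetical names, and the values are made up.

```r
library(dplyr)

# Hypothetical stand-in for haileys (made-up values, not real data)
toy_haileys <- tibble(
  sex   = "F",
  year  = c(1990, 2000, 2010, 1990, 2000),
  name  = c("Hailey", "Hailey", "Hailey", "Haley", "Haley"),
  count = c(100, 400, 200, 300, 100),
  prop  = c(0.001, 0.004, 0.002, 0.003, 0.001)
)

# For each name, keep the single row with the largest proportion
peak_years <- toy_haileys |>
  group_by(name) |>
  slice_max(prop, n = 1, with_ties = FALSE) |>
  ungroup()

peak_years
```

with_ties = FALSE guarantees exactly one row per name even if two years share the same maximum. A data frame like peak_years is the kind of object that would go in data = ....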

Question 3: Letter Popularity by Year

For this question, you’ll create a data frame showing how common each starting letter is.

Question 3a: Getting the starting letter

Add a column to babynames called letter that has the first letter of each name AND change ‘F’ to ‘Female’ and ‘M’ to ‘Male’ in the sex column. Call the resulting data frame babynames2.

To get a subset of a string, use str_sub(string, start, end). For example, to keep the 4th through 6th letters of the word ‘bananas’, use str_sub('bananas', 4, 6).
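A quick check of str_sub, assuming the stringr package is loaded. Setting start and end to the same position extracts a single character, and str_sub is vectorized, so it works on a whole column of names at once.

```r
library(stringr)

# Characters 4 through 6 of 'bananas'
str_sub("bananas", 4, 6)  # returns "ana"

# start = end = 1 gives the first letter of each name
str_sub(c("Taylor", "Hailey", "Jordan"), 1, 1)  # returns "T" "H" "J"
```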

Keep all six columns in babynames2, but only display the sex, name, and letter columns in the knitted document.
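A sketch of building babynames2, assuming dplyr and stringr are available. toy_names is a hypothetical stand-in with made-up values. The if_else call relies on sex containing only ‘F’ or ‘M’, as the data description states; any other value would also become "Male".

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for babynames (made-up values, not real data)
toy_names <- tibble(
  sex   = c("F", "M", "F"),
  year  = c(2000, 2000, 2001),
  name  = c("Taylor", "Jordan", "Hailey"),
  count = c(50, 40, 30),
  prop  = c(0.05, 0.04, 0.03)
)

babynames2 <- toy_names |>
  mutate(
    # First character of each name
    letter = str_sub(name, 1, 1),
    # Assumes sex is only "F" or "M", so anything not "F" becomes "Male"
    sex = if_else(sex == "F", "Female", "Male")
  )

# All six columns are kept in babynames2; only three are displayed
babynames2 |>
  select(sex, name, letter)
```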

Question 3b: How common each letter is per year

Create a data frame that shows how common each starting letter is per year, by sex. Name it baby_letters. It should have 5 columns:

year
sex
letter
count: the total number of babies of that sex given a name starting with that letter that year
prop: the proportion of babies of that sex given a name starting with that letter that year
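A sketch of the grouped summary, assuming dplyr is available and starting from a hypothetical babynames2-like frame with made-up values. Summing prop is valid here on the assumption that, within one year and sex, every name's prop shares the same denominator (all babies of that sex born that year), so the proportions add.

```r
library(dplyr)

# Hypothetical stand-in for babynames2 (made-up values, not real data)
toy_names2 <- tibble(
  sex    = c("Female", "Female", "Female", "Male"),
  year   = c(2000, 2000, 2000, 2000),
  name   = c("Taylor", "Tessa", "Hailey", "Tyler"),
  count  = c(50, 30, 20, 40),
  prop   = c(0.05, 0.03, 0.02, 0.04),
  letter = c("T", "T", "H", "T")
)

# Add up counts and proportions over all names sharing a year, sex, and first letter
baby_letters <- toy_names2 |>
  group_by(year, sex, letter) |>
  summarise(count = sum(count), prop = sum(prop), .groups = "drop")

baby_letters
```

.groups = "drop" returns an ungrouped data frame with exactly the five columns year, sex, letter, count, and prop.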

Display the results in the knitted document.

If done correctly, the code chunk below should produce the graph shown in Brightspace.

DS 1870: Module 4 Homework - Part 1

Your Name Here

2025-03-16