Data Description

The babynames data set records the number of babies given each first name in each year, by sex (a name is included only if at least 10 babies received it that year). The columns in the data are:

  1. sex: The sex assigned to the child at birth (either ‘F’ or ‘M’)
  2. year: The year (1910 - 2023)
  3. name: The first name assigned
  4. count: The number of babies given the name for that year and sex combination
  5. prop: The proportion of babies given the name for that year and sex combination

Question 1: Is Taylor more likely to be given to female or male babies?

Create a line graph for the name ‘Taylor’ only, with two lines: one for female babies and one for male babies. Make sure the graph displays the proportion (prop), not the count. See Brightspace for how the graph should look.
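One way to build this kind of graph is sketched below with dplyr and ggplot2. It assumes babynames is already loaded as a data frame with the columns described above; the helper name plot_name_prop() is hypothetical, not part of any package. Mapping sex to color is what produces one line per sex.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: plot the yearly proportion for one name, one line per sex
plot_name_prop <- function(df, target_name) {
  df |>
    filter(name == target_name) |>            # keep only rows for the chosen name
    ggplot(aes(x = year, y = prop, color = sex)) +
    geom_line() +                              # color = sex gives one line per sex
    labs(title = paste0("Proportion of babies named ", target_name),
         x = "Year", y = "Proportion", color = "Sex")
}

# Usage, assuming babynames is loaded:
# plot_name_prop(babynames, "Taylor")
```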

Around what year did ‘Taylor’ become more likely to be given to a female baby than to a male baby?
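To check the answer against the data rather than by eye, one option is to put the female and male proportions side by side and find the earliest year in which the female proportion is larger. This is a sketch assuming dplyr and tidyr are installed and that sex is still coded ‘F’/‘M’ (as in the original babynames); first_female_year() is a hypothetical helper. A year in which only one sex appears (because of the 10-baby cutoff) is treated as a proportion of 0 for the missing sex. If the lines cross back and forth, the earliest such year may be a short blip, so compare it with the graph.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: earliest year in which the name's female prop exceeds its male prop
first_female_year <- function(df, target_name) {
  df |>
    filter(name == target_name) |>
    select(year, sex, prop) |>
    # One row per year, with columns F and M holding each sex's proportion;
    # a sex missing for a given year is filled with 0
    pivot_wider(names_from = sex, values_from = prop, values_fill = 0) |>
    filter(F > M) |>
    summarise(first_year = min(year)) |>
    pull(first_year)
}

# Usage, assuming babynames is loaded:
# first_female_year(babynames, "Taylor")
```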

Question 3: Letter Popularity by Year

For this question, you’ll create a data frame showing how common each starting letter is.

Question 3a: Getting the starting letter

Add a column called letter to babynames that contains the first letter of each name, AND recode sex so that ‘F’ becomes ‘Female’ and ‘M’ becomes ‘Male’. Call the resulting data frame babynames2.

To get a subset of a string, use str_sub(string, start, end). For example, to keep the 4th through 6th letters of the word ‘bananas’, you would use str_sub('bananas', 4, 6).
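A quick check of str_sub() from the stringr package: with start = 1 and end = 1 it returns just the first character, and because str_sub() is vectorized it works on a whole column at once. The example names are arbitrary.

```r
library(stringr)

str_sub("bananas", 4, 6)
# "ana"

# Start and end both 1 keeps only the first character of each string
str_sub(c("Taylor", "Emma", "Liam"), 1, 1)
# "T" "E" "L"
```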

Keep all six columns in babynames2, but display only the sex, name, and letter columns in the knitted document.
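A minimal sketch of building babynames2 with dplyr and stringr, assuming babynames is loaded with the five columns described above. make_babynames2() is a hypothetical helper. case_when() is used so that only ‘F’ and ‘M’ are recoded; any other value would become NA. Displaying a select() of three columns leaves babynames2 itself with all six.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: add the first-letter column and spell out sex
make_babynames2 <- function(df) {
  df |>
    mutate(
      letter = str_sub(name, 1, 1),     # first character of each name
      sex = case_when(
        sex == "F" ~ "Female",
        sex == "M" ~ "Male"             # any other code becomes NA
      )
    )
}

# Usage, assuming babynames is loaded:
# babynames2 <- make_babynames2(babynames)
# babynames2 |> select(sex, name, letter)   # shows 3 columns; babynames2 keeps all 6
```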

Question 3b: How common each letter is per year

Create a data frame that shows how common each letter is per year, by sex. Name it baby_letters. It should have 5 columns:

  1. year
  2. sex
  3. letter
  4. count: the total number of babies of that sex given a name that starts with that letter that year
  5. prop: the proportion of babies of that sex given a name that starts with that letter that year

Display the results in the knitted document.
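A sketch of baby_letters with dplyr, assuming babynames2 from Question 3a (with its letter column). make_baby_letters() is a hypothetical helper. It groups by year, sex, and letter and adds up the counts. For prop, it sums the per-name prop values, which assumes every name's prop in a given year and sex shares the same denominator; if prop is meant to be defined differently (for example, count divided by the total of count for that year and sex), adjust that line.

```r
library(dplyr)

# Hypothetical helper: letter popularity per year and sex
make_baby_letters <- function(df) {
  df |>
    group_by(year, sex, letter) |>
    summarise(
      count = sum(count),   # total babies whose name starts with this letter
      prop  = sum(prop),    # assumes per-name props share a denominator within year/sex
      .groups = "drop"      # return an ungrouped data frame
    )
}

# Usage, assuming babynames2 exists:
# baby_letters <- make_baby_letters(babynames2)
# baby_letters
```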

If done correctly, the code chunk below should produce the graph shown in Brightspace.