BabyNames_TG

This is a Quarto document. It lets you run your R analyses and then render and publish the results. You can type text in this white space. The slightly darker space below is called a chunk and is for your R code. You can switch back and forth between Source and Visual on the upper left. I personally prefer Source, but you may prefer Visual.

Often one of the first chunks you’ll see in a notebook has the packages you’ll be using. Using packages in R is a two-step process. First, you’ll need to download (what R calls ‘install’) the package. Do that in the Packages pane on the right: Click Install and download the packages you want to use. Next, you use the library() function to load the packages.

(Note: when you have library commands in your code that use packages that you have not yet installed, you may get a message at the top of your window asking you if you want to install them. Go ahead and do that if you want.)
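The two-step process above (install once, then load with library()) can also be checked from code. The lines below are a minimal sketch, not part of the original notebook: they assume the four packages loaded in this document and only report which ones are not installed yet. The variable names pkgs, installed, and not_installed are made up for this illustration.

```r
# Packages loaded in this notebook
pkgs <- c("tidyverse", "gt", "wordcloud2", "babynames")

# requireNamespace() returns TRUE when a package is installed,
# and it does not attach the package the way library() does
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to download them
} else {
  message("All packages are installed.")
}
```

The install.packages() call is left commented out so that running the chunk never downloads anything by surprise.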

In the dark area below the line library(wordcloud2), type the following line: library(babynames)

Then press the little green arrow in the top right of the chunk to run everything.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(gt)
library(wordcloud2)
library(babynames)

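babynames |>                                 # start with the data
  filter(year == 2001, sex == "F") |>        # keep girls born in 2001
  mutate(rank = row_number()) |>             # rank by row position (assumes rows run from most to least common)
  mutate(percent = round(prop * 100, 1)) |>  # turn the proportion into a percent
  filter(name == "Taylor") |>                # keep only the row for Taylor
  gt()                                       # show the result as a formatted table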

year	sex	name	n	prop	rank	percent
2001	F	Taylor	13690	0.006914	12	0.7

This shows that Taylor was the 12th most popular girl’s name in 2001: 13,690 baby girls were named Taylor, which is about 0.7% of baby girls that year.
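The rank in the table comes from row_number(), which only works because the rows for each year and sex already run from the most common name to the least common. As a hedged alternative (not the method used above), the sketch below ranks on the count n directly with dplyr’s min_rank(), so it does not depend on row order. The object name taylor_2001 is made up for this example.

```r
library(dplyr)
library(babynames)

taylor_2001 <- babynames |>
  filter(year == 2001, sex == "F") |>
  mutate(
    rank = min_rank(desc(n)),        # 1 = most common; does not rely on row order
    percent = round(prop * 100, 1)   # for example, 0.006914 becomes 0.7
  ) |>
  filter(name == "Taylor")

taylor_2001
```

If two names have exactly the same count, min_rank() gives them the same rank, while row_number() would not, so the two approaches can differ slightly when there are ties.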

babynames |>
  filter(year == 2001) |>     # use only one year
  filter(sex == "F") |>       # use only one sex
  slice_max(prop, n=100) |>   # use the top 100 names
  select(name, n) |>          # select the two relevant variables: the name and how often it occurs
  wordcloud2(size = .5)       # generate the word cloud

The word cloud above shows the 100 most popular girls’ names from 2001, the year I was born. More common names are drawn larger.

babynames |>                                    # start with the data
  filter(name == "Taylor", sex =="F") |>        # choose the name and sex
  ggplot(aes(x = year, y = prop)) +              # put year on the x-axis and prop (proportion) on y
  geom_line()                                    # make it a line graph

The graph shows that the name Taylor peaked in 1993 and has been going down since then. I would say that it is still a pretty common name, but not what it used to be. By the time I was born in 2001, my name had already started to decrease in popularity.

babynames |>                                  # start with the dataset
  filter(name == "Taylor", sex == "F") |>     # choose the name and sex
  slice_max(prop, n = 10)                     # keep the 10 years with the highest proportion

# A tibble: 10 × 5
    year sex   name       n    prop
   <dbl> <chr> <chr>  <int>   <dbl>
 1  1993 F     Taylor 21266 0.0108 
 2  1994 F     Taylor 20733 0.0106 
 3  1995 F     Taylor 20425 0.0106 
 4  1997 F     Taylor 19503 0.0102 
 5  1996 F     Taylor 19151 0.00999
 6  1998 F     Taylor 18572 0.00958
 7  1999 F     Taylor 16905 0.00869
 8  2000 F     Taylor 15078 0.00756
 9  1992 F     Taylor 14949 0.00746
10  2001 F     Taylor 13690 0.00691

The peak “Taylor” year was 1993, followed by years mostly right around that same time.

Let’s look at my name and my sisters’ names in one graph. In filter(), I put all three names separated by the vertical bar |, which is the symbol for OR. Then I set color = name, so each name gets a line in a different color.

babynames |>
  filter(name == "Hailee" | name == "Kennedy" | name == "Taylor") |> 
  filter(sex == "F") |> 
  ggplot(aes(x = year, y = n, color = name)) +
  geom_line()
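The chain of | comparisons in the filter() above can also be written with %in%, which is easier to read as the list of names grows. The lines below are a small sketch in base R, using a made-up vector called names_seen (not the real data) to show that the two conditions pick out the same values.

```r
# A made-up vector of names, only for illustration
names_seen <- c("Taylor", "Emma", "Kennedy", "Hailee", "Olivia")

# Three comparisons joined with OR
with_or <- names_seen == "Hailee" | names_seen == "Kennedy" | names_seen == "Taylor"

# The same test written with %in%
with_in <- names_seen %in% c("Hailee", "Kennedy", "Taylor")

identical(with_or, with_in)   # TRUE: both give the same TRUE/FALSE values
names_seen[with_in]           # "Taylor" "Kennedy" "Hailee"
```

Inside the pipeline, the equivalent condition would be filter(name %in% c("Hailee", "Kennedy", "Taylor")).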

None of the three names peaked around the time that we were born (2001 and 2003). Looking at the graph, we can see that Taylor became popular in the 1990s, before the 2000s. It hit its peak in 1993 and has been decreasing since then. My sister Kennedy’s name (2003) did not peak when she was born, and since then it has slowly become more popular. Hailee’s name (2001) is not popular, probably because of the way that it is spelled.

Disney’s Sleeping Beauty, with Aurora as its star character, came out in 1959; Anastasia, with Princess Anastasia, came out in 1997; and Disney’s The Princess and the Frog, with Tiana, came out in 2009.

babynames |>
  filter(name == "Aurora" | name == "Anastasia" | name == "Tiana") |> 
  filter(sex == "F") |> 
  ggplot(aes(x = year, y = n, color = name)) +
  geom_line()

It looks like Aurora peaked much later, around 2009, even though the movie came out in 1959. This suggests that it became a popular name once kids were able to watch the movie on VHS. After Anastasia came out in 1997, that name started to gain popularity, but it didn’t peak until after the 2000s. Tiana became popular before the movie came out, went back up in popularity when the movie came out in 2009, and has since started to go down.
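Peak years read off a line graph are approximate, so it can help to check them in the data. The sketch below is one possible way to do that, not part of the original analysis: it groups by name and keeps the single year with the highest count for each. The object name peaks is made up for this example, and with_ties = FALSE is an assumption that one row per name is enough.

```r
library(dplyr)
library(babynames)

peaks <- babynames |>
  filter(name %in% c("Aurora", "Anastasia", "Tiana"), sex == "F") |>
  group_by(name) |>                          # treat each name separately
  slice_max(n, n = 1, with_ties = FALSE) |>  # keep the year with the highest count
  ungroup() |>
  select(name, year, n)

peaks
```

Because the graph above uses the count n on the y-axis, this sketch ranks by n as well. Swapping in prop would find the peak by share of births instead.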