James Anderson Dr. McMullen GR Stats 2/3/24

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gt)
library(wordcloud2)
library(babynames)
babynames |>                             
  filter(year == 1999, sex == "M") |>    
  mutate(rank = row_number()) |>         
  mutate(percent = round(prop * 100, 1)) |> 
  filter(name == "James") |> 
  gt()
year  sex  name   n      prop        rank  percent
1999  M    James  18551  0.00910101  19    0.9

In 1999, ‘James’ ranked as the 19th most popular name for baby boys in the United States. A total of 18,551 boys received the name, just under 1% (about 0.9%) of all boys recorded that year.
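The rank above relies on the babynames rows already being sorted from most to least common within each year and sex. A minimal sketch of an alternative, assuming that ordering should not be taken for granted, ranks explicitly on the count with dplyr's min_rank() and desc(). The object name james_1999 is my own.

```r
library(dplyr)
library(babynames)

james_1999 <- babynames |>
  filter(year == 1999, sex == "M") |>      # boys born in 1999
  mutate(rank = min_rank(desc(n))) |>      # rank 1 = largest count; tied names share a rank
  filter(name == "James")

james_1999
```

If two names had exactly the same count, min_rank() would give them the same rank, whereas row_number() would break the tie by row position.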

babynames |>
  filter(year == 1999) |>     # use only one year
  filter(sex == "M") |>       # use only one sex
  slice_max(prop, n=100) |>   # use the top 100 names
  select(name, n) |>          # select the two relevant variables: the name and how often it occurs
  wordcloud2(size = .5)                 # generate the word cloud

As the word cloud shows, ‘James’ was one of the most common male baby names of 1999. Alongside it are some of the most popular male names of that year, such as Jacob (35,361), Michael (33,912), and Christopher (25,603).
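The counts quoted for these names can be looked up directly rather than read off the word cloud. A short sketch using the same 1999 male subset follows; cited_names and cited_counts are names I chose for illustration.

```r
library(dplyr)
library(babynames)

# Names mentioned in the paragraph above, plus James for comparison
cited_names <- c("Jacob", "Michael", "Christopher", "James")

cited_counts <- babynames |>
  filter(year == 1999, sex == "M", name %in% cited_names) |>
  select(name, n) |>
  arrange(desc(n))      # most common first

cited_counts
```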

babynames |>                                    # start with the data
  filter(name == "James", sex == "M") |>      # choose the name and sex
  mutate(percent = round(prop * 100, 1)) |>     # create a new variable called percent
  ggplot(aes(x = year, y = percent)) +           # put year on the x-axis and percent on the y-axis
  geom_line(color = "blue") 

According to the data, the name ‘James’ was most common in the 1940s. I was born in 1999, and the graph above shows that my name was far less popular by then: about 0.9% of boys that year, compared with roughly 5.5% at its peak in the 1940s, a drop of more than 80%.
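As a rough check on that comparison, the decline can be worked out from two values reported in this document's output: the 1999 proportion (0.00910101) and the highest proportion in the top-five table (0.0554, for 1944). The variable names below are only for illustration.

```r
# Proportions of boys named James, copied from the output shown in this report
prop_1999 <- 0.00910101   # 1999 row
prop_peak <- 0.0554       # 1944 row, the highest proportion listed

# Relative decline from the peak to 1999, as a percentage
decline_pct <- (1 - prop_1999 / prop_peak) * 100
round(decline_pct, 1)
```

With these inputs the decline comes out at roughly 84%.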

babynames |>                                  # Start with the dataset
  filter(name == "James", sex == "M") |>    # only look at the name and sex you want
  slice_max(prop, n = 5)
# A tibble: 5 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1944 M     James 76947 0.0554
2  1943 M     James 80258 0.0552
3  1942 M     James 77173 0.0548
4  1945 M     James 74450 0.0543
5  1941 M     James 66731 0.0532

As stated previously, the name ‘James’ peaked in popularity in the 1940s. Of the five years shown, 1944 had the highest share of boys named ‘James’ (about 5.5%), while 1943 had the highest count, with 80,258 boys given the name.
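Because the number of births changes from year to year, the peak by proportion and the peak by raw count do not have to fall in the same year. A sketch that pulls out both, with object names of my own choosing:

```r
library(dplyr)
library(babynames)

james <- babynames |>
  filter(name == "James", sex == "M")

# Year with the largest share of male births
peak_by_prop <- james |> slice_max(order_by = prop, n = 1)

# Year with the largest number of boys given the name
peak_by_count <- james |> slice_max(order_by = n, n = 1)

# Stack the two results, labelling each row by how the peak was measured
bind_rows(by_prop = peak_by_prop, by_count = peak_by_count, .id = "peak_type")
```

Comparing the two rows shows whether "most popular" depends on which measure is used.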

babynames |>
  filter(name %in% c("James", "Kevin"), sex == "M") |>
  mutate(percent = round(prop * 100, 1)) |>  
  ggplot(aes(x = year, y = percent, color = name)) +
  geom_line()

The name ‘James’ peaked much earlier (the 1940s) than the name ‘Kevin’, which peaked in the 1970s and 1980s. ‘James’ also reached a much higher peak popularity (about 5%) than ‘Kevin’ (about 1.5%).
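The peak years and peak percentages can also be read from the data instead of the graph. A minimal sketch, assuming the peak is defined as the single year with the highest proportion for each name; peaks is a name I introduced.

```r
library(dplyr)
library(babynames)

peaks <- babynames |>
  filter(name %in% c("James", "Kevin"), sex == "M") |>
  group_by(name) |>
  slice_max(order_by = prop, n = 1, with_ties = FALSE) |>   # one peak row per name
  ungroup() |>
  mutate(percent = round(prop * 100, 1)) |>
  select(name, year, n, percent)

peaks
```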

babynames |>
  filter(name %in% c("Mike", "Joe", "Jake")) |>
  filter(sex == "M") |> 
  ggplot(aes(x = year, y = n, color = name)) +
  geom_line()

Of the three chosen celebrity names (Jake, Joe, and Mike), Mike reached the highest peak, in the 1960s. Joe was the most evenly spread, staying above 5,000 births per year for more than 40 years. Jake peaked most recently. With big-name celebrities like Michael (Mike) Jordan, Joe Montana, and Jake Gyllenhaal, it is possible that these stars helped drive the popularity of their names during those years.
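The claims about these three names can be checked with a small summary table. This is only a sketch: the 5,000-births threshold is taken from the paragraph above, and celebrity_summary and its column names are my own.

```r
library(dplyr)
library(babynames)

celebrity_summary <- babynames |>
  filter(name %in% c("Mike", "Joe", "Jake"), sex == "M") |>
  group_by(name) |>
  summarise(
    peak_year       = year[which.max(n)],   # first year with the highest count
    peak_n          = max(n),               # count in that year
    years_over_5000 = sum(n > 5000)         # number of years above 5,000 births
  )

celebrity_summary
```

The peak_year column shows which name peaked latest, and years_over_5000 shows how long each name stayed above that level.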