smstatenames

Sydney over time in the state of North Carolina

This project explores how often parents in North Carolina have named their baby girls Sydney over time.

Our first step is to load the required packages.

library(tidyverse)

library(babynames)
library(plotly)

Now let's load our external data, a CSV file of baby-name counts by state, taken from here:

StateNames <- read_csv("StateNames.csv")

Rows: 5647426 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Name, Gender, State
dbl (3): Id, Year, Count

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
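The message above suggests declaring column types. Here is a minimal sketch of that with `readr::cols()`, run on a hypothetical two-row sample written to a temporary file so it works without the real StateNames.csv. Reading Id, Year, and Count as integers is an assumption on our part (readr guessed doubles); any explicit `col_types` also silences the column-specification message.

```r
library(readr)

# Hypothetical two-row sample with the same six columns as StateNames.csv.
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Id,Name,Year,Gender,State,Count",
  "7781,Sydney,1986,F,AK,6",
  "8363,Sydney,1989,F,AK,10"
), sample_path)

# Declaring every column type skips readr's type guessing and, because the
# types are supplied explicitly, no column-specification message is printed.
state_names_sample <- read_csv(
  sample_path,
  col_types = cols(
    Id     = col_integer(),   # assumption: integers instead of guessed doubles
    Name   = col_character(),
    Year   = col_integer(),
    Gender = col_character(),
    State  = col_character(),
    Count  = col_integer()
  )
)
```

On the full file, the same `col_types` argument would go straight into `read_csv("StateNames.csv", col_types = ...)`.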

Next, we filter the data down to girls named Sydney in every state:

StateNames |> 
  filter(Name == "Sydney") |> 
  filter(Gender == "F")

# A tibble: 1,900 × 6
      Id Name    Year Gender State Count
   <dbl> <chr>  <dbl> <chr>  <chr> <dbl>
 1  7781 Sydney  1986 F      AK        6
 2  8363 Sydney  1989 F      AK       10
 3  8641 Sydney  1990 F      AK        7
 4  8857 Sydney  1991 F      AK        7
 5  9022 Sydney  1992 F      AK       12
 6  9252 Sydney  1993 F      AK       10
 7  9422 Sydney  1994 F      AK       14
 8  9600 Sydney  1995 F      AK       18
 9  9789 Sydney  1996 F      AK       19
10  9960 Sydney  1997 F      AK       24
# ℹ 1,890 more rows
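The two chained `filter()` calls above can also be written as one call, because dplyr combines comma-separated conditions with AND. A minimal sketch on a few made-up rows (`toy_names` is a hypothetical stand-in, not the real StateNames data):

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows in the same shape as StateNames (not real data).
toy_names <- tibble(
  Name   = c("Sydney", "Sydney", "Emma", "Sydney"),
  Year   = c(1990, 1990, 1990, 1991),
  Gender = c("F", "M", "F", "F"),
  State  = c("NC", "NC", "NC", "AK"),
  Count  = c(12, 5, 300, 7)
)

# Conditions separated by commas inside one filter() are combined with AND,
# so this keeps exactly the rows that two chained filter() calls keep.
one_call  <- toy_names |> filter(Name == "Sydney", Gender == "F")
two_calls <- toy_names |> filter(Name == "Sydney") |> filter(Gender == "F")
```

Both versions keep the same two rows (the female Sydneys in NC and AK); the single call is simply shorter.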

Let's first visualize the number of baby girls named Sydney each year, with one line per state:

StateNames |> 
  filter(Name == "Sydney") |> 
  filter(Gender == "F") |> 
  ggplot(aes(Year, Count, color = State)) + geom_line() -> plot1

ggplotly(plot1)

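Raw counts partly reflect how many babies are born in each state, so we next convert each count into a proportion: its share of all names recorded for the same state, year, and gender. Including State in group_by() keeps the denominator state-specific; grouping by Year and Gender alone would divide North Carolina's count by the total across all states.

StateNames |>
  group_by(State, Year, Gender) |>  # one group per state, year, and gender
  mutate(Proportion = Count / sum(Count)) |>  # share of that group's total count
  ungroup() -> StateNamesProp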
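Which columns go into `group_by()` decides what each proportion is relative to. A minimal sketch with hypothetical counts for two states of different size (made-up numbers, not from StateNames.csv):

```r
library(dplyr)
library(tibble)

# Hypothetical counts: Sydney makes up 10% of girls' names in both states,
# but NC records twice as many names overall as AK.
toy <- tibble(
  Name   = c("Sydney", "Other", "Sydney", "Other"),
  Year   = 2000,
  Gender = "F",
  State  = c("NC", "NC", "AK", "AK"),
  Count  = c(10, 90, 5, 45)
)

# Denominator pools every state in the same year and gender (total = 150).
pooled <- toy |>
  group_by(Year, Gender) |>
  mutate(Proportion = Count / sum(Count)) |>
  ungroup()

# Denominator is the total for the same state, year, and gender.
by_state <- toy |>
  group_by(State, Year, Gender) |>
  mutate(Proportion = Count / sum(Count)) |>
  ungroup()
```

Grouped by state, Sydney's share is 0.1 in both NC and AK; pooled, it becomes 10/150 for NC and 5/150 for AK, so the pooled version mostly reflects how large each state is.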

Finally, we plot the proportion of baby girls named Sydney in North Carolina over time:

StateNamesProp |> 
  filter(Name == "Sydney") |> 
  filter(State == "NC") |> 
  filter(Gender == "F") |> 
  ggplot(aes(Year, Proportion, color = State)) + geom_line() -> plot3

ggplotly(plot3)