This project explores how often parents in North Carolina have named their baby girls Sydney over time.
Our first step is to load the required packages.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(babynames)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Now let’s load our external data, a CSV file taken from here:
StateNames <- read_csv("StateNames.csv")
Rows: 5647426 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Name, Gender, State
dbl (3): Id, Year, Count
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1,900 × 6
Id Name Year Gender State Count
<dbl> <chr> <dbl> <chr> <chr> <dbl>
1 7781 Sydney 1986 F AK 6
2 8363 Sydney 1989 F AK 10
3 8641 Sydney 1990 F AK 7
4 8857 Sydney 1991 F AK 7
5 9022 Sydney 1992 F AK 12
6 9252 Sydney 1993 F AK 10
7 9422 Sydney 1994 F AK 14
8 9600 Sydney 1995 F AK 18
9 9789 Sydney 1996 F AK 19
10 9960 Sydney 1997 F AK 24
# ℹ 1,890 more rows
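The preview above contains only rows where Name is Sydney and Gender is F, across all states, so some filtering step sits between loading the file and this table. A minimal sketch of how such a subset could be produced, and narrowed to a single state, is shown below. The helper `name_in_state()` and the `sample_rows` tibble are hypothetical names introduced for illustration. The first three sample rows are copied from the preview, and the fourth is a made-up non-matching row included only to show that the filter drops it.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not from the original post): keep the rows for one
# girl's name in one state, ordered by year.
name_in_state <- function(data, name, state) {
  data |>
    filter(Name == name, Gender == "F", State == state) |>
    arrange(Year)
}

# Small stand-in for StateNames with the same six columns.
sample_rows <- tribble(
  ~Id,  ~Name,     ~Year, ~Gender, ~State, ~Count,
  8363, "Sydney",  1989,  "F",     "AK",   10,
  7781, "Sydney",  1986,  "F",     "AK",   6,
  8641, "Sydney",  1990,  "F",     "AK",   7,
  0,    "Example", 1990,  "M",     "AK",   5   # made-up row, not real data
)

sydney_ak <- name_in_state(sample_rows, "Sydney", "AK")
sydney_ak
```

On the full data, the same helper could be called as `name_in_state(StateNames, "Sydney", "NC")` to get the North Carolina rows used in the rest of the analysis.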
Let’s visualize how often parents in North Carolina named their baby girls Sydney over time: