The babynames dataset records baby names given in the U.S., together
with how often each name was given in each year, broken down by sex.
The figures are national: the dataset has no state-level breakdown.
This lets us explore how the popularity of individual names changes
over time. In this analysis, we look at a set of popular baby names,
trace their popularity over the years, and visualize these trends
using interactive plots.
The dataset includes the following columns:
- name: The baby name.
- sex: Sex of the baby (‘M’ for male, ‘F’ for female).
- year: The year of birth.
- n: The number of babies of that sex who were given that name in
that year.
- prop: The proportion of babies of that sex born in that year who
were given that name.
Now, let’s load the necessary libraries and explore the dataset.
# Loading required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(babynames)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
# View the first few rows of the dataset
head(babynames)
## # A tibble: 6 × 5
##    year sex   name          n   prop
##   <dbl> <chr> <chr>     <int>  <dbl>
## 1  1880 F     Mary       7065 0.0724
## 2  1880 F     Anna       2604 0.0267
## 3  1880 F     Emma       2003 0.0205
## 4  1880 F     Elizabeth  1939 0.0199
## 5  1880 F     Minnie     1746 0.0179
## 6  1880 F     Margaret   1578 0.0162
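As a quick sanity check on the n and prop columns, the sketch below sums prop within each year and sex. The object name `prop_totals` is made up for this illustration. If prop is measured against all births of that sex in that year, the totals should sit somewhat below 1, because rarely used names are not listed individually; that reading is an assumption worth confirming against the package documentation.

```r
library(dplyr)
library(babynames)

# Sum the recorded proportions and count distinct names per year and sex.
# `prop_totals` is a hypothetical name used only in this sketch.
prop_totals <- babynames %>%
  group_by(year, sex) %>%
  summarise(
    total_prop     = sum(prop),        # share of births covered by listed names
    distinct_names = n_distinct(name), # how many names appear that year
    .groups = "drop"
  )

head(prop_totals)
```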
# Track ten hand-picked popular names (five typically male, five typically female)
top_names <- babynames %>%
  filter(name %in% c("James", "John", "Robert", "Michael", "David",
                     "Mary", "Patricia", "Jennifer", "Linda", "Elizabeth")) %>%
  group_by(name, year, sex) %>%
  summarize(total_count = sum(n)) %>%
  ungroup()
## `summarise()` has grouped output by 'name', 'year'. You can override using the
## `.groups` argument.
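The ten names above are typed in by hand. One way to derive such a list from the data instead is sketched below: it totals n over all years and keeps the five most common names for each sex. The object name `top_by_sex` and the choice of five per sex are assumptions made for this example, and the resulting names may not match the hand-picked list exactly.

```r
library(dplyr)
library(babynames)

# Total each name's count over all years, separately for each sex,
# then keep the five largest totals within each sex.
# `top_by_sex` is a hypothetical name used only in this sketch.
top_by_sex <- babynames %>%
  group_by(sex, name) %>%
  summarise(total = sum(n), .groups = "drop") %>%
  group_by(sex) %>%
  slice_max(order_by = total, n = 5, with_ties = FALSE) %>%
  ungroup()

top_by_sex
```

The names could then be passed to the filter step with `name %in% top_by_sex$name` rather than a literal vector.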
# Plotting the trends of these names over time
plot <- ggplot(top_names, aes(x = year, y = total_count, color = name, group = name)) +
  geom_line() +
  facet_wrap(~sex) +
  labs(title = "Popularity of Top 10 Baby Names Over Time",
       x = "Year",
       y = "Number of Babies",
       color = "Name") +
  theme_minimal()
# Convert the plot to an interactive plot using Plotly
interactive_plot <- ggplotly(plot)
interactive_plot
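Raw counts depend on how many babies were recorded in a given year, so a name can gain babies while losing share. A possible variant, sketched below, plots the prop column instead of the count. The two names and the object names `name_trends`, `prop_plot`, and `fig` are chosen only for illustration.

```r
library(dplyr)
library(ggplot2)
library(plotly)
library(babynames)

# Two names picked purely for illustration.
name_trends <- babynames %>%
  dplyr::filter(name %in% c("Mary", "James"))

# Same layout as the count plot, but with the proportion on the y axis.
prop_plot <- ggplot(name_trends, aes(x = year, y = prop, color = name, group = name)) +
  geom_line() +
  facet_wrap(~sex) +
  labs(x = "Year",
       y = "Proportion of Babies",
       color = "Name") +
  theme_minimal()

# Convert to an interactive plotly widget, as with the count plot.
fig <- ggplotly(prop_plot)
fig
```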
# Filter data for the year 1990
names_1990 <- babynames %>%
  filter(year == 1990) %>%
  group_by(name, sex) %>%
  summarize(total_count = sum(n)) %>%
  arrange(desc(total_count))
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
# Show top 10 names in 1990
head(names_1990, 10)
## # A tibble: 10 × 3
## # Groups:   name [10]
##    name        sex   total_count
##    <chr>       <chr>       <int>
##  1 Michael     M           65282
##  2 Christopher M           52332
##  3 Jessica     F           46475
##  4 Ashley      F           45558
##  5 Matthew     M           44800
##  6 Joshua      M           43216
##  7 Brittany    F           36538
##  8 Amanda      F           34408
##  9 Daniel      M           33815
## 10 David       M           33742
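The table above mixes both sexes in one ranking. To rank each sex separately, and to reuse the same steps for other years, the filtering could be wrapped in a small helper. The sketch below is one way to do that; `top_names_in_year()`, its arguments, and `top_1990` are hypothetical names introduced for this example and are not part of the babynames package.

```r
library(dplyr)
library(babynames)

# Hypothetical helper: the k most common names for each sex in one year.
top_names_in_year <- function(target_year, k = 10) {
  babynames %>%
    filter(year == target_year) %>%
    group_by(sex) %>%
    slice_max(order_by = n, n = k, with_ties = FALSE) %>%
    ungroup() %>%
    arrange(sex, desc(n))
}

# Five most common names per sex in 1990.
top_1990 <- top_names_in_year(1990, k = 5)
top_1990
```

Calling `top_names_in_year(2000)` would give the equivalent ranking for another year without repeating the pipeline.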