Baby Names Analysis

Introduction

The babynames dataset, drawn from U.S. Social Security Administration records, contains the national frequency of each baby name in each year from 1880 onward. It is not broken down by state. This dataset allows us to explore how the popularity of individual names has changed over time. In this analysis, we follow a set of historically popular names, plot their yearly counts as an interactive chart, and then look at the most common names in a single year, 1990.

Dataset Overview

The dataset includes the following columns:
- name: The baby name.
- sex: Sex of the baby (‘M’ for male, ‘F’ for female).
- year: The year of birth.
- n: The number of babies of that sex given that name in that year.
- prop: n divided by the total number of applicants of that sex in that year, i.e. the share of babies of that sex who received the name.

Now, let’s load the necessary libraries and explore the dataset.

# Loading required libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(babynames)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

# View the first few rows of the dataset
head(babynames)

## # A tibble: 6 × 5
##    year sex   name          n   prop
##   <dbl> <chr> <chr>     <int>  <dbl>
## 1  1880 F     Mary       7065 0.0724
## 2  1880 F     Anna       2604 0.0267
## 3  1880 F     Emma       2003 0.0205
## 4  1880 F     Elizabeth  1939 0.0199
## 5  1880 F     Minnie     1746 0.0179
## 6  1880 F     Margaret   1578 0.0162
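
The prop column can be cross-checked against the applicants table that ships with the babynames package. The sketch below assumes prop was derived as n divided by n_all for the matching year and sex; prop_check and prop_recomputed are names introduced here for illustration only.

```r
library(dplyr)
library(babynames)

# Recompute the proportion for girls born in 1880 from the applicant totals.
# Assumption: prop = n / n_all, where n_all comes from babynames::applicants.
prop_check <- babynames %>%
  filter(year == 1880, sex == "F") %>%
  left_join(applicants, by = c("year", "sex")) %>%
  mutate(prop_recomputed = n / n_all) %>%
  select(name, n, n_all, prop, prop_recomputed) %>%
  head(3)

prop_check
```

If the assumption holds, the prop and prop_recomputed columns agree row by row.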

# Keep ten hand-picked, historically popular names
# (the list is hard-coded here, not computed from the data)
top_names <- babynames %>%
  filter(name %in% c("James", "John", "Robert", "Michael", "David", "Mary", "Patricia", "Jennifer", "Linda", "Elizabeth")) %>%
  group_by(name, year, sex) %>%
  summarize(total_count = sum(n)) %>%
  ungroup()

## `summarise()` has grouped output by 'name', 'year'. You can override using the
## `.groups` argument.
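
The ten names above are listed by hand. As a rough sketch of how a comparable list could be derived from the data, the block below ranks names within each sex by their total count across all years. The name top5_by_sex is introduced here for illustration, and all-time totals are only one possible ranking criterion, so the result may not match the hard-coded list exactly.

```r
library(dplyr)
library(babynames)

# Total count per name and sex over all years, then the five largest per sex.
top5_by_sex <- babynames %>%
  group_by(sex, name) %>%
  summarize(total_count = sum(n), .groups = "drop") %>%
  group_by(sex) %>%
  slice_max(total_count, n = 5, with_ties = FALSE) %>%
  ungroup()

top5_by_sex
```

The resulting name column could then be passed to filter(name %in% ...) in place of the literal vector. Setting .groups = "drop" also avoids the grouping message shown above.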

# Plotting the trends of these names over time
plot <- ggplot(top_names, aes(x=year, y=total_count, color=name, group=name)) +
  geom_line() +
  facet_wrap(~sex) +
  labs(title = "Popularity of Top 10 Baby Names Over Time",
       x = "Year",
       y = "Number of Babies",
       color = "Name") +
  theme_minimal()

# Convert the plot to an interactive plot using Plotly
interactive_plot <- ggplotly(plot)
interactive_plot

# Filter data for the year 1990
names_1990 <- babynames %>%
  filter(year == 1990) %>%
  group_by(name, sex) %>%
  summarize(total_count = sum(n)) %>%
  arrange(desc(total_count))

## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.

# Show top 10 names in 1990
head(names_1990, 10)

## # A tibble: 10 × 3
## # Groups:   name [10]
##    name        sex   total_count
##    <chr>       <chr>       <int>
##  1 Michael     M           65282
##  2 Christopher M           52332
##  3 Jessica     F           46475
##  4 Ashley      F           45558
##  5 Matthew     M           44800
##  6 Joshua      M           43216
##  7 Brittany    F           36538
##  8 Amanda      F           34408
##  9 Daniel      M           33815
## 10 David       M           33742
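
The table above ranks both sexes together and is still grouped by name. A minimal sketch of a per-sex ranking for the same year follows; top_1990_by_sex is a name introduced here for illustration.

```r
library(dplyr)
library(babynames)

# Five most common names for each sex in 1990, returned as an ungrouped tibble.
# Each (year, sex, name) combination has one row, so n can be ranked directly.
top_1990_by_sex <- babynames %>%
  filter(year == 1990) %>%
  group_by(sex) %>%
  slice_max(n, n = 5, with_ties = FALSE) %>%
  ungroup() %>%
  select(sex, name, n)

top_1990_by_sex
```

Ranking within each sex keeps the two lists the same length, whereas the combined top 10 above contains six male and four female names.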

Hanifah Burch