This project explores how baby names in the US have changed over time.
# Count the number of unique names
unique_names_count <- length(unique(baby$Name))
unique_names_count
## [1] 51124
# Top 10 names by number of records (table() counts rows, not babies)
name_counts <- table(baby$Name)
sorted_names <- names(sort(name_counts, decreasing = TRUE))
top_10_names <- head(sorted_names, 10)
print(top_10_names)
## [1] "Alva" "Charles" "Connie" "Frances" "Francis" "James" "Jean"
## [8] "Jesse" "Jessie" "John"
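There are 51,124 unique names. Because table(baby$Name) counts rows rather than summing the Number column, the 10 names listed are those that appear in the most records, not necessarily the names given to the most babies: Alva, Charles, Connie, Frances, Francis, James, Jean, Jesse, Jessie, John. Their alphabetical order suggests that they are tied on record count.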
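To rank names by the number of babies rather than the number of records, the counts in Number can be summed per name. The following is a minimal sketch, not the analysis above: `baby_toy` and its values are made up for illustration, and it assumes the real `baby` data has one row per name, sex, and year.

```r
# Hypothetical stand-in for `baby`; the numbers are illustrative only
baby_toy <- data.frame(
  Name        = c("Mary", "Mary", "John", "John", "Alva", "Alva"),
  Sex         = c("F", "F", "M", "M", "F", "M"),
  YearOfBirth = c(1880, 1881, 1880, 1881, 1880, 1880),
  Number      = c(7065, 6919, 9655, 8769, 10, 12)
)

# Row counts: every name appears in exactly two records, so all names tie
record_counts <- table(baby_toy$Name)

# Baby counts: sum Number within each name, then sort from largest to smallest
totals <- tapply(baby_toy$Number, baby_toy$Name, sum)
top_by_babies <- head(sort(totals, decreasing = TRUE), 10)
print(top_by_babies)
```

In this toy data the row counts cannot separate the names, while the summed totals put John first and Alva last.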
# Top 5 girl names by number of records (table() counts rows, not babies)
girl_names <- baby$Name[baby$Sex == "F"]
girl_name_counts <- table(girl_names)
sorted_girl_names <- names(sort(girl_name_counts, decreasing = TRUE))
top_5_girl_names <- head(sorted_girl_names, 5)
print("Top 5 Girl Names:")
## [1] "Top 5 Girl Names:"
print(top_5_girl_names)
## [1] "Abbie" "Abigail" "Ada" "Addie" "Adela"
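The five girls' names that appear in the most records are Abbie, Abigail, Ada, Addie, and Adela. Their alphabetical order suggests a tie on record count, so this list does not rank names by the number of babies.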
# Top 5 boy names by number of records (table() counts rows, not babies)
boy_names <- baby$Name[baby$Sex == "M"]
boy_name_counts <- table(boy_names)
sorted_boy_names <- names(sort(boy_name_counts, decreasing = TRUE))
top_5_boy_names <- head(sorted_boy_names, 5)
print("Top 5 Boy Names:")
## [1] "Top 5 Boy Names:"
print(top_5_boy_names)
## [1] "Aaron" "Abe" "Abel" "Abner" "Abraham"
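The five boys' names that appear in the most records are Aaron, Abe, Abel, Abner, and Abraham. As with the girls' list, the alphabetical order suggests a tie on record count rather than a ranking by number of babies.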
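A per-sex ranking by number of babies could be sketched as follows. The data frame `baby_toy` and the helper `top_n_for_sex()` are hypothetical names introduced only for this example, and the values are invented.

```r
# Hypothetical stand-in for `baby`; the numbers are illustrative only
baby_toy <- data.frame(
  Name        = c("Mary", "Mary", "Anna", "John", "John", "William", "Alva", "Alva"),
  Sex         = c("F", "F", "F", "M", "M", "M", "F", "M"),
  YearOfBirth = c(1880, 1881, 1880, 1880, 1881, 1880, 1880, 1880),
  Number      = c(7065, 6919, 2604, 9655, 8769, 9532, 10, 12)
)

# Total babies for each (Name, Sex) pair across all years
totals_by_sex <- aggregate(Number ~ Name + Sex, data = baby_toy, FUN = sum)

# Hypothetical helper: the n names with the most babies for one sex
top_n_for_sex <- function(totals, sex, n = 5) {
  rows <- totals[totals$Sex == sex, ]
  rows <- rows[order(rows$Number, decreasing = TRUE), ]
  head(as.character(rows$Name), n)
}

top_girls <- top_n_for_sex(totals_by_sex, "F")
top_boys  <- top_n_for_sex(totals_by_sex, "M")
print(top_girls)
print(top_boys)
```

Aggregating on both Name and Sex keeps a name such as Alva, which is recorded for both sexes, as two separate totals.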
# Pie chart of the distribution of gender
pie(table(baby$Sex), main = "Distribution of Gender")
# Number of unique names per year (years sorted so the line is drawn in order)
years <- sort(unique(baby$YearOfBirth))
unique_names_by_year <- sapply(years, function(year) {
  length(unique(baby$Name[baby$YearOfBirth == year]))
})
plot(years, unique_names_by_year, type = "o", col = "steelblue", pch = 16,
     main = "Number of Unique Names Each Year",
     xlab = "Year of Birth", ylab = "Count of Unique Names")
The pie chart counts records rather than babies, and it shows that the dataset contains more records for girls than for boys.
The number of unique baby names increased steadily from 1880 to 1900 and then surged between 1910 and 1920.
It declined between 1920 and 1940, then resumed its upward trend until 1980.
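The difference between counting records and counting babies can be illustrated with a small sketch. It uses a made-up data frame, `baby_toy`, in which girls have more records but boys have more babies. The same `tapply()` pattern also gives the number of unique names per year, already ordered by year.

```r
# Hypothetical stand-in for `baby`; the numbers are illustrative only
baby_toy <- data.frame(
  Name        = c("Mary", "Anna", "Emma", "Mary", "John", "William", "John"),
  Sex         = c("F", "F", "F", "F", "M", "M", "M"),
  YearOfBirth = c(1880, 1880, 1880, 1881, 1880, 1880, 1881),
  Number      = c(7065, 2604, 2003, 6919, 9655, 9532, 8769)
)

# Share of records per sex (what pie(table(...)) displays)
records_by_sex <- table(baby_toy$Sex)
print(round(100 * prop.table(records_by_sex), 1))

# Share of babies per sex (sums the Number column instead of counting rows)
babies_by_sex <- tapply(baby_toy$Number, baby_toy$Sex, sum)
print(round(100 * babies_by_sex / sum(babies_by_sex), 1))

# Unique names per year; tapply() orders the result by year
names_per_year <- tapply(baby_toy$Name, baby_toy$YearOfBirth,
                         function(x) length(unique(x)))
print(names_per_year)
```

Passing `babies_by_sex` to `pie()` instead of `table(baby$Sex)` would chart the distribution of babies rather than of records.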
# Total number of babies per year and sex
library(dplyr)
female_male_baby_data <- baby %>%
  group_by(YearOfBirth, Sex) %>%
  summarise(TotalBabies = sum(Number), .groups = "drop")
# Total births per year, with a separate color for each sex
library(ggplot2)
ggplot(data = female_male_baby_data, aes(x = YearOfBirth, y = TotalBabies, color = Sex)) +
  geom_line() +
  geom_point(size = 2) +
  scale_color_manual(values = c("F" = "deeppink2", "M" = "deepskyblue1")) +
  labs(x = "Year of Birth", y = "Total Babies",
       title = "Trend of Total Babies Born Each Year by Gender")
Overall, the total number of baby girls born each year increases, with some fluctuations, including decreases from 1925 to 1938 and from 1960 to 1975. The number of baby boys follows a similar trend, and in the final year of observation the number of boys is higher than the number of girls.
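The final-year comparison can also be checked numerically instead of read off the plot. This is a sketch under assumptions: `baby_toy` is a made-up data frame, and only the column names are taken from the code above.

```r
# Hypothetical stand-in for `baby`; the numbers are illustrative only
baby_toy <- data.frame(
  Name        = c("Mary", "Anna", "John", "Mary", "John"),
  Sex         = c("F", "F", "M", "F", "M"),
  YearOfBirth = c(1880, 1880, 1880, 1881, 1881),
  Number      = c(7065, 2604, 9655, 6919, 8769)
)

# Year-by-sex table of total babies; xtabs() sums Number within each cell
totals <- xtabs(Number ~ YearOfBirth + Sex, data = baby_toy)
print(totals)

# Compare the two sexes in the last year present in the data
final_year <- as.character(max(baby_toy$YearOfBirth))
more_boys  <- unname(totals[final_year, "M"] > totals[final_year, "F"])
print(more_boys)
```

Rows of the table are indexed by the year as a character string, which is why `final_year` is converted with `as.character()`.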
# Trend of babies named "Abbie" (first name in the girls' list above) in the US
library(plotly)
filteredAbbie_df <- filter(baby, Name == "Abbie")
plot_ly(data = filteredAbbie_df, x = ~YearOfBirth, y = ~Number, type = "scatter", mode = "lines") %>%
  layout(title = "Number of Babies Named Abbie in the US",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Number of Babies"))
# Trend of babies named "Aaron" (first name in the boys' list above) in the US
filteredAaron_df <- filter(baby, Name == "Aaron")
plot_ly(data = filteredAaron_df, x = ~YearOfBirth, y = ~Number, type = "scatter", mode = "lines") %>%
  layout(title = "Number of Babies Named Aaron in the US",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Number of Babies"))
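If a name is recorded for both sexes, filtering on Name alone leaves two rows per year, and a line plot would then connect both points within each year. One way to avoid this is to sum over sex, or restrict to one sex, before plotting. The helper `name_trend()` and the data frame `baby_toy` below are hypothetical and use invented values.

```r
# Hypothetical stand-in for `baby`; the numbers are illustrative only
baby_toy <- data.frame(
  Name        = c("Alva", "Alva", "Alva", "Alva", "Mary"),
  Sex         = c("F", "M", "F", "M", "F"),
  YearOfBirth = c(1880, 1880, 1881, 1881, 1880),
  Number      = c(10, 12, 8, 15, 7065)
)

# Hypothetical helper: one row per year for a given name,
# optionally restricted to a single sex
name_trend <- function(data, name, sex = NULL) {
  rows <- data[data$Name == name, ]
  if (!is.null(sex)) {
    rows <- rows[rows$Sex == sex, ]
  }
  aggregate(Number ~ YearOfBirth, data = rows, FUN = sum)
}

alva_all   <- name_trend(baby_toy, "Alva")
alva_girls <- name_trend(baby_toy, "Alva", sex = "F")
print(alva_all)
print(alva_girls)
```

The result has one row per year, so it can be passed to `plot_ly()` in place of the filtered data frame.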