This report explores the relationship between typical Jewish names and U.S. baby names.
It is important to note that having a Jewish name does not mean that you are Jewish. Conversely, not having a Jewish name does not mean that you are not Jewish. For this reason, keep in mind that I am exploring Jewish names, not the Jewish people.
library(tidyverse)
library(rvest)
library(babynames)
library(ggthemes)
library(ggplot2)
Source for the Jewish boy names: Popular Jewish Baby Boy Names (aish.com)
Source for the Jewish girl names: Popular Jewish Baby Girl Names (aish.com)
The lists on these websites are not comprehensive: there are Jewish boy names that the boy names list does not mention. It is also essential to note that a Jewish name can be difficult to define, because you do not have to be Jewish to give your child a Jewish name. A great example is Michael, one of the most popular Jewish boy names in 2017. The name is prominent in Judaism because of the Hebrew phrase “mi-ka’el,” which means “who is like God?” It is also popular in other religions such as Christianity because of an archangel named Michael. For reasons like this, it is very hard to identify specifically Jewish names.
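# Download the page that lists popular Jewish boy names
boynames <- read_html("https://www.aish.com/jl/l/b/48967016.html")
# Collect the text of every bold (<b>) element into a one-column data frame
boynames %>%
html_nodes("b") %>%
html_text() %>%
as.data.frame() -> boynames_table
colnames(boynames_table)[1] <- "name"
# Drop the first bold element (presumably a heading rather than a name)
boynames_table <- boynames_table %>% slice(-1)
# Keep only the babynames rows whose name appears on the list
babynames %>%
inner_join(boynames_table, by = "name") -> merged_boys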
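Because the join matches names by exact spelling, a scraped entry with stray whitespace, or with a spelling that never appears in babynames, is dropped without any warning. A minimal sketch of how one might check for that, using small made-up tables (`scraped_names` and `toy_babynames` are hypothetical stand-ins, not the real scraped list or the babynames data):

```r
library(dplyr)

# Hypothetical stand-in for the scraped list; note the stray spaces
scraped_names <- data.frame(
  name = c(" Michael", "Noah ", "Zevulun"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the babynames data
toy_babynames <- data.frame(
  year = c(2017, 2017, 2017),
  sex  = c("M", "M", "M"),
  name = c("Michael", "Noah", "Liam"),
  n    = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Trim whitespace so that " Michael" matches "Michael"
cleaned_names <- scraped_names %>%
  mutate(name = trimws(name))

# Rows of the data that survive the join
matched <- toy_babynames %>%
  inner_join(cleaned_names, by = "name")

# Scraped names that have no counterpart in the data
unmatched <- cleaned_names %>%
  anti_join(toy_babynames, by = "name")
```

Inspecting `unmatched` before trusting the merged table shows which list entries never made it into the analysis.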
The girl names list is not comprehensive either: there are Jewish girl names that it does not mention.
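# Download the page that lists popular Jewish girl names
girlnames <- read_html("https://www.aish.com/jl/l/b/48966261.html")
# Collect the text of every bold (<b>) element into a one-column data frame
girlnames %>%
html_nodes("b") %>%
html_text() %>%
as.data.frame() -> girlnames_table
colnames(girlnames_table)[1] <- "name"
# Drop the first bold element (presumably a heading rather than a name)
girlnames_table <- girlnames_table %>% slice(-1)
# Keep only the babynames rows whose name appears on the list
babynames %>%
inner_join(girlnames_table, by = "name") -> merged_girls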
merged_boys %>%
full_join(merged_girls) -> all_jewishnames
merged_boys %>%
filter(year == 2017, sex == "M") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
head(10)-> topboys2017
ggplot(topboys2017, aes(reorder(name,n),n)) +
geom_col() +
coord_flip() -> topboys2017_plot
topboys2017_plot +
ggtitle('10 Most Popular Jewish Boy Names in 2017') +
ylab('Number of Boys') +
xlab('Name') +
theme_gdocs()
merged_girls %>%
filter(year == 2017, sex == "F") %>%
arrange(desc(prop)) %>%
mutate(rank = row_number()) %>%
head(10)-> topgirls2017
ggplot(topgirls2017, aes(reorder(name,n),n)) +
geom_col() +
coord_flip() -> topgirls2017_plot
topgirls2017_plot +
ggtitle('10 Most Popular Jewish Girl Names in 2017') +
ylab('Number of Girls') +
xlab('Name') +
theme_gdocs()
# How to find total number of boy babies born with Jewish Names each year
merged_boys %>%
filter(sex %in% "M") %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> merged_boys_year_total
# How to find the total number of boy babies each year
babynames %>%
filter(sex %in% "M") %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> babynames_boys_year_total
# Equation to find the percentage of boy babies born with Jewish names each year
(merged_boys_year_total$year_total)/(babynames_boys_year_total$year_total) * 100 -> prop_jewish_boys_babynames_total
# Add the years back to the percentages, taking them from the yearly totals instead of hardcoding the start year
as.data.frame(prop_jewish_boys_babynames_total) -> boy_total
boy_total %>%
mutate(row = babynames_boys_year_total$year) -> boy_total
# Visualization for Boys
boy_total %>%
ggplot(aes(row, prop_jewish_boys_babynames_total)) +
geom_line() +
ggtitle("Proportion of Boy Babies Born with Jewish Names per Year") -> plot_prop_boy
plot_prop_boy +
xlab('Year') +
ylab("Percentage of Boy Babies Born with Jewish Names") +
theme_linedraw()
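The percentage above is computed by dividing two vectors position by position, which is only correct when both yearly tables cover the same years in the same order. A sketch of an alternative that matches the totals by year instead, on made-up numbers (`jewish_totals` and `all_totals` are hypothetical tables, and `pct` is an illustrative column name):

```r
library(dplyr)

# Hypothetical yearly totals; 1881 is deliberately missing from the first table
jewish_totals <- data.frame(year = c(1880, 1882), year_total = c(10, 30))
all_totals <- data.frame(year = c(1880, 1881, 1882), year_total = c(100, 200, 300))

pct_by_year <- all_totals %>%
  # Match on year so a missing year cannot shift the other rows
  left_join(jewish_totals, by = "year", suffix = c("_all", "_jewish")) %>%
  mutate(
    # A year with no matching names counts as zero babies
    year_total_jewish = coalesce(year_total_jewish, 0),
    pct = year_total_jewish / year_total_all * 100
  )
```

With this layout the year stays attached to each percentage, so it does not need to be rebuilt afterwards.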
# How to find total number of girl babies born with Jewish Names each year
merged_girls %>%
filter(sex %in% "F") %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> merged_girls_year_total
# How to find the total number of girl babies each year
babynames %>%
filter(sex %in% "F") %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> babynames_girls_year_total
# Equation to find the percentage of girl babies born with Jewish names each year
(merged_girls_year_total$year_total)/(babynames_girls_year_total$year_total) * 100 -> prop_jewish_girls_babynames_total
# Add the years back to the percentages, taking them from the yearly totals instead of hardcoding the start year
as.data.frame(prop_jewish_girls_babynames_total) -> girl_total
girl_total %>%
mutate(row = babynames_girls_year_total$year) -> girl_total
# Visualization for Girls
girl_total %>%
ggplot(aes(row, prop_jewish_girls_babynames_total)) +
geom_line() +
ggtitle("Prop. of Girl Babies Born with Jewish Names per Year") -> plot_prop_girl
plot_prop_girl +
xlab('Year') +
ylab("Percentage of Girl Babies Born with Jewish Names") +
theme_linedraw()
# How to find total number of babies born with Jewish Names each year
all_jewishnames %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> merged_allnames_year_total
# How to find the total number of babies each year
babynames %>%
group_by(year) %>%
summarize(year_total = sum(n)) -> babynames_allnames_year_total
# Equation to find the percentage of babies born with Jewish names each year
(merged_allnames_year_total$year_total)/(babynames_allnames_year_total$year_total) * 100 -> prop_jewish_allnames_babynames_total
# Add the years back to the percentages, taking them from the yearly totals instead of hardcoding the start year
as.data.frame(prop_jewish_allnames_babynames_total) -> allnames_total
allnames_total %>%
mutate(row = babynames_allnames_year_total$year) -> allnames_total
# Visualization for All
allnames_total %>%
ggplot(aes(row, prop_jewish_allnames_babynames_total)) +
geom_line() +
ggtitle("Prop. of Babies Born with Jewish Names per Year") -> plot_prop_allnames
plot_prop_allnames +
xlab('Year') +
ylab("Percentage of Babies Born with Jewish Names") +
theme_linedraw()
# Combining the datasets into one
boy_total %>%
full_join(girl_total) -> boys_girls_total
boys_girls_total %>%
full_join(allnames_total) -> boys_girls_all_total
# Renaming the dataset columns to shorten them
boys_girls_all_total %>%
rename(year = row,
prop_jboys = prop_jewish_boys_babynames_total,
prop_jgirls = prop_jewish_girls_babynames_total,
prop_jall = prop_jewish_allnames_babynames_total) -> boys_girls_all_total
# Creating a color palette for the legend
colors <- c("Boys" = "lightblue", "Girls" = "lightpink", "All" = "Black")
# The actual visualization
ggplot(boys_girls_all_total, aes(x = year)) +
geom_line(aes(y = prop_jboys, color = "Boys"), size = 1.0) +
geom_line(aes(y = prop_jgirls, color = "Girls"), size = 1.0) +
geom_line(aes(y = prop_jall, color = "All"), size = 1.0) +
labs(x = "Year",
y = "Percentage of Babies Born with Jewish Names",
color = "Legend") +
scale_color_manual(values = colors) -> boys_girls_all_total_plot
boys_girls_all_total_plot +
ggtitle("Prop. of Babies Born with Jewish Names") +
xlab("Year") +
ylab("Percentage of Babies Born with Jewish Names") +
theme_linedraw()
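The same three-line chart can also be built from a long-format table with one row per year and group, so that a single `geom_line()` draws all three series. A sketch on made-up percentages (the `toy_wide` table and its values are hypothetical):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one column of percentages per group
toy_wide <- data.frame(
  year  = c(1880, 1881),
  Boys  = c(5, 6),
  Girls = c(4, 3),
  All   = c(4.5, 4.4)
)

# Reshape to one row per year and group
toy_long <- toy_wide %>%
  pivot_longer(cols = c(Boys, Girls, All),
               names_to = "group", values_to = "pct")

colors <- c("Boys" = "lightblue", "Girls" = "lightpink", "All" = "black")

# The color aesthetic now comes from the group column
toy_plot <- ggplot(toy_long, aes(x = year, y = pct, color = group)) +
  geom_line() +
  scale_color_manual(values = colors) +
  labs(x = "Year",
       y = "Percentage of Babies Born with Jewish Names",
       color = "Legend")
```

The long layout also makes it easy to add or remove a series without touching the plotting code.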
After going through all the visualizations and data, I found my observations about the proportion of babies born with Jewish names per year the most interesting. The proportion has fluctuated greatly over the years. When comparing boys and girls, the most noticeable thing is that before 1935 the proportion of boys with Jewish names was lower than that of girls, but after 1935 it was the opposite. Another interesting observation is that Jewish boy names skyrocketed to a maximum of 9% of boy babies in 1967. Jewish girl names, on the other hand, only reached a maximum of 4% of girl babies in 1915. More recently, both boy and girl Jewish names have decreased since 2000. In the 2010s, Jewish boy names have steadily decreased in proportion, while Jewish girl names have fluctuated.
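Peak years like the ones quoted above can be read off a yearly table directly rather than from the plot. A sketch using `slice_max()` on made-up values (`toy_pct` and its numbers are hypothetical, not the results reported here):

```r
library(dplyr)

# Hypothetical yearly percentages
toy_pct <- data.frame(
  year     = c(2001, 2002, 2003, 2004),
  pct_boys = c(1, 3, 2, 2.5)
)

# Keep the single row with the highest percentage
peak_boys <- toy_pct %>%
  slice_max(pct_boys, n = 1)
```

The same call on the girls' percentages would give the corresponding peak for girl names.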