I wanted to analyze the difference between the baby names of Republican states and Democratic states. I also wanted to know whether, during each president’s time in office, more babies were given the president’s first name in states aligned with his party. To decide which states could be considered Democratic or Republican, I used the Pew Research Center’s study https://www.pewresearch.org/religion/religious-landscape-study/compare/party-affiliation/by/state/# and chose the states in which more than 50% of adults identified with a party. By this measure, the Democratic states were Delaware, Hawaii, Maryland, Massachusetts, New Jersey, New York, and Vermont, and the Republican states were Alabama, South Dakota, Utah, and Wyoming. These two sets of states represent the Democratic and Republican sets for this project.
Rows: 5647426 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Name, Gender, State
dbl (3): Id, Year, Count
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ purrr 1.0.2
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
First, I will filter statebabynames down to the Republican states and total the count of each name to find the most popular names among those states within this dataset.
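The code that builds the top-10 tables is not shown here, so the following is only a minimal sketch of how such a table could be produced with dplyr. The helper `top_names()`, the toy data, and the result name `redGenF_sketch` are assumptions made for illustration; only the column names (`Name`, `Gender`, `State`, `Year`, `Count`) and the four state codes come from the report.

```r
library(dplyr)

# Toy stand-in for statebabynames, using the same column names as the real file.
toy_names <- data.frame(
  Name   = c("Mary", "Mary", "Linda", "James", "Mary", "Linda"),
  Gender = c("F", "F", "F", "M", "F", "F"),
  State  = c("AL", "UT", "AL", "AL", "NY", "WY"),
  Year   = c(1950, 1950, 1950, 1950, 1950, 1951),
  Count  = c(100, 50, 80, 120, 300, 20)
)

red_states <- c("AL", "SD", "UT", "WY")

# Hypothetical helper: total births per name for one gender in a set of states,
# keeping only the n most common names.
top_names <- function(data, states, gender, n = 10) {
  data |>
    filter(State %in% states, Gender == gender) |>
    group_by(Name) |>
    summarise(total = sum(Count), .groups = "drop") |>
    slice_max(total, n = n, with_ties = FALSE)
}

redGenF_sketch <- top_names(toy_names, red_states, "F")
```

Setting `.groups = "drop"` returns an ungrouped table and avoids the `summarise()` grouping message. The same helper could be reused for male names or for the Democratic state list.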
I am analyzing the most popular female names among the Republican states.
redGenF |>
ggplot(aes(reorder(Name, total), total, fill = Name)) +
geom_col() +
coord_flip() -> topredF_plot
topredF_plot +
ggtitle('Top 10 Most Popular Girl Names in Republican States') +
ylab('Number of female babies') +
xlab('Name')
Next, I will plot the most popular male names among the Republican states.
redGenM |>
ggplot(aes(reorder(Name, total), total, fill = Name)) +
geom_col() +
coord_flip() -> topredM_plot
topredM_plot +
ggtitle('Top 10 Most Popular Boy Names in Republican States') +
ylab('Number of male babies') +
xlab('Name')
I will now filter the statebabynames by each of the democratic states to find the total number of each of the most popular names among the states within this data set.
`summarise()` has grouped output by 'Name'. You can override using the
`.groups` argument.
Now that I have looked at the Republican states for both genders, I want to look at the Democratic states. I will filter the Democratic states by gender to find the 10 most popular female names, and then the 10 most popular male names.
blueGenF |>
ggplot(aes(reorder(Name, total), total, fill = Name)) +
geom_col() +
coord_flip() -> topblueF_plot
topblueF_plot +
ggtitle('Top 10 Most Popular Girl Names in Democratic States') +
ylab('Number of female babies') +
xlab("Name")
blueGenM |>
ggplot(aes(reorder(Name, total), total, fill = Name)) +
geom_col() +
coord_flip() -> topblueM_plot
topblueM_plot +
ggtitle('Top 10 Most Popular Boy Names in Democratic States') +
ylab('Number of male babies') +
xlab("Name")
Next, I want to compare the proportion of baby names of each of the top ten female names among the republican states over time in 4 line graphs.
statebabynames_prop |>
filter(Name %in% redGenF$Name) |>
filter(Gender == "F") |>
filter(State %in% c('AL', 'SD', 'UT', 'WY')) |>
ggplot(aes(Year, Prop, color = Name)) + geom_line() +
facet_wrap(~State) -> allredFstates_plot
allredFstates_plot +
ggtitle('Popularity of Each Top 10 Female Name in Each Republican State Over Time') +
ylab('Name Popularity') +
xlab('Year')
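The `Prop` column plotted here is not defined in the visible code. One plausible definition, sketched on toy data, is each name's share of all recorded births of the same gender in that state and year. The grouping choice and the name `toy_prop` are assumptions, not necessarily how statebabynames_prop was built.

```r
library(dplyr)

# Toy data: two names in two states for a single year.
toy_counts <- data.frame(
  Name   = c("Mary", "Linda", "Mary", "Linda"),
  Gender = "F",
  State  = c("AL", "AL", "UT", "UT"),
  Year   = 1950,
  Count  = c(75, 25, 10, 30)
)

# Assumed definition: a name's count divided by the total count for the same
# state, year, and gender.
toy_prop <- toy_counts |>
  group_by(State, Year, Gender) |>
  mutate(Prop = Count / sum(Count)) |>
  ungroup()
```

Using a proportion rather than a raw count makes states of very different sizes, such as New York and Wyoming, comparable on the same axis.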
Then, I want to compare the proportion of baby names of each of the top ten male names among the republican states over time in 4 line graphs.
statebabynames_prop |>
filter(Name %in% redGenM$Name) |>
filter(Gender == "M") |>
filter(State %in% c('AL', 'SD', 'UT', 'WY')) |>
ggplot(aes(Year, Prop, color = Name)) + geom_line() +
facet_wrap(~State) -> allredMstates_plot
allredMstates_plot +
ggtitle('Popularity of Each Top 10 Male Name in Each Republican State Over Time') +
ylab('Name Popularity') +
xlab('Year')
I did the same with the Democratic states, plotting a line chart for each state to show how the proportion of babies given each of the top 10 female names changed over time.
statebabynames_prop |>
filter(Name %in% blueGenF$Name) |>
filter(Gender == "F") |>
filter(State %in% c('DE', 'HI', 'MD', 'NJ', 'NY', 'VT')) |>
ggplot(aes(Year, Prop, color = Name)) + geom_line() +
facet_wrap(~State) -> allblueFstates_plot
allblueFstates_plot +
ggtitle('Popularity of Each Top 10 Female Name in Each Democratic State Over Time') +
ylab('Name Popularity') +
xlab('Year')
Then, I want to plot the same to analyze the difference of proportions of the top 10 male baby names in each state over time.
statebabynames_prop |>
filter(Name %in% blueGenM$Name) |>
filter(Gender == "M") |>
filter(State %in% c('DE', 'HI', 'MD', 'NJ', 'NY', 'VT')) |>
ggplot(aes(Year, Prop, color = Name)) + geom_line() +
facet_wrap(~State) -> allblueMstates_plot
allblueMstates_plot +
ggtitle('Popularity of Each Top 10 Male Name in Each Democratic State Over Time') +
ylab('Name Popularity') +
xlab('Year')
The most popular girl name in Republican states is Mary, followed by Linda and Betty.
The most popular girl name in Democratic states is also Mary, followed by Patricia and Barbara, while Linda and Betty are not present in the top 10 list. Within the Republican states, Mary was not popular in Alabama, and within the Democratic states, Mary was most popular in New York.
The most popular boy name in Republican states is James, followed by John and William. The most popular boy name in Democratic states is John, followed by Robert and Michael, while James and William are present as well. Within the Republican states, James was most popular in Alabama, but the proportion of each name across all four states declined rapidly around 1950. Within the Democratic states, specifically in New York, the most popular name began as John, and over time it became Robert and then Michael. There seem to be more differences among the states in the most popular female names than in the most popular male names.
Now I will look at the popularity of each president’s name during his presidency.
First I will create color groups, ensuring that the democratic states will be shown as blue.
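The code that creates the color groups is not shown, so this is only a minimal sketch of one way to do it. The labels `"Blue"` and `"Red"`, the toy data, and the `party_colors` palette are assumptions; the state codes are the ones used in the plotting code of this report.

```r
library(dplyr)

# State codes as used in the plotting code of this report.
blue_states <- c("DE", "HI", "MD", "NJ", "NY", "VT")
red_states  <- c("AL", "SD", "UT", "WY")

# Toy rows standing in for the real data.
toy_states <- data.frame(
  State = c("NY", "AL", "VT", "WY"),
  Count = c(5, 3, 2, 1)
)

# Label every row by the group its state belongs to (labels are assumed).
toy_colored <- toy_states |>
  filter(State %in% c(blue_states, red_states)) |>
  mutate(Color = if_else(State %in% blue_states, "Blue", "Red"))

# A named palette ties each label to a color explicitly.
party_colors <- c(Blue = "#00AFBB", Red = "#FC4E07")
```

Passing a named vector such as `party_colors` to `scale_color_manual(values = ...)` matches colors to group labels by name rather than by position, which guards against the two colors ending up swapped.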
Then, I will create a full list of names to look at.
redGenM|>full_join(blueGenM) -> fullname_list
Joining with `by = join_by(Name, total)`
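The message above shows that the join matched on both `Name` and `total`, so a name that appears in both top-10 lists with different totals ends up on two separate rows. If only the set of names is needed, a sketch like the following, on toy data with assumed object names, keeps each name once.

```r
library(dplyr)

# Toy top-name tables; "John" appears in both with different totals.
red_top  <- data.frame(Name = c("James", "John"),  total = c(10, 8))
blue_top <- data.frame(Name = c("John", "Robert"), total = c(20, 15))

# Joining on both columns (what the message reports) keeps John twice.
joined_both <- full_join(red_top, blue_top, by = c("Name", "total"))

# Taking the union of the name columns keeps each name exactly once.
all_names <- base::union(red_top$Name, blue_top$Name)
```

Spelling out `by` also silences the "Joining with" message, which makes the matching columns explicit to the reader.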
Here, I will first look at Richard Nixon’s presidency to see how the number of babies given the president’s name changed in each year he held office.
BlueRedNames_combo |>
filter(Name %in% "Richard") |>
filter(Year %in% c('1969', '1970', '1971', '1972', '1973', '1974')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Richard During Nixon's Presidency") -> Richard_plot
Adding the visualization
Richard_plot +
xlab('Year') +
ylab('Number of Babies named Richard') +
theme_linedraw()
It seems that during Nixon’s presidency, babies named Richard in democratic states declined more and at a much faster rate than in republican states.
Now I want to take a closer look at the name Richard, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% "Richard") |>
filter(Year %in% c('1969', '1970', '1971', '1972', '1973', '1974')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Richard During Nixon's Presidency") -> Richardallstates_plot
Adding the visualization
Richardallstates_plot +
xlab('Year') +
ylab('Number of Babies named Richard') +
theme_linedraw()
Babies named Richard declined at about the same rate in each state with the exception of a large spike in New Jersey in 1972.
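The same filter-and-sum steps are repeated for every president, so they could be wrapped in a small helper. This is a hedged sketch on toy data: `president_totals()` and `toy_combo` are hypothetical, and only the column names (`Name`, `State`, `Color`, `Year`, `yearly_total`) are taken from the code in this report.

```r
library(dplyr)

# Toy stand-in for BlueRedNames_combo.
toy_combo <- data.frame(
  Name         = c("Richard", "Richard", "Richard", "Richard", "Gerald"),
  State        = c("NY", "NJ", "AL", "NY", "AL"),
  Color        = c("Blue", "Blue", "Red", "Blue", "Red"),
  Year         = c(1969, 1969, 1969, 1970, 1974),
  yearly_total = c(100, 60, 30, 90, 5)
)

# Hypothetical helper: babies with a given first name per color group and year,
# summed over all states in the group.
president_totals <- function(data, first_name, years) {
  data |>
    filter(Name == first_name, Year %in% years) |>
    group_by(Color, Year) |>
    summarise(babies = sum(yearly_total), .groups = "drop")
}

nixon_totals <- president_totals(toy_combo, "Richard", 1969:1974)
```

Summing before plotting gives a table that can be inspected directly, which helps when a line is too short or too small to read on the chart.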
Now I will do the same for Gerald Ford.
BlueRedNames_combo |>
filter(Name %in% "Gerald") |>
filter(Year %in% c('1974', '1975', '1976', '1977')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Gerald During Ford's Presidency") -> Gerald_plot
Adding the visualization
Gerald_plot +
xlab('Year') +
ylab('Number of Babies named Gerald') +
theme_linedraw()
Babies named Gerald decreased much more in Republican states than in Democratic states, specifically in 1976, but babies named Gerald increased much more in Republican states than in Democratic states from 1976 to 1977.
Now I want to take a closer look at the name Gerald, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% 'Gerald') |>
filter(Year %in% c('1974', '1975', '1976', '1977')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Gerald During Ford’s Presidency") -> Geraldallstates_plot
Adding the visualization
Geraldallstates_plot +
xlab('Year') +
ylab('Number of Babies named Gerald') +
theme_linedraw()
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
There does not seem to be any significant difference between the states.
Now, I will look at the same under Jimmy Carter’s Presidency.
BlueRedNames_combo |>
filter(Name %in% "Jimmy") |>
filter(Year %in% c('1977', '1978', '1979', '1980', '1981')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Jimmy During Carter's Presidency") -> Jimmy_plot
Adding the visualization
Jimmy_plot +
xlab('Year') +
ylab('Number of Babies named Jimmy') +
theme_linedraw()
The number of babies named Jimmy plummeted from 1976-1981, while the number of babies named Jimmy in Republican states rose from 1980-1981.
Now I want to take a closer look at the name Jimmy, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% "Jimmy") |>
filter(Year %in% c('1977', '1978', '1979', '1980', '1981')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Jimmy During Carter's Presidency") -> Jimmyallstates_plot
Adding the visualization
Jimmyallstates_plot +
xlab('Year') +
ylab('Number of Babies named Jimmy') +
theme_linedraw()
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
There does not seem to be much of a significant difference between the states.
Now, I will look at the same under Ronald Reagan’s Presidency.
BlueRedNames_combo |>
filter(Name %in% "Ronald") |>
filter(Year %in% c('1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Ronald During Reagan's Presidency") -> Ronald_plot
Adding the visualization
Ronald_plot +
xlab('Year') +
ylab('Number of Babies named Ronald') +
theme_linedraw()
While the name Ronald rose quickly twice and then sank in Democratic states, Ronald decreased significantly towards the end of his presidency before sharply increasing at the tail end of his term.
Now I want to take a closer look at the name Ronald, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% 'Ronald') |>
filter(Year %in% c('1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Ronald During Reagan’s Presidency") -> Ronaldallstates_plot
Adding the visualization
Ronaldallstates_plot +
xlab('Year') +
ylab('Number of Babies named Ronald') +
theme_linedraw()
There is no apparent difference between the democratic and republican states with the exception of two sharp rises in the name Ronald under his presidency.
Now, I will look at the same under George H. W. Bush’s Presidency.
BlueRedNames_combo |>
filter(Name %in% "George") |>
filter(Year %in% c('1989', '1990', '1991', '1992', '1993')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named George During H.W. Bush's Presidency") -> GeorgeHW_plot
Adding the visualization
GeorgeHW_plot +
xlab('Year') +
ylab('Number of Babies named George') +
theme_linedraw()
In Democratic states, the name George decreased slowly and slightly, while in Republican states it sank drastically before rising at the end of his term.
Now I want to take a closer look at the name George, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% 'George') |>
filter(Year %in% c('1989', '1990', '1991', '1992', '1993')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named George During H.W. Bush’s Presidency") -> GeorgeHWallstates_plot
Adding the visualization
GeorgeHWallstates_plot +
xlab('Year') +
ylab('Number of Babies named George') +
theme_linedraw()
There does not seem to be much of a difference among each of the states.
Now, I will look at the same under Bill Clinton’s Presidency.
BlueRedNames_combo |>
filter(Name %in% "Bill") |>
filter(Year %in% c('1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Bill During Clinton's Presidency") -> Bill_plot
Adding the visualization
Bill_plot +
xlab('Year') +
ylab('Number of Babies named Bill') +
theme_linedraw()
Significantly more babies were named Bill in Democratic states than in Republican states, so at the scale of this graph the Republican states are nonexistent. However, as Clinton was a Democratic president, this represents his comparative prominence.
Now I want to take a closer look at the name Bill, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% 'Bill') |>
filter(Year %in% c('1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Bill During Clinton’s Presidency") -> Billallstates_plot
Adding the visualization
Billallstates_plot +
xlab('Year') +
ylab('Number of Babies named Bill') +
theme_linedraw()
Just as there was not a significant number of babies named Bill in Republican states, there was also not a significant number in most of the Democratic states, so the counts cannot be represented on the scale of these graphs. The exceptions are New Jersey and New York, which both saw a large decrease; New York also saw an increase followed by a quick decrease towards the end of Clinton’s term.
Now, I will look at the same under George W. Bush’s Presidency.
BlueRedNames_combo |>
filter(Name %in% "George") |>
filter(Year %in% c('2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named George During W. Bush's Presidency") -> WBush_plot
Adding the visualization
WBush_plot +
xlab('Year') +
ylab('Number of Babies named George') +
theme_linedraw()
Now I want to take a closer look at the name George, during his presidency, by each state.
BlueRedNames_combo |>
filter(Name %in% 'George') |>
filter(Year %in% c('2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
theme_classic() +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named George During W.Bush’s Presidency") -> GeorgeWallstates_plot
Adding the visualization
GeorgeWallstates_plot +
xlab('Year') +
ylab('Number of Babies named George') +
theme_linedraw()
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
Now, I will look at the same under Barack Obama’s Presidency, but the StateNames data set ends at 2014 so it is cut short.
BlueRedNames_combo |>
filter(Name %in% "Barack") |>
filter(Year %in% c('2009', '2010', '2011', '2012', '2013', '2014')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Barack During Obama's Presidency") -> Obama_plot
Adding the visualization
Obama_plot +
xlab('Year') +
ylab('Number of Babies named Barack') +
theme_linedraw()
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
No line is shown for either Democratic or Republican states because so few babies were named Barack during his term.
Now I want to take a closer look at the name Barack, during his presidency, by each state. The dataset ends at 2014, which is why I only took a look at his term until that year.
BlueRedNames_combo |>
filter(Name %in% 'Barack') |>
filter(Year %in% c('2009', '2010', '2011', '2012', '2013', '2014')) |>
filter(State %in% c('AL', 'SD', 'UT', 'WY', 'DE', 'NJ', 'NY', 'HI', 'MD', 'VT')) |>
group_by(Color) |>
ggplot(aes(Year, yearly_total, color = Color)) +
stat_summary(fun = sum, geom = "line") +
facet_wrap(~State) +
scale_color_manual(values = c("#00AFBB", "#FC4E07")) +
ggtitle("Number of Republicans and Democrats named Barack During Obama’s Presidency") -> Barackallstates_plot
Adding the visualization
Barackallstates_plot +
xlab('Year') +
ylab('Number of Babies named Barack') +
theme_linedraw()
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
Unfortunately, because so few babies were named Barack during his term, the number cannot be represented on the graph.
Conclusion: Analyzing the popularity of each name under each presidency shows that, although the colors are flipped, under a Republican presidency the number of babies named after the president changed more in Republican states than in Democratic states, and vice versa. The most interesting cases were Gerald Ford’s presidency, in which the number of babies named Gerald in Republican states skyrocketed towards the tail end of his term, and Carter’s presidency, during which the number of babies named Jimmy in Republican states plummeted. Unfortunately, my hypothesis was wrong, as there was no strong correlation between the political affiliation of the president and an increase in babies with his name in states that shared his party affiliation.
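The comparison of how much each group "changed" was read off the charts. One way to put a number on it, sketched here with made-up totals rather than results from this analysis, is the percent change between the first and last year of a term for each group. The object names and the `Color` labels are assumptions.

```r
library(dplyr)

# Made-up yearly totals for one name; these are NOT results from this report.
toy_totals <- data.frame(
  Color  = c("Blue", "Blue", "Red", "Red"),
  Year   = c(1974, 1977, 1974, 1977),
  babies = c(200, 150, 40, 50)
)

# Percent change from the first to the last year of the term, per group.
pct_change <- toy_totals |>
  group_by(Color) |>
  arrange(Year, .by_group = TRUE) |>
  summarise(
    change_pct = (last(babies) - first(babies)) / first(babies) * 100,
    .groups = "drop"
  )
```

A relative measure like this keeps the much larger Democratic state totals from dominating the comparison, since the raw counts differ greatly between the two groups.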